ASTRAEA: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

Haosong Liu1*, Yuge Cheng1*, Zihan Liu1, Aiyue Chen2, Yiwu Yao2
Chen Chen1, Jingwen Leng1,2, Yu Feng1,2,#, Minyi Guo1,2
1Shanghai Jiao Tong University 2Shanghai Qizhi Institute 3Huawei Technologies Co., Ltd.
*Equal Contribution #Corresponding Author

Comparison of video generation speed between the original method and ours. We test on Wan with 2-second videos at 480p resolution. Both the baseline and our method run on a single A100 GPU.

Abstract

We introduce ASTRAEA, an automatic framework that searches for near-optimal configurations for vDiT-based video generation. At its core, ASTRAEA proposes a lightweight token selection mechanism and a memory-efficient, GPU-parallel sparse attention strategy, enabling linear reductions in execution time with minimal impact on generation quality. To determine the optimal token reduction for each timestep, we further design a search framework that uses a classic evolutionary algorithm to automatically distribute the token budget across timesteps. Together, these components let ASTRAEA achieve up to 2.4x inference speedup on a single GPU and scale well to multiple GPUs (up to 13.2x speedup on 8 GPUs), while retaining better video quality than state-of-the-art methods (<0.5% loss on the VBench score compared to the baseline vDiT models).
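To make the budget-search idea concrete, here is a minimal sketch of an evolutionary search over a per-timestep token schedule. It is not ASTRAEA's implementation: the function names (`mutate`, `evolve_schedule`, `toy_quality`), the use of absolute token counts, the budget-preserving mutation, and the toy fitness function are all assumptions made for illustration. In a real setting the fitness would come from evaluating generated video quality.

```python
import random


def mutate(schedule, max_tokens, rng, step=4):
    """Shift up to `step` tokens of budget between two timesteps; the total is unchanged."""
    child = list(schedule)
    src, dst = rng.sample(range(len(child)), 2)
    # Keep at least 1 token at the source and at most max_tokens at the destination.
    amount = min(step, child[src] - 1, max_tokens - child[dst])
    if amount > 0:
        child[src] -= amount
        child[dst] += amount
    return child


def evolve_schedule(num_steps, max_tokens, total_budget, fitness,
                    pop_size=16, generations=200, seed=0):
    """Search for a per-timestep token schedule that sums to total_budget."""
    if num_steps < 2 or not num_steps <= total_budget <= num_steps * max_tokens:
        raise ValueError("need >= 2 steps and a budget of 1..max_tokens tokens per step")
    rng = random.Random(seed)
    # Start from a uniform split of the budget across timesteps.
    base, extra = divmod(total_budget, num_steps)
    uniform = [base + (1 if i < extra else 0) for i in range(num_steps)]
    population = [uniform] * pop_size
    for _ in range(generations):
        children = [mutate(parent, max_tokens, rng) for parent in population]
        # Parents compete with their children, so the best score never decreases.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]


def toy_quality(schedule):
    # Hypothetical stand-in for a video-quality score; the weights are arbitrary
    # and make no claim about which timesteps matter more in practice.
    weights = [1.0 + 0.1 * t for t in range(len(schedule))]
    return sum(w * n ** 0.5 for w, n in zip(weights, schedule))


best = evolve_schedule(num_steps=10, max_tokens=100, total_budget=500, fitness=toy_quality)
print(best, sum(best))
```

Because every mutation only moves budget between two timesteps, each candidate satisfies the overall budget by construction, and the search only has to decide how that budget is distributed.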

Quantitative evaluation of our method against the state-of-the-art methods PAB, ToCa, and Delta-DiT on two vDiT models: Wan v2.1 1.3B and OpenSora v1.2.

The speedup of ASTRAEA over the baseline models across different numbers of GPUs.

Video Comparisons Wan

Origin

Ours 0.7

PSNR: 29.21 Speedup: 1.36x

Ours 0.5

PSNR: 22.45 Speedup: 1.86x

Ours 0.4

PSNR: 22.35 Speedup: 2.29x

PAB 26

PSNR: 18.21 Speedup: 1.37x

TOCA 0.8

PSNR: 19.77 Speedup: 1.60x

Prompt: a dog running happily

Origin

Ours 0.7

PSNR: 29.37 Speedup: 1.36x

Ours 0.5

PSNR: 24.95 Speedup: 1.86x

Ours 0.4

PSNR: 24.29 Speedup: 2.29x

PAB 26

PSNR: 18.55 Speedup: 1.37x

TOCA 0.8

PSNR: 17.26 Speedup: 1.60x

Prompt: A jellyfish floating through the ocean, with bioluminescent tentacles

Origin

Ours 0.7

PSNR: 27.58 Speedup: 1.36x

Ours 0.5

PSNR: 23.51 Speedup: 1.86x

Ours 0.4

PSNR: 22.33 Speedup: 2.29x

PAB 26

PSNR: 16.37 Speedup: 1.37x

TOCA 0.8

PSNR: 18.87 Speedup: 1.60x

Prompt: A robot DJ is playing the turntable, in heavy raining futuristic tokyo rooftop cyberpunk night, sci-fi, fantasy

Origin

Ours 0.7

PSNR: 35.00 Speedup: 1.36x

Ours 0.5

PSNR: 25.76 Speedup: 1.86x

Ours 0.4

PSNR: 25.28 Speedup: 2.29x

PAB 26

PSNR: 18.28 Speedup: 1.37x

TOCA 0.8

PSNR: 24.08 Speedup: 1.60x

Prompt: A raccoon dressed in suit playing the trumpet, stage background

Video Comparisons OpenSora

Origin

Ours 0.7

PSNR: 24.11 Speedup: 1.46x

Ours 0.5

PSNR: 23.12 Speedup: 1.91x

Ours 0.4

PSNR: 18.14 Speedup: 2.35x

PAB 246

PSNR: 18.57 Speedup: 1.23x

TOCA 0.8

PSNR: 16.18 Speedup: 1.69x

Prompt: A couple in formal evening wear going home get caught in a heavy downpour with umbrellas by Hokusai, in the style of Ukiyo

Origin

Ours 0.7

PSNR: 24.03 Speedup: 1.46x

Ours 0.5

PSNR: 22.46 Speedup: 1.91x

Ours 0.4

PSNR: 20.74 Speedup: 2.35x

PAB 246

PSNR: 19.48 Speedup: 1.23x

TOCA 0.8

PSNR: 15.69 Speedup: 1.69x

Prompt: an elephant running to join a herd of its kind

Origin

Ours 0.7

PSNR: 22.14 Speedup: 1.46x

Ours 0.5

PSNR: 20.74 Speedup: 1.91x

Ours 0.4

PSNR: 18.89 Speedup: 2.35x

PAB 246

PSNR: 18.64 Speedup: 1.23x

TOCA 0.8

PSNR: 12.37 Speedup: 1.69x

Prompt: A happy fuzzy panda playing guitar nearby a campfire, snow mountain in the background

Origin

Ours 0.7

PSNR: 22.89 Speedup: 1.46x

Ours 0.5

PSNR: 21.67 Speedup: 1.91x

Ours 0.4

PSNR: 19.77 Speedup: 2.35x

PAB 246

PSNR: 18.24 Speedup: 1.23x

TOCA 0.8

PSNR: 17.56 Speedup: 1.69x

Prompt: botanical garden
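
The comparisons above report PSNR for each accelerated video. The page does not state how these values were computed, so the following is only a minimal sketch of the standard PSNR definition, assuming 8-bit pixel values and that each accelerated video is compared against the Origin video with the error averaged over all pixels of all frames. The function name `psnr` and the flat-list input format are illustrative choices.

```python
import math


def psnr(reference, test, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over every pixel value.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel differs by 10 out of a 0..255 range.
print(round(psnr([100] * 4, [110] * 4), 2))
```

Higher PSNR means the accelerated output stays closer, pixel for pixel, to the original model's output.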