[NeurIPS 2024] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

1Shanghai Artificial Intelligence Laboratory
2School of Information Science and Technology, Fudan University
3School of Artificial Intelligence, Shanghai Jiao Tong University

*Equal Contribution, Corresponding author

Abstract

Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, existing denoising techniques in diffusion models are commonly based on step-by-step noise predictions, which suffer from high computation cost, resulting in prohibitive latency for interactive applications. In this paper, we propose AdaptiveDiffusion to relieve this bottleneck by adaptively reducing the number of noise prediction steps during the denoising process. Our method explores the potential of skipping as many noise prediction steps as possible while keeping the final denoised results identical to the original full-step ones. Specifically, the skipping strategy is guided by the third-order latent difference, which indicates the stability between timesteps during the denoising process and thus allows previous noise prediction results to be reused. Extensive experiments on image and video diffusion models demonstrate that our method significantly speeds up the denoising process while generating results identical to the original process, achieving an average 2~5× speedup without quality degradation.

Different prompts may require different numbers of noise prediction steps!!!


For Prompt 1, only 20 of the 50 denoising steps require noise prediction to generate an almost lossless image, while for Prompt 2, 26 of the 50 steps are needed.

Denoising process of AdaptiveDiffusion


We design a third-order estimator that detects redundancy between neighboring timesteps; guided by the estimator's signal, the noise prediction model is either skipped or inferred at each step, yielding the adaptive diffusion process. Note that the timestep and text information embeddings are omitted for brevity.

AdaptiveDiffusion can generate high-quality images and videos at lower cost!!!


Quantitative results on MS-COCO 2017.


Quantitative results on video generation tasks.


Qualitative results of the text-to-image generation task using LDM-4 on the ImageNet 256×256 benchmark.

Qualitative results of the text-to-video generation task using I2VGen-XL.

Qualitative results of the text-to-video generation task using ModelScopeT2V.