
v0.35.0

📦 diffusers
✨ 12 features · 🐛 6 fixes · ⚡ 2 deprecations · 🔧 8 symbols

Summary

This release introduces major new pipelines (Wan 2.2, Flux-Kontext, Qwen-Image), significant performance optimizations via regional compilation and GGUF CUDA kernels, and an experimental modular pipeline system.

Migration Steps

  1. To speed up loading, pass device_map="cuda" to DiffusionPipeline.from_pretrained() instead of calling .to("cuda") on the loaded pipeline.
  2. To load large models in parallel, set os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes" before loading them.
  3. To cut compile time, adopt regional compilation as described in the updated optimization guides.
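The first two steps can be sketched as below. This is a minimal illustration, not the project's reference code: the helper names `enable_parallel_loading` and `load_pipeline` are hypothetical, and the bfloat16 dtype is an assumption that may not suit every GPU.

```python
import os


def enable_parallel_loading() -> None:
    """Opt in to parallel shard loading; call this before loading the model."""
    os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"


def load_pipeline(model_id: str):
    """Load a pipeline directly onto the GPU instead of calling .to("cuda")."""
    # Imported lazily so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is supported on the target GPU
        device_map="cuda",           # replaces the later .to("cuda") call
    )


# Typical order of calls (the model id is left for the reader to fill in):
#   enable_parallel_loading()
#   pipe = load_pipeline(model_id)
```

The environment variable is set in a separate helper because it has to be in place before `from_pretrained()` runs; setting it afterwards would have no effect on a model that is already loaded.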

✨ New Features

  • Added Wan 2.2 video generation pipeline with improved fidelity and prompt adherence.
  • Added Flux-Kontext image editing pipeline (12B parameter rectified flow transformer).
  • Added Qwen-Image and Qwen-Image-Edit pipelines (Apache-2.0 licensed).
  • Introduced Regional Compilation to reduce cold-start latency and compile time by 8–10x.
  • Added support for loading pipelines directly to accelerator devices using device_map.
  • Enabled parallelized loading of state dict shards via HF_ENABLE_PARALLEL_LOADING environment variable.
  • Added native GGUF CUDA kernel support, giving roughly 10% faster inference.
  • Added support for loading Diffusers-format GGUF checkpoints, plus a conversion tool.
  • Introduced an experimental 'Modular Diffusers' system for building pipelines from individual blocks.
  • Refactored attention to support multiple backends (SDPA, Flash Attention 3, SAGE).
  • Added new training scripts for Kontext and Qwen-Image.
  • Added single-file modeling implementations for Flux Transformer and Cosmos.
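Regional compilation, listed among the features above, compiles only the repeated blocks of a model rather than the whole model. Because the blocks share one structure, the compiler can reuse its work across them. The sketch below illustrates the idea under stated assumptions: `compile_repeated` is a hypothetical helper, and the attribute name `transformer_blocks` is an assumption about how the model stores its blocks. Diffusers models also appear to expose a built-in `compile_repeated_blocks()` helper; the optimization guides are the authority on its use.

```python
from typing import Any


def compile_repeated(
    model: Any,
    block_attr: str = "transformer_blocks",  # assumption: where the blocks live
    **compile_kwargs: Any,
) -> int:
    """Compile each repeated block in place instead of the whole model.

    Returns the number of blocks that were compiled.
    """
    blocks = getattr(model, block_attr)
    count = 0
    for block in blocks:
        # torch.nn.Module.compile() compiles the block's __call__ in place and
        # forwards its keyword arguments to torch.compile().
        block.compile(**compile_kwargs)
        count += 1
    return count


# Example call on a loaded pipeline (not executed here):
#   compile_repeated(pipe.transformer, fullgraph=True)
```

Compilation is lazy, so the cost is paid on the first forward pass rather than when this helper returns.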

🐛 Bug Fixes

  • Fixed LoRA unloading behavior.
  • Removed unnecessary synchronization before denoising in Kontext.
  • Fixed failing float16 CUDA tests.
  • Adjusted tolerance criteria for float16 inference unit tests on XPU.
  • Removed print statement in SCM Scheduler.
  • Fixed single_file documentation examples.

🔧 Affected Symbols

  • DiffusionPipeline
  • FluxTransformer2DModel
  • WanPipeline
  • FluxKontextPipeline
  • QwenImagePipeline
  • QwenImageEditPipeline
  • torch.compile
  • scaled_dot_product_attention

⚡ Deprecations

  • Updated the documentation for deprecated pipelines (refer to the docs for the specific list).
  • Applied LoRA deprecation fixes following the 0.34.0 release.