Change8

0.49.2

Breaking Changes
📦 bitsandbytes
1 breaking change · 3 features · 🐛 4 fixes · 🔧 4 affected symbols

Summary

This release adds 4-bit quantization support for blocksize 32 on CUDA and blocksize 64 on ROCm, updates the ROCm build to version 7.2, and fixes several bugs. The fixes include 8-bit optimizer support under FSDP and the handling of non-contiguous tensors.

⚠️ Breaking Changes

  • The default blocksize for 4-bit quantization on ROCm devices has changed from 128 to 64. If you relied on the previous default of 128, you may need to set blocksize=128 explicitly in your configuration to keep the previous behavior.

Migration Steps

  1. If you use 4-bit quantization on ROCm and expect a blocksize of 128, set blocksize=128 explicitly in your configuration, because the default is now 64.
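The blocksize is the number of consecutive values that share one scaling factor (absmax) during 4-bit quantization. Changing it therefore changes how many scales are computed and stored for the same tensor. The code below is a minimal pure-Python sketch of that idea, not bitsandbytes code. `quantize_blockwise_4bit` and `dequantize_blockwise_4bit` are hypothetical helpers, and the symmetric int4 grid is a simplifying assumption. It stands in for the NF4/FP4 formats that the library actually uses.

```python
from typing import List, Tuple


def quantize_blockwise_4bit(values: List[float], blocksize: int) -> Tuple[List[int], List[float]]:
    """Hypothetical, simplified blockwise absmax quantizer (symmetric int4).

    Every run of `blocksize` consecutive values shares one absmax scale, and
    each value is stored as an integer code in [-7, 7]. This is NOT the
    NF4/FP4 codebook that bitsandbytes uses; it only illustrates blocking.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    codes: List[int] = []
    absmax: List[float] = []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block)  # one scale per block
        absmax.append(scale)
        for v in block:
            # Map v / scale (in [-1, 1]) onto the integer grid -7..7.
            codes.append(0 if scale == 0 else round(v / scale * 7))
    return codes, absmax


def dequantize_blockwise_4bit(codes: List[int], absmax: List[float], blocksize: int) -> List[float]:
    """Invert the sketch above: code / 7 * (scale of the code's block)."""
    return [c / 7 * absmax[i // blocksize] for i, c in enumerate(codes)]


# 256 deterministic example values in [-1, 1].
weights = [((i * 37) % 101 - 50) / 50 for i in range(256)]

# Same data, two blocksizes: 64 (new ROCm default) vs 128 (explicitly pinned).
codes_64, scales_64 = quantize_blockwise_4bit(weights, blocksize=64)
codes_128, scales_128 = quantize_blockwise_4bit(weights, blocksize=128)
print(len(scales_64), len(scales_128))  # 4 2
```

In this sketch, pinning blocksize=128 produces half as many scales as blocksize=64 for the same input. Each scale covers twice as many values.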

✨ New Features

  • Added CUDA kernel support for 4-bit quantization with blocksize=32.
  • Added blocksize=64 4-bit quantization support for ROCm CDNA (warp64) GPUs.
  • ROCm 7.2 build is now included.
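One rough way to compare the blocksizes above is their storage cost per weight. The calculation assumes one 32-bit absmax scale per block of B weights and no nested quantization of those scales. bitsandbytes can also compress these statistics, which lowers the overhead. Under that assumption, the average cost per weight is:

```latex
\text{bits per weight} = 4 + \frac{32}{B}
\qquad
B = 32 \Rightarrow 5,\quad
B = 64 \Rightarrow 4.5,\quad
B = 128 \Rightarrow 4.25
```

Under these assumptions, the new blocksize=32 CUDA path costs about half a bit more per weight than blocksize=64. In exchange, each scale is computed over a smaller group of values.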

🐛 Bug Fixes

  • Fixed 8-bit optimizer support when used with FSDP.
  • Fixed issues in the 4-bit kernels for XPU (Intel GPU) devices.
  • Fixed the AdEMAMix scheduler guard and added a state_dict round-trip test.
  • Fixed handling of non-contiguous tensors in quantize/dequantize operations.

Affected Symbols