Change8

0.49.0

Breaking Changes
📦 bitsandbytes
2 breaking changes, 5 features, 🐛 4 fixes, 🔧 1 affected symbol

Summary

This release brings significant 4-bit quantization speedups on x86-64 CPUs with AVX512, introduces experimental AMD ROCm support in the PyPI wheels, and publishes wheels for macOS 14+. Support for Python 3.9 and for CUDA compilation on Maxwell GPUs has been dropped.

⚠️ Breaking Changes

  • Support for Python 3.9 has been dropped. Users must upgrade to Python 3.10 or newer.
  • Compilation support for Maxwell GPUs in the CUDA backend has been dropped. Users on Maxwell GPUs may need to use older versions or alternative backends.

Migration Steps

  1. Ensure your Python environment is running Python 3.10 or newer.
  2. If you rely on CUDA compilation for Maxwell GPUs, pin bitsandbytes to a release earlier than 0.49.0 or move to a newer GPU.
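The two migration checks can be scripted as a pre-upgrade test. This is a minimal sketch, not part of bitsandbytes: the helper name `check_upgrade_to_0_49` is hypothetical, and it assumes you obtain the GPU's compute capability separately (for example from PyTorch's `torch.cuda.get_device_capability()`). Maxwell GPUs report compute capability 5.x.

```python
import sys
from typing import List, Optional, Tuple


def check_upgrade_to_0_49(
    python_version: Tuple[int, int] = sys.version_info[:2],
    compute_capability: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return blockers for upgrading to bitsandbytes 0.49.0.

    Hypothetical helper: it only encodes the two breaking changes in these
    release notes (Python 3.9 dropped, Maxwell CUDA compilation dropped).
    """
    blockers: List[str] = []
    # Python 3.9 support was dropped; 3.10 or newer is required.
    if python_version < (3, 10):
        blockers.append(
            f"Python {python_version[0]}.{python_version[1]} is too old; use 3.10+"
        )
    # Maxwell GPUs have compute capability 5.x (sm_50, sm_52, sm_53).
    if compute_capability is not None and compute_capability[0] == 5:
        blockers.append("Maxwell GPU detected; pin bitsandbytes<0.49.0 for CUDA")
    return blockers


if __name__ == "__main__":
    # Example: an old interpreter on a Maxwell card reports both blockers.
    for reason in check_upgrade_to_0_49((3, 9), (5, 2)):
        print(reason)
```

An empty list means neither breaking change applies to that environment.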

✨ New Features

  • Significant performance improvements for 4-bit quantization on x86-64 CPUs with AVX512 or AVX512BF16 support, via optimized kernel paths.
  • Experimental AMD ROCm support is now included in PyPI wheels on Linux x86-64.
  • Wheels for macOS 14+ are now published, supporting 4-bit and 8-bit quantization on MPS, although the current implementations are slow.
  • Added support for the default blocksize of 64 in 4-bit quantization on RDNA GPUs.
  • Added build support for ROCm 7.1 and specific AMD GPU targets (gfx1150/gfx1151).
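To make the blocksize feature concrete: in blockwise quantization the input is split into fixed-size blocks (64 elements by default) and each block is scaled by its own absolute maximum before being rounded to a 4-bit code. The sketch below is a simplified pure-Python illustration that uses a symmetric integer grid of codes -7..7; it is not bitsandbytes' NF4 or FP4 code path, and the function names are hypothetical.

```python
from typing import List, Tuple

BLOCKSIZE = 64  # default blocksize referenced in the release notes
LEVELS = 7      # simplification: symmetric 4-bit integer codes in -7..7


def quantize_blockwise(
    values: List[float], blocksize: int = BLOCKSIZE
) -> Tuple[List[int], List[float]]:
    """Quantize to 4-bit integer codes with one absmax scale per block."""
    codes: List[int] = []
    absmax: List[float] = []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block)
        absmax.append(scale)
        for v in block:
            # Normalize by the block's absmax into [-1, 1], then round to the grid.
            codes.append(round(v / scale * LEVELS) if scale > 0 else 0)
    return codes, absmax


def dequantize_blockwise(
    codes: List[int], absmax: List[float], blocksize: int = BLOCKSIZE
) -> List[float]:
    """Invert quantize_blockwise using each block's stored absmax."""
    return [q / LEVELS * absmax[i // blocksize] for i, q in enumerate(codes)]


if __name__ == "__main__":
    data = [0.01 * i for i in range(-64, 64)]  # 128 values -> 2 blocks of 64
    codes, absmax = quantize_blockwise(data)
    restored = dequantize_blockwise(codes, absmax)
    print(len(absmax), max(abs(a - b) for a, b in zip(data, restored)))
```

Smaller blocks follow local outliers more closely but store more absmax values, which is the trade-off the blocksize parameter controls.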

🐛 Bug Fixes

  • Fixed an indexing overflow issue for blockwise quantization on AMD devices.
  • Fixed a build error related to "no case matching constant switch condition".
  • Implemented a workaround for an accuracy issue in AVX512 4-bit dequantization on CPU when using large blocksizes.
  • Fixed a compatibility issue with Python 3.14 when using PyTorch 2.9.
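The notes do not describe the root cause of the AMD blockwise indexing overflow. As a generic illustration only, assuming an element offset computed as `block_idx * blocksize` in signed 32-bit arithmetic, large tensors push the offset past 2**31 - 1 and it wraps negative. The helper names below are hypothetical; this is not the bitsandbytes kernel code.

```python
INT32_MAX = 2**31 - 1


def to_int32(x: int) -> int:
    """Wrap a Python int the way a signed 32-bit integer would."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT32_MAX else x


def element_offset_int32(block_idx: int, blocksize: int) -> int:
    """First-element offset of a block, computed in 32-bit arithmetic."""
    return to_int32(block_idx * blocksize)


def element_offset_int64(block_idx: int, blocksize: int) -> int:
    """Same offset with 64-bit headroom (Python ints never wrap)."""
    return block_idx * blocksize


if __name__ == "__main__":
    n = 3 * 2**30                   # elements in a large tensor
    last_block = n // 64 - 1        # index of the last 64-element block
    print(element_offset_int32(last_block, 64))  # wraps to a negative offset
    print(element_offset_int64(last_block, 64))  # correct offset
```

Widening the index type before the multiplication is the usual remedy for this class of bug.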

🔧 Affected Symbols

pythonInterface.cpp