0.49.0
📦 bitsandbytes
⚠ 2 breaking · ✨ 5 features · 🐛 4 fixes · 🔧 1 symbol
Summary
This release speeds up 4bit quantization on x86-64 CPUs with AVX512 or AVX512BF16 support, ships experimental ROCm support in the PyPI wheels for Linux x86-64, and publishes wheels for macOS 14+. Support for Python 3.9 and compilation support for Maxwell GPUs in the CUDA backend have been dropped.
⚠️ Breaking Changes
- Support for Python 3.9 has been dropped. Users must upgrade to Python 3.10 or newer.
- Compilation support for Maxwell GPUs in the CUDA backend has been dropped. Users on Maxwell GPUs may need to use older versions or alternative backends.
Migration Steps
- Ensure your Python environment is running Python 3.10 or newer.
- If you rely on CUDA compilation for Maxwell GPUs, stay on an older release of the library or move to a newer GPU.
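The two migration steps can be checked up front with a short script. This is a minimal sketch, not part of bitsandbytes: the helper names `python_supported` and `is_maxwell` are hypothetical, and the GPU check assumes PyTorch is installed and that Maxwell devices report compute capability 5.x.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in the release notes


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the 3.10+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def is_maxwell(capability):
    """Return True for a Maxwell-generation compute capability.

    Maxwell GPUs report compute capability 5.x (5.0, 5.2, 5.3).
    `capability` is a (major, minor) tuple, such as the value returned
    by torch.cuda.get_device_capability().
    """
    major, _minor = capability
    return major == 5


if __name__ == "__main__":
    if not python_supported():
        print("Python 3.10 or newer is required for this release.")

    try:
        import torch  # optional: only needed for the GPU check
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            if is_maxwell(torch.cuda.get_device_capability(index)):
                name = torch.cuda.get_device_name(index)
                print(f"GPU {index} ({name}) is Maxwell; stay on an older release.")
```

Running it before upgrading gives a quick signal on whether either breaking change applies to the machine.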
✨ New Features
- Significant performance improvements for 4bit quantization on x86-64 CPUs with AVX512 or AVX512BF16 support via optimized kernel paths.
- Experimental AMD ROCm support is now included in PyPI wheels on Linux x86-64.
- Wheels for macOS 14+ are now published, supporting 4bit and 8bit quantization on MPS (the current implementations are slow).
- Added support for using the default blocksize of 64 for 4bit quantization on RDNA GPUs.
- Added build support for ROCm 7.1 and specific AMD GPU targets (gfx1150/gfx1151).
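To see whether a Linux x86-64 machine advertises the CPU features named in the first item, the flags in `/proc/cpuinfo` can be inspected. The sketch below is an illustration only: the helper names are hypothetical, it assumes the Linux flag names `avx512f` and `avx512_bf16`, and it reports what the CPU advertises rather than which kernel path the library actually selects.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx512_support(flags):
    """Report whether AVX512 (foundation) and AVX512BF16 are advertised."""
    return {
        "avx512": "avx512f" in flags,
        "avx512bf16": "avx512_bf16" in flags,
    }


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(avx512_support(cpu_flags(path.read_text())))
    else:
        print("No /proc/cpuinfo found; this sketch only covers Linux.")
```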
🐛 Bug Fixes
- Fixed an indexing overflow issue for blockwise quantization on AMD devices.
- Fixed a build error related to "no case matching constant switch condition".
- Added a workaround for an AVX512 4bit dequantization accuracy issue that occurs with large blocksizes on CPU.
- Fixed a compatibility issue with Python 3.14 when using PyTorch 2.9.
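Several entries in these notes refer to blockwise quantization and its blocksize. As a rough illustration of the general idea, the plain-Python sketch below splits a tensor's values into fixed-size blocks and scales each block by its own absolute maximum. It is an assumption-laden teaching example, not the library's kernels: the function names are hypothetical, and the mapping of normalized values onto a 4bit codebook is omitted.

```python
def blockwise_absmax(values, blocksize=64):
    """Split `values` into blocks and return one absmax scale per block."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    scales = []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scales.append(max(abs(v) for v in block))
    return scales


def normalize_blocks(values, blocksize=64):
    """Scale every value into [-1, 1] using the absmax of its block."""
    scales = blockwise_absmax(values, blocksize)
    normalized = []
    for i, v in enumerate(values):
        scale = scales[i // blocksize]
        # An all-zero block has scale 0; keep its values at 0.0.
        normalized.append(v / scale if scale else 0.0)
    return normalized


if __name__ == "__main__":
    data = [float(i) for i in range(-64, 64)]  # 128 values, two blocks of 64
    print(blockwise_absmax(data))              # one scale per block
    print(len(blockwise_absmax(data, blocksize=32)), "blocks at blocksize 32")
```

A smaller blocksize stores more scales but lets each scale track its block more closely, which is the trade-off behind the choice of blocksize.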
🔧 Affected Symbols
pythonInterface.cpp