0.47.0
📦 bitsandbytes
✨ 9 features · 🐛 8 fixes · ⚡ 1 deprecation · 🔧 2 symbols
Summary
This release introduces FSDP2 compatibility for Params4bit and expands hardware support, with broader CPU/XPU coverage and NVIDIA Volta support in the CUDA 12.8 and 12.9 builds. It also fixes several 4-bit quantization bugs and corrects the AdamW documentation.
Migration Steps
- Review code for any usage of previously deprecated features that have been removed (#1669).
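One way to carry out this review is to run the affected code paths on the previous release, before upgrading, and collect any deprecation-style warnings. The sketch below uses only the standard library. It assumes the removed features emitted `DeprecationWarning` or `FutureWarning` in earlier releases, which these notes do not confirm. `collect_deprecations` and `legacy_call` are hypothetical names used for illustration.

```python
import warnings


def collect_deprecations(fn, *args, **kwargs):
    """Run fn and return the messages of deprecation-style warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones the default filters would hide.
        warnings.simplefilter("always")
        fn(*args, **kwargs)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, FutureWarning))
    ]


def legacy_call():
    # Hypothetical stand-in for project code that touches a deprecated API.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)


if __name__ == "__main__":
    for message in collect_deprecations(legacy_call):
        print("deprecated usage:", message)
```

Code that relied on a feature that never warned will instead fail after the upgrade with an import or attribute error, so a test run against 0.47.0 is still worthwhile.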
✨ New Features
- FSDP2 compatibility for Params4bit (#1719).
- Improved CPU coverage, enabling the native CPU/XPU and IPEX paths (#1628).
- Include NVIDIA Volta support in CUDA 12.8 and 12.9 builds (#1715).
- Improved torch.compile support for Params4bit (#1673).
- HPU (Intel Gaudi) support added for bnb unit tests (#1680).
- Enable ROCm backend with custom ops integration (#1683).
- Add CUDA 12.9 build (#1689).
- Automatically call CMake as part of PEP 517 build (#1512).
- Add kernel registration for 8bit and 32bit optimizers (#1706).
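Code that depends on the features listed above may want to check the installed version before enabling them. The following is a minimal sketch using only the standard library. `parse_release` and `has_min_bitsandbytes` are hypothetical helpers, not part of bitsandbytes, and the simplified parser treats pre-releases such as `0.47.0.dev0` as equal to `0.47.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Return the leading numeric (major, minor, patch) parts of a version string."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "rc1"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as "1.0"
    return tuple(parts)


def has_min_bitsandbytes(minimum="0.47.0"):
    """Return True if bitsandbytes is installed at `minimum` or newer."""
    try:
        installed = version("bitsandbytes")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print("bitsandbytes >= 0.47.0:", has_min_bitsandbytes())
```

Projects that already depend on the `packaging` library can use `packaging.version.Version` instead of the hand-written parser, which also orders pre-releases correctly.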
🐛 Bug Fixes
- Bugfix for 4bit quantization with large block sizes (#1721).
- Fix CI regression (#1666).
- Fix params4bit passing bnb quantized (#1665).
- Fixed a bug in test_fw_bit_quant testing on CPU (#1675).
- Fix AdamW documentation (#1686).
- Fix log issue (#1697).
- Fix Params4bit tensor subclass handling (#1719).
- Fix quantization uint8 packing bug for NF4 and FP4 on CUDA (#1721).
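For context on the packing fix: 4-bit formats such as NF4 and FP4 store two 4-bit codes in each uint8. The pure-Python sketch below illustrates that layout only. It does not reproduce the CUDA kernel, the bug, or the fix from #1721, and placing the first code of each pair in the high nibble is an assumption made for illustration. `pack_nibbles` and `unpack_nibbles` are hypothetical names.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (ints in 0..15) two per byte."""
    if len(codes) % 2:
        raise ValueError("expected an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    # Assumption: the first code of each pair occupies the high nibble.
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: split each byte into two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # high nibble
        codes.append(byte & 0x0F)  # low nibble
    return codes


if __name__ == "__main__":
    packed = pack_nibbles([1, 2, 15, 0])
    print(packed.hex(), unpack_nibbles(packed))
```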
🔧 Affected Symbols
- Params4bit
- optimizer.py
⚡ Deprecations
- Further removal of previously deprecated code (#1669).