0.47.0

📦 bitsandbytes
9 features · 🐛 8 fixes · 1 deprecation · 🔧 2 symbols

Summary

This release introduces FSDP2 compatibility for Params4bit and expands hardware support: CPU/XPU coverage is improved, and NVIDIA Volta support is included in the CUDA 12.8 and 12.9 builds. Several bugs in 4-bit quantization and in the documentation have also been fixed.

Migration Steps

  1. Review your code for any use of previously deprecated features; this release removes more of that deprecated code (#1669).
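Before auditing for removed features, it can help to confirm which bitsandbytes version is actually installed. The following is a minimal sketch using only the Python standard library. The helper names (`parse_release`, `is_at_least`, `installed_bitsandbytes_version`) are hypothetical and not part of bitsandbytes, and the version parsing is deliberately simplistic: it ignores pre-release and local suffixes.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Return the leading numeric components of a version string as a tuple.

    Hypothetical helper: parsing stops at the first component that does not
    start with a digit, so "0.47.0.dev0" becomes (0, 47, 0).
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two version strings by their numeric release components."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so "0.47" compares equal to "0.47.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_bitsandbytes_version():
    """Return the installed bitsandbytes version string, or None if absent."""
    try:
        return version("bitsandbytes")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bitsandbytes_version()
    if found is None:
        print("bitsandbytes is not installed")
    elif is_at_least(found, "0.47.0"):
        print(f"bitsandbytes {found}: check for use of removed deprecated code")
    else:
        print(f"bitsandbytes {found}: older than 0.47.0")
```

A check like this only tells you which release is installed; finding the affected call sites still requires reading the changes in #1669.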

✨ New Features

  • FSDP2 compatibility for Params4bit (#1719).
  • Improved CPU coverage, enabling the native CPU/XPU path and the IPEX path (#1628).
  • Include NVIDIA Volta support in CUDA 12.8 and 12.9 builds (#1715).
  • Improved torch.compile support for Params4bit (#1673).
  • HPU (Intel Gaudi) support added for bnb unit tests (#1680).
  • Enable ROCm backend with custom ops integration (#1683).
  • Add CUDA 12.9 build (#1689).
  • Automatically call CMake as part of PEP 517 build (#1512).
  • Add kernel registration for 8bit and 32bit optimizers (#1706).

🐛 Bug Fixes

  • Fix 4-bit quantization with large block sizes (#1721).
  • Fix CI regression (#1666).
  • Fix Params4bit passing bnb quantized (#1665).
  • Fixed a bug in test_fw_bit_quant testing on CPU (#1675).
  • Fix AdamW documentation (#1686).
  • Fix log issue (#1697).
  • Fix Params4bit tensor subclass handling (#1719).
  • Fix quantization uint8 packing bug for NF4 and FP4 on CUDA (#1721).
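To make the uint8 packing item above concrete: 4-bit formats such as NF4 and FP4 produce one 4-bit code per value, and two codes are stored in each byte. The sketch below illustrates that idea in pure Python. It is not the bitsandbytes CUDA kernel and does not reproduce the bug or its fix; the function names are hypothetical, and the nibble order (first code in the high nibble) and zero padding for odd lengths are assumptions made for illustration.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (ints in 0..15) two per byte.

    Assumption for this sketch: the first code of each pair goes in the
    high nibble, and an odd-length input is padded with a zero code.
    """
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    padded = list(codes)
    if len(padded) % 2:
        padded.append(0)
    return bytes(
        (padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2)
    )


def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # high nibble: first code of the pair
        codes.append(byte & 0x0F)  # low nibble: second code of the pair
    return codes[:count]


if __name__ == "__main__":
    original = [1, 2, 3]
    packed = pack_nibbles(original)
    restored = unpack_nibbles(packed, len(original))
    print(packed.hex(), restored)
```

Because the packed buffer alone does not record whether the last nibble is padding, the caller has to track the original element count, which is why `unpack_nibbles` takes `count` explicitly.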

🔧 Affected Symbols

Params4bit, optimizer.py

⚡ Deprecations

  • Further removal of previously deprecated code (#1669).