
0.48.0

📦 bitsandbytes

⚠️ 2 breaking · ✨ 9 features · 🐛 3 fixes · 🔧 3 symbols

Summary

This release introduces official support for Intel GPUs and Intel Gaudi accelerators, alongside significant performance improvements for CUDA 4bit dequantization kernels and compatibility updates for PyTorch and CUDA versions. Support for PyTorch 2.2 and Maxwell GPUs has been dropped.

⚠️ Breaking Changes

  • Support for PyTorch 2.2 has been dropped; the new minimum requirement is PyTorch 2.3.0.
  • Maxwell GPU support (sm50) has been removed from all CUDA builds.

Migration Steps

  1. If you rely on Maxwell GPUs (sm50), you must migrate to newer NVIDIA hardware as support has been dropped.
  2. For Intel GPU usage, ensure you have PyTorch 2.6.0 or newer with XPU support.
  3. For Intel Gaudi usage, ensure you have Gaudi v1.21 and PyTorch 2.6.0 or newer.
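The version floors above can be checked before upgrading with a minimal sketch; `meets_minimum` is a hypothetical helper, not part of bitsandbytes, and the version numbers come from the notes in this release:

```python
# Hypothetical pre-upgrade check against the new minimum versions.
# Compares dotted version strings numerically (not lexicographically),
# so e.g. "2.10.0" correctly ranks above "2.3.0".

def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` >= `required`, comparing dotted versions."""
    parse = lambda s: tuple(int(part) for part in s.split("."))
    return parse(installed) >= parse(required)

# PyTorch 2.2 falls below the new 2.3.0 floor; 2.6.0 satisfies
# both the general floor and the Intel GPU/Gaudi requirement.
print(meets_minimum("2.2.0", "2.3.0"))   # old minimum no longer passes
print(meets_minimum("2.6.0", "2.3.0"))   # Intel paths require 2.6.0+
```

In a real environment the installed version would come from `torch.__version__`; string comparison alone would mis-order versions like `2.10.0` vs. `2.3.0`, which is why the sketch parses the components as integers.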

✨ New Features

  • Official support added for Intel GPUs (Arc B-Series, Arc A-Series, Data Center GPU Max Series) on Linux and Windows, including LLM.int8(), QLoRA, and 8bit optimizers (excluding the paged optimizer).
  • Official support added for Intel Gaudi2 and Gaudi3 accelerators, including LLM.int8() and QLoRA with NF4 (optimizers not yet implemented).
  • 4bit dequantization kernel improved for NVIDIA CUDA, resulting in noticeable speed improvements for prefill, batch token generation, and training, especially on A100, H100, and B200.
  • Added CUDA 13.0 compatibility across Linux x86-64, Linux aarch64, and Windows x86-64 platforms (hardware support limited to Turing generation and newer).
  • Added support for Thor (SM110) in the Linux aarch64 build.
  • Implemented 4bit quantization for arbitrary nn.Parameter instances.
  • Implemented 32bit and 8bit optimizers in Triton for the XPU backend.
  • Added SYCL kernels for the XPU backend.
  • Added a function to reverse 4bit weights for HPU.
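The blockwise scheme that the improved 4bit dequantization kernels decode can be illustrated with a short pure-Python sketch. This is illustrative only: the real kernels operate on packed GPU tensors, and NF4 uses a non-uniform codebook rather than the evenly spaced levels assumed here.

```python
# Illustrative sketch of blockwise absmax 4-bit quantization (not the
# library's kernel). Each block stores 4-bit codes plus one absmax scale;
# dequantization is a codebook lookup followed by a rescale.

# Hypothetical codebook: 16 evenly spaced levels in [-1, 1].
# (bitsandbytes' NF4 codebook is non-uniform.)
CODEBOOK = [-1.0 + 2.0 * i / 15.0 for i in range(16)]

def quantize_block(values):
    """Scale a block by its absmax, then snap each value to the nearest code."""
    absmax = max(abs(v) for v in values) or 1.0
    codes = [
        min(range(16), key=lambda i: abs(CODEBOOK[i] - v / absmax))
        for v in values
    ]
    return codes, absmax

def dequantize_block(codes, absmax):
    """Reverse step: look up each 4-bit code and rescale by the stored absmax."""
    return [CODEBOOK[c] * absmax for c in codes]

block = [0.5, -1.2, 0.05, 0.9]
codes, absmax = quantize_block(block)
restored = dequantize_block(codes, absmax)
```

The dequantization half is the part the release speeds up on CUDA: it is a memory-bound lookup-and-scale over many such blocks, which is why prefill, batched generation, and training all benefit.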

🐛 Bug Fixes

  • Fixed warpSize deprecation issue for ROCm 7.0.
  • Linear8bitLt now supports device movement after forward().
  • Relaxed the 4bit test tolerance on CPU for larger blocksizes.

🔧 Affected Symbols

  • Maxwell GPU support (sm50)
  • PyTorch 2.2
  • Linear8bitLt