Changelog

0.46.0

Breaking Changes
📦 bitsandbytes

2 breaking changes · 6 features · 8 bug fixes · 7 deprecations · 9 affected symbols

Summary

This release introduces significant improvements to `torch.compile` compatibility for both LLM.int8() and 4bit quantization, alongside a major refactoring onto PyTorch's custom operators APIs. Support for Python 3.8 and for PyTorch versions older than 2.2.0 has been dropped.

⚠️ Breaking Changes

  • Support for Python 3.8 has been dropped. Users must upgrade to Python 3.9 or newer.
  • Support for PyTorch versions older than 2.2.0 has been dropped. Users must upgrade to PyTorch >= 2.2.0.

Migration Steps

  1. Ensure your Python version is 3.9 or higher.
  2. Ensure your PyTorch version is 2.2.0 or higher. For best `torch.compile` support, PyTorch 2.6+ is recommended.
  3. Review usage of deprecated functions in `bnb.autograd` and `bnb.functional` namespaces and update code accordingly.
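Steps 1 and 2 can be enforced with a small runtime guard. The sketch below is illustrative only; `parse_version` and `check_environment` are hypothetical helper names, not bitsandbytes APIs:

```python
import sys

def parse_version(v: str) -> tuple:
    """Parse a version string like "2.2.0+cu121" into a comparable tuple."""
    base = v.split("+")[0]  # drop the local build suffix, e.g. "+cu121"
    return tuple(int(p) for p in base.split(".")[:3])

def check_environment(torch_version: str) -> None:
    """Raise if the interpreter or PyTorch predates the new minimums."""
    if sys.version_info < (3, 9):
        raise RuntimeError("bitsandbytes 0.46.0 requires Python >= 3.9")
    if parse_version(torch_version) < (2, 2, 0):
        raise RuntimeError("bitsandbytes 0.46.0 requires PyTorch >= 2.2.0")
```

Calling `check_environment(torch.__version__)` once at startup surfaces an unsupported environment immediately instead of via a later import error.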

✨ New Features

  • Added support for `torch.compile` without graph breaks for LLM.int8() (requires PyTorch 2.4+, PyTorch 2.6+ recommended). Experimental CPU support is included.
  • Added support for `torch.compile` without graph breaks for 4bit quantization (requires PyTorch 2.4+ for `fullgraph=False`, PyTorch 2.8 nightly for `fullgraph=True`).
  • Wheels are now published for CUDA Linux aarch64 (sbsa), targeting Turing generation (sm75) and newer.
  • Refactored library code to integrate better with PyTorch via the `torch.library` and custom ops APIs, enabling better `torch.compile` support.
  • Added simple op implementations for CPU.
  • Added autoloading for backend packages.

🐛 Bug Fixes

  • Fixed `torch.compile` issue for LLM.int8() when threshold=0.
  • Fixed missing CPU library issue.
  • Fixed a compatibility issue with PyTorch versions <= 2.4.
  • Fixed typo in `__getitem__`.
  • Improved CUDA version detection and error handling.
  • Fixed Intel CPU/XPU installation issues.
  • Fixed optimizer backwards compatibility.
  • Fixed issue related to `int8_mm_dequant` being moved from CPU to default backend.
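The CUDA version detection improvement amounts to parsing version strings robustly. The helper below is a simplified, hypothetical stand-in in the spirit of `get_cuda_version_tuple`; the real implementation may differ:

```python
def cuda_version_tuple(version: str) -> tuple:
    """Parse a CUDA version string such as "12.1" into (12, 1).

    Simplified stand-in: tolerates a missing minor component ("11")
    and trailing non-numeric suffixes rather than crashing on them.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0
    return major, minor
```

Returning a tuple rather than comparing raw strings avoids bugs like `"9.0" > "12.1"` in lexicographic comparison.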

🔧 Affected Symbols

  • `bnb.autograd.get_inverse_transform_indices`
  • `bnb.autograd.undo_layout`
  • `bnb.functional.create_quantile_map`
  • `bnb.functional.estimate_quantiles`
  • `bnb.functional.get_colrow_absmax`
  • `bnb.functional.get_row_absmax`
  • `bnb.functional.histogram_scatter_add_2d`
  • `torch.compile`
  • `get_cuda_version_tuple`

⚡ Deprecations

  • `bnb.autograd.get_inverse_transform_indices()` is deprecated.
  • `bnb.autograd.undo_layout()` is deprecated.
  • `bnb.functional.create_quantile_map()` is deprecated.
  • `bnb.functional.estimate_quantiles()` is deprecated.
  • `bnb.functional.get_colrow_absmax()` is deprecated.
  • `bnb.functional.get_row_absmax()` is deprecated.
  • `bnb.functional.histogram_scatter_add_2d()` is deprecated.
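One way to audit a codebase for calls to these deprecated helpers is to escalate `DeprecationWarning` to an error while running tests. The `run_strict` wrapper below is an illustrative stdlib-only helper, not part of bitsandbytes:

```python
import warnings

def run_strict(fn, *args, **kwargs):
    """Call fn with DeprecationWarning escalated to an error, so any use
    of a deprecated helper fails loudly instead of warning silently."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return fn(*args, **kwargs)
```

Running a test suite under such a filter (or pytest's `-W error::DeprecationWarning`) makes every remaining call site easy to locate before the functions are removed.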