BitsAndBytes

Accessible large language models via k-bit quantization for PyTorch.

Latest: 0.49.2 · 16 releases · 5 breaking changes · 4 common errors · View on GitHub

Release History

0.49.2 · Breaking · 4 fixes · 3 features
Feb 16, 2026

This release introduces new 4-bit quantization blocksize support for CUDA (32) and ROCm (64), updates the ROCm build to version 7.2, and fixes several bugs related to 8-bit optimizers with FSDP and tensor handling.
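To illustrate what the blocksize parameter controls, here is a toy sketch of blockwise absmax quantization in pure Python. This is not the bitsandbytes CUDA/ROCm kernel, only a conceptual model: each block of `blocksize` values shares one absmax scale, so smaller blocks (e.g. 32 on CUDA vs. 64 on ROCm in this release) trade memory for accuracy.

```python
# Toy sketch of blockwise absmax quantization (pure Python; the real
# bitsandbytes kernels run on GPU and pack 4-bit codes differently).
# "blocksize" is the number of values that share one scale.

def quantize_blockwise(values, blocksize=32, bits=4):
    """Quantize floats to signed ints with one absmax scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed
    scales, quantized = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block) or 1.0  # absmax of the block
        scales.append(scale)
        quantized.append([round(v / scale * qmax) for v in block])
    return scales, quantized

def dequantize_blockwise(scales, quantized, bits=4):
    """Invert quantize_blockwise: rescale each block by its absmax."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for scale, block in zip(scales, quantized):
        out.extend(q * scale / qmax for q in block)
    return out
```

With `blocksize=32`, 64 input values produce two scales; a smaller blocksize stores more scales but isolates outliers to fewer neighbors.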

continuous-release_main · 1 feature
Jan 8, 2026

This pre-release provides the latest development wheels for all supported platforms, automatically rebuilt upon commits to the main branch. Installation requires using specific wheel URLs based on the target operating system and architecture.

0.49.1 · 1 fix
Jan 8, 2026

This patch release updates AMD targets and adds a safety guard for the quantization state attribute.

0.49.0 · Breaking · 4 fixes · 5 features
Dec 11, 2025

This release brings significant performance boosts for x86-64 CPUs, introduces experimental ROCm support via PyPI wheels, and adds compatibility for macOS 14+. Support for Python 3.9 and Maxwell GPUs has been dropped.

0.48.2 · 2 fixes · 1 feature
Oct 29, 2025

Version 0.48.2 fixes critical bugs related to quantization indexing and CPU/disk offloading regressions, and introduces Windows build support for SYCL kernels on XPU.

0.48.1 · 2 fixes
Oct 2, 2025

Version 0.48.1 addresses a critical regression in LLM.int8() affecting inference with pre-quantized checkpoints and fixes an issue with 8bit parameter device movement.

0.48.0 · Breaking · 3 fixes · 9 features
Sep 30, 2025

This release introduces official support for Intel GPUs and Intel Gaudi accelerators, alongside significant performance improvements for CUDA 4bit dequantization kernels and compatibility updates for PyTorch and CUDA versions. Support for PyTorch 2.2 and Maxwell GPUs has been dropped.

0.47.0 · 8 fixes · 9 features
Aug 11, 2025

This release introduces FSDP2 compatibility for Params4bit and significantly expands hardware support by improving CPU/XPU coverage and adding Volta support to recent CUDA builds. Several bugs related to 4bit quantization and documentation have also been resolved.

0.46.1 · 2 fixes · 1 feature
Jul 2, 2025

This release focuses on improving compatibility with torch.compile for Params4bit and fixing documentation issues, alongside adding support for CUDA 12.9 builds. It also streamlines the build process by automatically calling CMake during PEP 517 builds.

0.46.0 · Breaking · 8 fixes · 6 features
May 27, 2025

This release introduces significant improvements for `torch.compile` compatibility with both LLM.int8() and 4bit quantization, alongside a major refactoring to integrate with PyTorch Custom Operators. Support for Python 3.8 and older PyTorch versions has been dropped.

continuous-release_multi-backend-refactor
May 19, 2025

No release notes provided.

0.45.5 · 1 fix · 1 feature
Apr 7, 2025

This minor release restores the CPU build of bitsandbytes, which had been omitted from the v0.45.4 wheels.

0.45.4 · 1 fix
Mar 25, 2025

This minor release focuses on improving CPU-only usage of bitsandbytes, featuring a bug fix and better system compatibility on Linux by adjusting the build environment.

0.45.3 · 4 fixes · 1 feature
Feb 24, 2025

This patch release introduces support for NVIDIA Blackwell GPUs via a new CUDA 12.8 build and includes several minor bug fixes.

0.45.2 · 1 fix
Feb 6, 2025

This patch release resolves a RuntimeError that occurred during bitsandbytes import when no GPUs were present alongside Triton in PyTorch 2.6 environments.

0.45.1 · Breaking · 2 fixes · 2 features
Jan 23, 2025

This patch release focuses on dependency compatibility, notably setting the minimum PyTorch version to 2.0.0 and ensuring compatibility with triton>=3.2.0. It also includes build system updates and packaging cleanup.

Common Errors

ModuleNotFoundError · 4 reports

The "ModuleNotFoundError" in bitsandbytes usually arises from missing or incompatible dependencies, particularly Triton. Ensure you have the correct version of Triton installed that aligns with your bitsandbytes version; try reinstalling bitsandbytes with `pip install --upgrade bitsandbytes` or building from source following the official documentation if encountering Triton-related issues. If using Triton 3.2 or newer, adapt your code to avoid deprecated modules like `triton.ops` as they have been removed.
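Before touching any Triton-backed code path, it can help to probe whether the modules in question are importable at all. A minimal sketch, using only the standard library; the module names `triton` and `triton.ops` come from the error description above, and `triton.ops` is the submodule removed in Triton 3.2+:

```python
# Diagnostic sketch: check importability without importing.
import importlib.util

def module_available(name):
    """True if `name` resolves to an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ModuleNotFoundError):
        # find_spec itself raises when a parent package is missing
        return False

# Guard Triton-specific code paths accordingly:
if module_available("triton") and not module_available("triton.ops"):
    # Triton >= 3.2: triton.ops was removed; use the updated APIs
    pass
```

If `module_available("triton")` is False, the fix is an install/upgrade as described above; if only `triton.ops` is missing, the fix is adapting the code, not reinstalling.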

CalledProcessError · 4 reports

The "CalledProcessError" in bitsandbytes often arises from CUDA version mismatches or incomplete/incorrect installation of CUDA-related libraries required by bitsandbytes. Ensure your CUDA toolkit version (nvcc --version) matches the bitsandbytes CUDA version (e.g., bitsandbytes_cuda123 needs CUDA 12.3). Reinstall bitsandbytes using `pip uninstall bitsandbytes` followed by a targeted install: `pip install bitsandbytes --prefer-binary --extra-index-url=https://huggingface.github.io/bitsandbytes-wheels/` to fetch pre-compiled wheels compatible with your system.
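Checking the version match can be scripted. A hedged sketch that parses the "release X.Y" field printed by `nvcc --version`; the helper names are illustrative, not part of the bitsandbytes API:

```python
# Sketch: read the local CUDA toolkit version to compare against the
# CUDA version a bitsandbytes binary was built for (e.g. 12.3 for a
# "cuda123" build). Helper names are illustrative.
import re
import subprocess

def parse_nvcc_release(nvcc_output):
    """Extract 'X.Y' from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def local_cuda_version():
    """Run nvcc and return the toolkit version string, or None."""
    try:
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # nvcc missing or failing is itself a symptom
    return parse_nvcc_release(out.stdout)
```

A `None` result points at the toolkit install rather than at bitsandbytes itself.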

FileNotFoundError · 1 report

This error usually means that the bitsandbytes library files weren't installed correctly or are missing from the expected location. Reinstall bitsandbytes using `pip install bitsandbytes --force-reinstall` to ensure all necessary files are downloaded and placed in the correct directory. If problems persist, ensure your CUDA toolkit is correctly installed and accessible, as bitsandbytes relies on it.

NotImplementedError · 1 report

This error usually arises when a specific bitsandbytes feature, like double quantization or CPU/XPU support, is not yet implemented for your hardware setup (e.g., ROCm on AMD GPUs) or the chosen device. To fix it, either switch to a supported configuration (such as CUDA GPUs, if the missing feature exists there), or wait for a future bitsandbytes release that adds the implementation. Otherwise, remove the unsupported feature from your code and fall back to another quantization method or a supported device.
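The "remove or fall back" advice can be wrapped in a small guard. A minimal sketch, where `run_quantized` and `run_full_precision` are hypothetical stand-ins for your own code paths, not bitsandbytes APIs:

```python
# Sketch: fall back gracefully when a feature is unimplemented on the
# current backend. Both run_* functions are hypothetical stand-ins.

def with_fallback(primary, fallback):
    """Try `primary`; on NotImplementedError run `fallback` instead."""
    try:
        return primary()
    except NotImplementedError:
        return fallback()

def run_quantized():
    # Stand-in for a feature unavailable on this device, e.g. double
    # quantization on a ROCm/XPU backend
    raise NotImplementedError("double quant unsupported on this device")

def run_full_precision():
    return "fp16 path"
```

Catching only `NotImplementedError` keeps genuine bugs (other exception types) visible instead of silently masking them.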
