BitsAndBytes

Accessible large language models via k-bit quantization for PyTorch.

Latest: 0.49.2 · 16 releases · 5 breaking changes · 4 common errors · View on GitHub

Release History

0.49.2 · Breaking · 4 fixes · 3 features
Feb 16, 2026

This release introduces new 4-bit quantization blocksize support for CUDA (32) and ROCm (64), updates the ROCm build to version 7.2, and fixes several bugs related to 8-bit optimizers with FSDP and tensor handling.
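To illustrate what the blocksize parameter controls, here is a toy sketch of blockwise absmax quantization in pure Python. This is not the bitsandbytes CUDA/ROCm kernel, only a conceptual model: each block of `blocksize` values shares one absmax scale, so smaller blocks (e.g. 32 on CUDA vs. 64 on ROCm in this release) trade memory for accuracy.

```python
# Toy sketch of blockwise absmax quantization (pure Python; the real
# bitsandbytes kernels run on GPU and pack 4-bit codes differently).
# "blocksize" is the number of values that share one scale.

def quantize_blockwise(values, blocksize=32, bits=4):
    """Quantize floats to signed ints with one absmax scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed
    scales, quantized = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block) or 1.0  # absmax of the block
        scales.append(scale)
        quantized.append([round(v / scale * qmax) for v in block])
    return scales, quantized

def dequantize_blockwise(scales, quantized, bits=4):
    """Invert quantize_blockwise: rescale each block by its absmax."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for scale, block in zip(scales, quantized):
        out.extend(q * scale / qmax for q in block)
    return out
```

With `blocksize=32`, 64 input values produce two scales; a smaller blocksize stores more scales but isolates outliers to fewer neighbors.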

continuous-release_main · 1 feature
Jan 8, 2026

This pre-release provides the latest development wheels for all supported platforms, automatically rebuilt upon commits to the main branch. Installation requires using specific wheel URLs based on the target operating system and architecture.

0.49.1 · 1 fix
Jan 8, 2026

This patch release updates AMD targets and adds a safety guard for the quantization state attribute.

0.49.0 · Breaking · 4 fixes · 5 features
Dec 11, 2025

This release brings significant performance boosts for x86-64 CPUs, introduces experimental ROCm support via PyPI wheels, and adds compatibility for macOS 14+. Support for Python 3.9 and Maxwell GPUs has been dropped.

0.48.2 · 2 fixes · 1 feature
Oct 29, 2025

Version 0.48.2 fixes critical bugs related to quantization indexing and CPU/disk offloading regressions, and introduces Windows build support for SYCL kernels on XPU.

0.48.1 · 2 fixes
Oct 2, 2025

Version 0.48.1 addresses a critical regression in LLM.int8() affecting inference with pre-quantized checkpoints and fixes an issue with 8bit parameter device movement.

0.48.0 · Breaking · 3 fixes · 9 features
Sep 30, 2025

This release introduces official support for Intel GPUs and Intel Gaudi accelerators, alongside significant performance improvements for CUDA 4bit dequantization kernels and compatibility updates for PyTorch and CUDA versions. Support for PyTorch 2.2 and Maxwell GPUs has been dropped.

0.47.0 · 8 fixes · 9 features
Aug 11, 2025

This release introduces FSDP2 compatibility for Params4bit and significantly expands hardware support by improving CPU/XPU coverage and adding Volta support to recent CUDA builds. Several bugs related to 4bit quantization and documentation have also been resolved.

0.46.1 · 2 fixes · 1 feature
Jul 2, 2025

This release focuses on improving compatibility with torch.compile for Params4bit and fixing documentation issues, alongside adding support for CUDA 12.9 builds. It also streamlines the build process by automatically calling CMake during PEP 517 builds.

0.46.0 · Breaking · 8 fixes · 6 features
May 27, 2025

This release introduces significant improvements for `torch.compile` compatibility with both LLM.int8() and 4bit quantization, alongside a major refactoring to integrate with PyTorch Custom Operators. Support for Python 3.8 and older PyTorch versions has been dropped.

continuous-release_multi-backend-refactor
May 19, 2025

No release notes provided.

0.45.5 · 1 fix · 1 feature
Apr 7, 2025

This minor release restores the CPU build of bitsandbytes, which had been omitted from the v0.45.4 wheels.

0.45.4 · 1 fix
Mar 25, 2025

This minor release focuses on improving CPU-only usage of bitsandbytes, featuring a bug fix and better system compatibility on Linux by adjusting the build environment.

0.45.3 · 4 fixes · 1 feature
Feb 24, 2025

This patch release introduces support for NVIDIA Blackwell GPUs via a new CUDA 12.8 build and includes several minor bug fixes.

0.45.2 · 1 fix
Feb 6, 2025

This patch release resolves a RuntimeError that occurred during bitsandbytes import when no GPUs were present alongside Triton in PyTorch 2.6 environments.

0.45.1 · Breaking · 2 fixes · 2 features
Jan 23, 2025

This patch release focuses on dependency compatibility, notably setting the minimum PyTorch version to 2.0.0 and ensuring compatibility with triton>=3.2.0. It also includes build system updates and packaging cleanup.

Common Errors

ModuleNotFoundError · 4 reports

The "ModuleNotFoundError" in bitsandbytes usually arises from missing or incompatible dependencies, particularly Triton. Ensure you have the correct version of Triton installed that aligns with your bitsandbytes version; try reinstalling bitsandbytes with `pip install --upgrade bitsandbytes` or building from source following the official documentation if encountering Triton-related issues. If using Triton 3.2 or newer, adapt your code to avoid deprecated modules like `triton.ops` as they have been removed.
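Before touching any Triton-backed code path, it can help to probe whether the modules in question are importable at all. A minimal sketch, using only the standard library; the module names `triton` and `triton.ops` come from the error description above, and `triton.ops` is the submodule removed in Triton 3.2+:

```python
# Diagnostic sketch: check importability without importing.
import importlib.util

def module_available(name):
    """True if `name` resolves to an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ModuleNotFoundError):
        # find_spec itself raises when a parent package is missing
        return False

# Guard Triton-specific code paths accordingly:
if module_available("triton") and not module_available("triton.ops"):
    # Triton >= 3.2: triton.ops was removed; use the updated APIs
    pass
```

If `module_available("triton")` is False, the fix is an install/upgrade as described above; if only `triton.ops` is missing, the fix is adapting the code, not reinstalling.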

CalledProcessError · 4 reports

The "CalledProcessError" in bitsandbytes often arises from CUDA version mismatches or incomplete/incorrect installation of CUDA-related libraries required by bitsandbytes. Ensure your CUDA toolkit version (nvcc --version) matches the bitsandbytes CUDA version (e.g., bitsandbytes_cuda123 needs CUDA 12.3). Reinstall bitsandbytes using `pip uninstall bitsandbytes` followed by a targeted install: `pip install bitsandbytes --prefer-binary --extra-index-url=https://huggingface.github.io/bitsandbytes-wheels/` to fetch pre-compiled wheels compatible with your system.
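Checking the version match can be scripted. A hedged sketch that parses the "release X.Y" field printed by `nvcc --version`; the helper names are illustrative, not part of the bitsandbytes API:

```python
# Sketch: read the local CUDA toolkit version to compare against the
# CUDA version a bitsandbytes binary was built for (e.g. 12.3 for a
# "cuda123" build). Helper names are illustrative.
import re
import subprocess

def parse_nvcc_release(nvcc_output):
    """Extract 'X.Y' from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def local_cuda_version():
    """Run nvcc and return the toolkit version string, or None."""
    try:
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # nvcc missing or failing is itself a symptom
    return parse_nvcc_release(out.stdout)
```

A `None` result points at the toolkit install rather than at bitsandbytes itself.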

FileNotFoundError · 1 report

This error usually means that the bitsandbytes library files weren't installed correctly or are missing from the expected location. Reinstall bitsandbytes using `pip install bitsandbytes --force-reinstall` to ensure all necessary files are downloaded and placed in the correct directory. If problems persist, ensure your CUDA toolkit is correctly installed and accessible, as bitsandbytes relies on it.

NotImplementedError · 1 report

This error usually arises when a specific bitsandbytes feature, like double quantization or CPU/XPU support, is not yet implemented for your hardware setup (e.g., ROCm on AMD GPUs) or the chosen device. To fix it, either switch to a supported configuration (such as CUDA GPUs, if the missing feature exists there), or wait for a future bitsandbytes release that adds the implementation. Otherwise, remove the unsupported feature from your code and fall back to another quantization method or a supported device.
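The "remove or fall back" advice can be wrapped in a small guard. A minimal sketch, where `run_quantized` and `run_full_precision` are hypothetical stand-ins for your own code paths, not bitsandbytes APIs:

```python
# Sketch: fall back gracefully when a feature is unimplemented on the
# current backend. Both run_* functions are hypothetical stand-ins.

def with_fallback(primary, fallback):
    """Try `primary`; on NotImplementedError run `fallback` instead."""
    try:
        return primary()
    except NotImplementedError:
        return fallback()

def run_quantized():
    # Stand-in for a feature unavailable on this device, e.g. double
    # quantization on a ROCm/XPU backend
    raise NotImplementedError("double quant unsupported on this device")

def run_full_precision():
    return "fp16 path"
```

Catching only `NotImplementedError` keeps genuine bugs (other exception types) visible instead of silently masking them.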
