b8682
📦 llama-cpp
⚠ 2 breaking · ✨ 2 features · 🐛 2 fixes · 🔧 3 symbols
Summary
This release adds CPU support for Q1_0 1-bit quantization. The earlier Q1_0 variant (group size 32) is removed, Q1_0_g128 is renamed to Q1_0, and an issue with the Q1_0 entry in the LlamaFileType enum is fixed.
⚠️ Breaking Changes
- The quantization type previously named Q1_0 (group size 32) has been removed.
- The quantization type previously named Q1_0_g128 has been renamed to Q1_0.
Migration Steps
- If you were using the old Q1_0 (group size 32) quantization, update your model loading logic to use the new Q1_0, which is the format previously named Q1_0_g128.
- If you referenced Q1_0_g128 by name, refer to it as Q1_0 from this release on.
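The rename can be sketched as a small lookup for scripts or configs that still carry pre-b8682 type names. This is a minimal illustration, not part of llama.cpp: `migrate_quant_name`, `RENAMED`, and `REMOVED` are hypothetical names, and the input is assumed to be a name recorded by a build older than this release.

```python
# Hypothetical helper, not part of llama.cpp. It translates a quantization
# type name recorded by a pre-b8682 build into the name used from b8682 on.

RENAMED = {"Q1_0_g128": "Q1_0"}  # renamed in this release
REMOVED = {"Q1_0"}               # old group-size-32 variant, removed


def migrate_quant_name(old_name: str) -> str:
    """Return the post-b8682 name for a pre-b8682 quantization type name.

    Raises ValueError for the removed group-size-32 variant, because the
    release notes list no drop-in replacement with the same layout.
    """
    if old_name in RENAMED:
        return RENAMED[old_name]
    if old_name in REMOVED:
        raise ValueError(
            f"{old_name} (group size 32) was removed; "
            "use the new Q1_0 (formerly Q1_0_g128) instead"
        )
    # Any other type name is untouched by this release.
    return old_name
```

The old name is checked against the rename table first, since `Q1_0` means different things before and after this release. Treating the removed variant as an error rather than mapping it silently is a design choice of this sketch.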
✨ New Features
- Added Q1_0 1-bit quantization support for CPU.
- Added Q1_0_g128 1-bit quantization support for CPU (which was subsequently renamed to Q1_0).
🐛 Bug Fixes
- Fixed an issue with the Q1_0 entry in the LlamaFileType enum.
- Fixed trailing spaces and added a generic fallback for other backends.