Change8

b8682

Breaking Changes
📦 llama-cpp
2 breaking · 2 features · 🐛 2 fixes · 🔧 3 symbols

Summary

This release adds CPU support for Q1_0 1-bit quantization. As part of that work, the group-size-32 variant was removed, the Q1_0_g128 variant was renamed to Q1_0, and a related LlamaFileType enum issue was fixed.

⚠️ Breaking Changes

  • The quantization type previously named Q1_0 (group size 32) has been removed.
  • The quantization type previously named Q1_0_g128 has been renamed to Q1_0.

Migration Steps

  1. If you used the old Q1_0 (group size 32), update your model loading logic to use the new Q1_0, which corresponds to the old Q1_0_g128. The group-size-32 variant no longer exists.
  2. If you referenced Q1_0_g128 by name, rename it to Q1_0.
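
The rename can be applied mechanically wherever a quantization type is referenced by name. The following is a minimal sketch, assuming your tooling stores the type as a plain string. `migrate_quant_name`, `RENAMES`, and `REMOVED` are hypothetical names used for illustration and are not part of llama.cpp or its bindings. It only applies to names written against the old scheme, because the string Q1_0 means different things before and after this release.

```python
# Hypothetical helper: not part of llama.cpp or its Python bindings.
# Maps pre-release quantization names onto the names used from this release on.

RENAMES = {"Q1_0_g128": "Q1_0"}  # old name -> new name
REMOVED = {"Q1_0"}               # old names with no direct successor (group size 32)


def migrate_quant_name(old_name: str) -> str:
    """Translate a quantization name written against the OLD naming scheme."""
    if old_name in RENAMES:
        return RENAMES[old_name]
    if old_name in REMOVED:
        raise ValueError(
            f"{old_name} (group size 32) was removed; "
            "use the new Q1_0 (formerly Q1_0_g128) instead"
        )
    return old_name  # every other name is unchanged by this release
```

Raising an error for the removed variant, rather than silently mapping it to the new Q1_0, is a deliberate choice in this sketch: the two variants use different group sizes, so the switch should be an explicit decision.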

✨ New Features

  • Added Q1_0 1-bit quantization support for CPU.
  • Added Q1_0_g128 1-bit quantization support for CPU (which was subsequently renamed to Q1_0).
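
To illustrate what "1-bit quantization with a group size" means, here is a conceptual sketch in pure Python. It is not the ggml block layout or the CPU kernel added in this release. The function names are hypothetical, and the choice of scale (the mean absolute value of each group) is an assumption made for illustration only. The group size of 128 is taken from the old Q1_0_g128 name.

```python
from typing import List, Tuple

GROUP_SIZE = 128  # group size implied by the old Q1_0_g128 name

Group = Tuple[float, List[int]]  # (scale, one sign bit per weight)


def quantize_1bit(weights: List[float], group_size: int = GROUP_SIZE) -> List[Group]:
    """Keep one sign bit per weight and one shared scale per group.

    Assumption: the scale is the mean absolute value of the group.
    """
    groups: List[Group] = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        scale = sum(abs(w) for w in chunk) / len(chunk)
        bits = [1 if w >= 0 else 0 for w in chunk]
        groups.append((scale, bits))
    return groups


def dequantize_1bit(groups: List[Group]) -> List[float]:
    """Reconstruct each weight as +scale or -scale according to its bit."""
    out: List[float] = []
    for scale, bits in groups:
        out.extend(scale if b else -scale for b in bits)
    return out
```

Under the further assumption of a 16-bit scale per group, a group of 128 weights costs 128 + 16 = 144 bits, or 1.125 bits per weight, compared with (32 + 16) / 32 = 1.5 bits per weight at group size 32.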

🐛 Bug Fixes

  • Fixed an issue with the Q1_0 entry in the LlamaFileType enum.
  • Fixed trailing spaces and added a generic fallback for other backends.

Affected Symbols