b8064


📅 Feb 15, 2026 📦 llama-cpp

⚠ 2 breaking ✨ 3 features 🐛 1 fix 🔧 3 symbols

Summary

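This release optimizes CUDA dequantization for the iq2xxs, iq2xs, and iq3xxs quantization types: grid values are loaded in a single access, signs are computed with popcount, and the iq2xxs scale arithmetic is simplified to save registers. It also fixes a compilation error caused by the non-standard `uint` type.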

⚠️ Breaking Changes

The `uint` type name was removed from the CUDA code and replaced with `uint32_t`, so code that used `uint` directly may fail to compile. Note that `uint` is not a standard C or C++ type; it is only provided by some platform headers (for example glibc's `<sys/types.h>`).
IQ2XXS sum scaling was simplified: `(sum * scale + sum / 2) / 4` became `(sum * (scale * 2 + 1)) / 8`, and `((aux32 >> 28) * 2 + 1)` became `(aux32 >> 27 | 1)`. With C-style truncating integer division both pairs give identical results (barring overflow), so this is an internal implementation change; only code that inspected the exact intermediate values could be affected.
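Whether the old and new IQ2XXS expressions agree can be checked with a small brute-force sketch. It is written in Python, so it emulates C/CUDA integer division (truncation toward zero) with a helper. The function names and sample `aux32` values are hypothetical, and the scale is assumed to sit in the top 4 bits of an unsigned 32-bit `aux32`, as the `>> 28` in the original expression implies.

```python
# Sketch: brute-force check that the old and new IQ2XXS scaling agree.
# Assumptions: C/CUDA semantics (integer division truncates toward zero),
# aux32 is an unsigned 32-bit value with the 4-bit scale in bits 28..31.
# All names here are hypothetical, not llama.cpp identifiers.

def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in C/CUDA (b > 0)."""
    q = abs(a) // b
    return q if a >= 0 else -q


def old_scaled(total: int, aux32: int) -> int:
    scale = aux32 >> 28                      # top 4 bits: 0..15
    return c_div(total * scale + c_div(total, 2), 4)


def new_scaled(total: int, aux32: int) -> int:
    odd_scale = aux32 >> 27 | 1              # same as (aux32 >> 28) * 2 + 1
    return c_div(total * odd_scale, 8)


def check(sum_range: range) -> bool:
    for scale in range(16):
        # Scale in the top 4 bits; bit 27 set to show it does not leak in.
        aux32 = (scale << 28) | (1 << 27) | 0x5A5A5A5
        assert (aux32 >> 27 | 1) == (aux32 >> 28) * 2 + 1
        for total in sum_range:
            if old_scaled(total, aux32) != new_scaled(total, aux32):
                return False
    return True


print(check(range(-4096, 4097)))  # prints True
```

The OR trick works because bit 0 of `aux32 >> 27` is bit 27 of `aux32`; `| 1` overwrites it, leaving exactly `(aux32 >> 28) * 2 + 1`.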

Migration Steps

If your CUDA-related code uses `uint`, replace every occurrence with `uint32_t`, and include `<cstdint>` (C++) or `<stdint.h>` (C) where it is not already included.
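For larger code bases the replacement can be scripted. The following is a minimal sketch, not a llama.cpp tool: the helper `migrate_uint` is hypothetical, it matches `uint` only as a whole word (so `uint32_t`, `uint8_t`, and `my_uint` are untouched), and it does not parse C or C++, so matches inside comments or string literals should be reviewed by hand.

```python
# Sketch: rewrite the identifier `uint` to `uint32_t` in source text.
# Hypothetical helper, not part of llama.cpp. The \b anchors make the regex
# match `uint` only as a whole word; `_` and digits count as word characters,
# so `my_uint` and `uint32_t` are not matched. No C/C++ parsing is done.
import re

_UINT = re.compile(r"\buint\b")


def migrate_uint(source: str) -> tuple[str, int]:
    """Return the rewritten source and the number of replacements made."""
    return _UINT.subn("uint32_t", source)


before = "const uint aux = x >> 27 | 1; uint32_t y; uint8_t z;"
after, count = migrate_uint(before)
print(after)   # const uint32_t aux = x >> 27 | 1; uint32_t y; uint8_t z;
print(count)   # 1
```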

✨ New Features

Optimized dequantization for the iq2xxs, iq2xs, and iq3xxs formats on CUDA by loading all 8 int8 values of a grid position in a single access.
Replaced ksigns table lookups in CUDA dequantization with a popcount-based sign calculation.
Simplified the iq2xxs sum scaling in CUDA, saving 3 registers in `mul_mat_vec_q`.
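The first two items in the list above can be illustrated with a plain Python sketch rather than CUDA. It assumes that an iq2-style grid entry is a 64-bit word holding 8 int8 magnitudes in little-endian order, and that a 7-bit sign index expands to 8 sign bits by adding an eighth bit that makes the count of set bits even, which is assumed here to match how the ksigns table is built. `sign_byte`, `dequant_grid`, and `KSIGNS` are hypothetical names.

```python
# Sketch (Python, not CUDA) of a single 64-bit grid load plus popcount signs.
# Assumptions: a grid entry packs 8 int8 magnitudes little-endian in 64 bits;
# the 8th sign bit is the parity of the 7-bit index (total set bits even).
# All names are hypothetical.
import struct


def sign_byte(index7: int) -> int:
    """Expand a 7-bit sign index to 8 sign bits using popcount parity."""
    low = index7 & 0x7F
    parity = bin(low).count("1") & 1
    return low | (parity << 7)


# Equivalent 128-entry lookup table, built only for comparison.
KSIGNS = [sign_byte(i) for i in range(128)]


def dequant_grid(grid_word: int, index7: int) -> list[int]:
    """Unpack all 8 int8 values of one grid entry at once, then apply signs."""
    values = struct.unpack("<8b", grid_word.to_bytes(8, "little"))
    signs = sign_byte(index7)
    return [-v if (signs >> j) & 1 else v for j, v in enumerate(values)]


print(KSIGNS[:8])                            # [0, 129, 130, 3, 132, 5, 6, 135]
print(dequant_grid(0x0808080808080808, 1))   # [-8, 8, 8, 8, 8, 8, 8, -8]
```

On the GPU the parity would come from an integer popcount intrinsic such as CUDA's `__popc` plus a couple of bit operations, whereas the table version needs a memory access; the sketch only shows that both produce the same sign bits.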

🐛 Bug Fixes

Fixed a compilation error caused by the undefined identifier `uint` by replacing it with `uint32_t`.

Affected Symbols

CUDA dequantization kernels (iq2xxs/iq2xs/iq3xxs)
mul_mat_vec_q
uint