b9452

📅 Jun 1, 2026 📦 llama-cpp

✨ 2 features 🐛 2 fixes 🔧 1 symbol

Summary

This release speeds up the Vulkan backend for Q3_K/Q6_K quantizations by block-loading the quantized data and performing the subtraction directly on 32-bit integers. Intel BMG and Xe2 hardware benefits most. Several platform-specific builds have been disabled.

✨ New Features

Implemented a block-load optimization for Q3_K/Q6_K block data on Vulkan, with the subtraction performed directly on 32-bit integers.
Enabled MMVQ usage for Q quants on Xe2 hardware, aligning with NVIDIA overrides.
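
To illustrate what "subtraction directly on 32-bit integers" can look like, here is a minimal sketch in C, not the actual Vulkan shader code. It assumes four 8-bit quant values are packed into one 32-bit word and that the same offset is subtracted from each of them; the function names and the offset of 32 are hypothetical and chosen only for illustration. The packed version uses the classic SWAR trick so that no borrow crosses from one byte lane into the next.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract `offset` from each of the four bytes packed in `packed`.
   Hypothetical helper: setting the high bit of every lane before the
   subtraction prevents a borrow from leaking into the neighbouring lane;
   the final XOR restores the correct high bit of each lane. */
static uint32_t sub_bytes_packed(uint32_t packed, uint8_t offset) {
    const uint32_t H = 0x80808080u;              /* high bit of every byte lane */
    uint32_t y = (uint32_t)offset * 0x01010101u; /* broadcast offset to all lanes */
    uint32_t low = (packed | H) - (y & ~H);      /* 7-bit subtract per lane */
    return low ^ ((packed ^ ~y) & H);            /* fix up the high bits */
}

/* Reference version: unpack, subtract byte by byte (mod 256), repack. */
static uint32_t sub_bytes_scalar(uint32_t packed, uint8_t offset) {
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(packed >> (8 * i));
        out |= (uint32_t)(uint8_t)(b - offset) << (8 * i);
    }
    return out;
}

/* Checks that both versions agree for every possible byte value. */
static void check_all_bytes(uint8_t offset) {
    for (uint32_t v = 0; v < 256; v++) {
        uint32_t packed = v * 0x01010101u;
        assert(sub_bytes_packed(packed, offset) == sub_bytes_scalar(packed, offset));
    }
}
```

For example, `sub_bytes_packed(0x3F200100u, 32)` yields `0x1F00E1E0`: the bytes 0, 1, 32 and 63 become -32, -31, 0 and 31 in two's complement, all in a single 32-bit operation sequence rather than four separate unpack-and-subtract steps.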

🐛 Bug Fixes

Improved performance for Q2_K/Q3_K/Q6_K quantization formats when using MMVQ on Intel BMG.
Forced block loads on Vulkan to improve performance, working around Mesa's limitations in coalescing back-to-back loads from alternating arrays.
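
The difference between alternating loads and a block load can be sketched on the CPU side in C. This is only an analogy for the access pattern, not the shader change itself: the `demo_block` layout, the function names, and the arithmetic are all hypothetical. The first function reads from two arrays in alternation; the second copies each array as one contiguous chunk before doing any arithmetic, which is the shape a compiler can turn into wide loads without having to coalesce anything.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { WORDS = 4 };

/* Hypothetical layout: two separate arrays that are consumed together. */
typedef struct {
    uint32_t lo[WORDS];
    uint32_t hi[WORDS];
} demo_block;

/* Alternating pattern: each iteration touches lo[i], then hi[i]. */
static uint32_t combine_alternating(const demo_block *b) {
    assert(b != NULL);
    uint32_t acc = 0;
    for (int i = 0; i < WORDS; i++) {
        acc += b->lo[i] ^ (b->hi[i] << 4);
    }
    return acc;
}

/* Block-load pattern: copy each array in one contiguous chunk first,
   then do the same arithmetic on the local copies. */
static uint32_t combine_block_loaded(const demo_block *b) {
    assert(b != NULL);
    uint32_t lo[WORDS], hi[WORDS];
    memcpy(lo, b->lo, sizeof lo);
    memcpy(hi, b->hi, sizeof hi);
    uint32_t acc = 0;
    for (int i = 0; i < WORDS; i++) {
        acc += lo[i] ^ (hi[i] << 4);
    }
    return acc;
}
```

Both functions compute the same result; only the order in which memory is read differs.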

Affected Symbols

vulkan