b8690

📅 Apr 7, 2026 · 📦 llama-cpp

✨ 3 features · 🔧 4 symbols

Summary

This release adds flash attention (FA) dequantization support for the Q4_1, Q5_0, Q5_1, and IQ4_NL quantization formats in the Vulkan backend. Pre-compiled binaries are provided for multiple operating systems and hardware configurations.

✨ New Features

Added FA dequantize4() implementations for Q4_1, Q5_0, Q5_1, and IQ4_NL in the flash attention base shader.
Registered the new dequantize4() implementations in the shader generator and in pipeline creation.
Enabled the new dequantize4() implementations in the scalar/coopmat1 FA support check.
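
For readers unfamiliar with what a dequantize step does, here is a rough CPU-side sketch of Q4_1 dequantization in Python. It is not the Vulkan shader code from this release. It assumes the usual ggml Q4_1 block layout (32 elements per block, a scale d, a minimum m, and 16 bytes holding two 4-bit values each), and it takes d and m as plain Python floats instead of decoding them from fp16. The function names dequantize_q4_1_block and dequantize4_q4_1 are hypothetical and chosen only for illustration.

```python
QK4_1 = 32  # elements per Q4_1 block (assumed ggml convention)


def dequantize_q4_1_block(d, m, qs):
    """Dequantize one Q4_1 block: value = d * q + m, with q in 0..15.

    d, m: scale and minimum, given here as Python floats (assumption:
          the real format stores them as fp16).
    qs:   16 bytes, each packing two 4-bit quantized values.
    """
    half = QK4_1 // 2
    if len(qs) != half:
        raise ValueError("expected 16 packed bytes per block")
    out = [0.0] * QK4_1
    for j, byte in enumerate(qs):
        out[j] = d * (byte & 0x0F) + m       # low nibble  -> element j
        out[j + half] = d * (byte >> 4) + m  # high nibble -> element j + 16
    return out


def dequantize4_q4_1(d, m, qs, start):
    """Hypothetical helper: return 4 consecutive dequantized values,
    loosely mirroring the idea of fetching four elements per call."""
    return dequantize_q4_1_block(d, m, qs)[start:start + 4]
```

A call such as dequantize4_q4_1(0.5, 1.0, bytes(range(16)), 0) returns the first four elements of the block. Q5_0, Q5_1, and IQ4_NL follow the same idea with different bit layouts or a lookup table, which this sketch does not cover.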

Affected Symbols

flash attention base shader
shader generator
pipeline creation
scalar/coopmat1 FA support check