b8690
📦 llama-cpp
✨ 3 features · 🔧 4 symbols
Summary
This release adds flash attention (FA) dequantization support for the Q4_1, Q5_0, Q5_1, and IQ4_NL quantization formats in the Vulkan backend. Pre-compiled binaries are provided for multiple operating systems and hardware configurations.
✨ New Features
- Added FA dequantize4() implementations for Q4_1, Q5_0, Q5_1, and IQ4_NL in the flash attention base shader.
- Registered the new dequantize4() implementations in the shader generator and in pipeline creation.
- Enabled the new dequantize4() implementations in the scalar/coopmat1 FA support check.
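
For readers unfamiliar with these formats, the following is a rough CPU-side sketch in Python of what dequantizing Q4_1 involves. It assumes ggml's usual Q4_1 block layout: 32 values per block, stored as an fp16 scale d, an fp16 minimum m, and 16 bytes of packed 4-bit quants. It is not the GLSL shader code from this release. The function names are hypothetical, and the four-value helper only mirrors the idea suggested by the name dequantize4(); the shader's actual signature and indexing may differ.

```python
import struct

QK4_1 = 32                       # values per Q4_1 block (assumed ggml layout)
BLOCK_BYTES = 4 + QK4_1 // 2     # fp16 d + fp16 m + 16 packed bytes


def dequantize_q4_1_block(block):
    """Expand one Q4_1 block into 32 floats (hypothetical reference helper)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")

    # Little-endian half-precision scale (d) and minimum (m).
    d, m = struct.unpack_from("<2e", block, 0)
    qs = block[4:]

    out = [0.0] * QK4_1
    half = QK4_1 // 2
    for j, byte in enumerate(qs):
        # Low nibble holds element j, high nibble holds element j + 16.
        out[j] = (byte & 0x0F) * d + m
        out[j + half] = (byte >> 4) * d + m
    return out


def dequantize4_q4_1(block, start):
    """Return four consecutive dequantized values beginning at `start`.

    Hypothetical helper: it illustrates fetching values four at a time,
    not the real shader interface.
    """
    if start < 0 or start + 4 > QK4_1:
        raise IndexError("start must leave room for four values in the block")
    return dequantize_q4_1_block(block)[start:start + 4]
```

Q5_0 and Q5_1 follow the same pattern with a fifth bit per value stored separately, and IQ4_NL replaces the linear scale-and-offset step with a lookup into a fixed table of non-linear levels.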