Change8

b8690

📦 llama-cpp
3 features · 🔧 4 symbols

Summary

This release adds flash attention (FA) dequantization support for the Q4_1, Q5_0, Q5_1, and IQ4_NL quantization formats in the Vulkan backend. Pre-compiled binaries are provided for a range of operating systems and hardware configurations.

✨ New Features

  • Added FA dequantize4() implementations for Q4_1, Q5_0, Q5_1, and IQ4_NL in the flash attention base shader.
  • Registered the new dequantize4() implementations in the shader generator and pipeline creation.
  • Enabled the new dequantize4() implementations in the scalar/coopmat1 FA support check.
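
To illustrate what a dequantize4()-style routine does, here is a minimal Python sketch for Q4_1 only. It is not the Vulkan shader code from this release: the release notes do not show the shader's signature or indexing, so the `BlockQ4_1` class, the `dequantize4(block, i)` signature, and the use of plain floats instead of fp16 are all assumptions made for illustration. The block layout (32 elements per block, a scale `d`, a minimum `m`, and 16 bytes holding two 4-bit values each, with low nibbles covering elements 0..15 and high nibbles covering elements 16..31) follows ggml's CPU reference layout as commonly documented.

```python
from dataclasses import dataclass
from typing import List

QK4_1 = 32  # elements per Q4_1 block (assumed, per ggml's CPU reference layout)


@dataclass
class BlockQ4_1:
    """Hypothetical stand-in for a Q4_1 block; not the shader's data type."""
    d: float   # scale (fp16 in ggml, plain float here for simplicity)
    m: float   # minimum, added back after scaling
    qs: bytes  # 16 bytes, each packing two 4-bit quantized values


def dequantize4(block: BlockQ4_1, i: int) -> List[float]:
    """Return elements i..i+3 of the block as floats (i must be a multiple of 4)."""
    if i % 4 != 0 or not 0 <= i < QK4_1:
        raise ValueError("i must be a multiple of 4 in [0, 32)")
    half = QK4_1 // 2
    out = []
    for k in range(i, i + 4):
        byte = block.qs[k % half]
        # Elements 0..15 live in the low nibbles, 16..31 in the high nibbles.
        q = (byte & 0x0F) if k < half else (byte >> 4)
        out.append(q * block.d + block.m)
    return out
```

For example, with `d=0.5`, `m=-1.0`, and bytes whose low nibble is `j` and high nibble is `15 - j`, `dequantize4(block, 0)` yields the first four low-nibble elements and `dequantize4(block, 16)` yields the first four high-nibble elements. Returning four values per call is presumably what lets the FA shader consume quantized data in vec4-sized chunks, though that detail is an inference from the function name rather than something the release notes state.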

Affected Symbols