b9580
📦 llama-cpp
Summary
This release improves Vulkan backend performance by using the `v_dot2_f32_f16` instruction, exposed through the Valve FP16 dot2 extension, in the matrix-matrix multiplication and Flash Attention kernels. The instruction multiplies two pairs of FP16 values and adds both products to an FP32 accumulator. Several platform builds were also disabled pending further updates.
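To make the mixed-precision arithmetic concrete, here is a small software reference model of a `v_dot2_f32_f16`-style step: FP16 inputs, products added into an FP32 accumulator. This is a sketch, not llama.cpp code. The function names are hypothetical, FP16 and FP32 rounding are emulated with Python's standard `struct` module, and the rounding order (two FP32 additions) is an assumption that may not match the hardware bit for bit. The FP16-accumulating variant exists only for comparison.

```python
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (struct raises OverflowError past the fp16 range)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot2_f32_f16(a, b, acc: float) -> float:
    """Model of acc + a[0]*b[0] + a[1]*b[1] with fp16 inputs and an fp32 result.

    A product of two fp16 values is exact in fp32, so only the additions round.
    The rounding order (two fp32 adds) is an assumption, not a hardware spec.
    """
    p0 = to_f16(a[0]) * to_f16(b[0])
    p1 = to_f16(a[1]) * to_f16(b[1])
    return to_f32(to_f32(acc + p0) + p1)


def dot2_f16_acc(a, b, acc: float) -> float:
    """Comparison only: the same operation with every step rounded to fp16."""
    p0 = to_f16(to_f16(a[0]) * to_f16(b[0]))
    p1 = to_f16(to_f16(a[1]) * to_f16(b[1]))
    return to_f16(to_f16(acc + p0) + p1)


def dot_f16(a, b) -> float:
    """Dot product of two even-length fp16 vectors, two lanes per dot2 step."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("vectors must have equal, even lengths")
    acc = 0.0
    for i in range(0, len(a), 2):
        acc = dot2_f32_f16(a[i:i + 2], b[i:i + 2], acc)
    return acc


print(dot_f16([1.0, 2.0, 0.5, 3.0], [1.5, -1.0, 2.0, 1.0]))  # 3.5
print(dot2_f32_f16([1024.0, 0.5], [1.0, 0.5], 0.0))          # 1024.25
print(dot2_f16_acc([1024.0, 0.5], [1.0, 0.5], 0.0))          # 1024.0
```

With an FP32 accumulator, 1024.0 × 1.0 + 0.5 × 0.5 keeps the 0.25; accumulating in FP16 loses it, because FP16 values between 1024 and 2048 are spaced 1.0 apart.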
Migration Steps
- If you use Vulkan builds, check that your GPU and driver support the Valve FP16 dot2 extension; the new `v_dot2_f32_f16` code paths depend on it for the performance gains.
- Review any code paths that relied on the previous preprocessor branching for dot product calculations; that logic is now behind a common dot product abstraction.
✨ New Features
- Added support for the `v_dot2_f32_f16` instruction in Vulkan matrix-matrix multiplication and Flash Attention.
- Introduced a dot product abstraction to reduce preprocessor branching in the Vulkan implementation.
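The Vulkan kernels are GLSL shaders, and these notes do not show the actual helper. The Python sketch below only illustrates the general pattern, using hypothetical names (`dot2_fused`, `dot2_scalar`, `select_dot2`, `dot`): the capability decision is made in one place, and the kernel loop calls a single helper instead of carrying its own `#if` branches at every call site.

```python
from typing import Callable, Sequence

# Signature shared by every dot2 variant: (a pair, b pair, accumulator) -> accumulator.
Dot2 = Callable[[Sequence[float], Sequence[float], float], float]


def dot2_fused(a: Sequence[float], b: Sequence[float], acc: float) -> float:
    """Hypothetical stand-in for a single hardware dot2 operation."""
    return acc + a[0] * b[0] + a[1] * b[1]


def dot2_scalar(a: Sequence[float], b: Sequence[float], acc: float) -> float:
    """Hypothetical fallback: two separate multiply-adds."""
    acc += a[0] * b[0]
    acc += a[1] * b[1]
    return acc


def select_dot2(has_dot2: bool) -> Dot2:
    """Make the capability decision once, much like a single #if around one helper."""
    return dot2_fused if has_dot2 else dot2_scalar


def dot(a: Sequence[float], b: Sequence[float], dot2: Dot2) -> float:
    """Kernel-style loop: no capability checks here, it just calls the helper."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("vectors must have equal, even lengths")
    acc = 0.0
    for i in range(0, len(a), 2):
        acc = dot2(a[i:i + 2], b[i:i + 2], acc)
    return acc


a, b = [1.0, 2.0, 0.5, 3.0], [1.5, -1.0, 2.0, 1.0]
print(dot(a, b, select_dot2(True)), dot(a, b, select_dot2(False)))  # 3.5 3.5
```

With this shape, supporting another hardware path means adding one variant and updating the selection point, rather than editing every kernel that computes a dot product.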