
b9580

📦 llama-cpp
2 features 🔧 3 symbols

Summary

This release adds Vulkan backend support for `v_dot2_f32_f16` (a dot product of two fp16 pairs accumulated in fp32) in the matrix-matrix multiplication and Flash Attention kernels, which is intended to improve performance on hardware that supports it. Several platform builds were disabled pending further updates.

Migration Steps

  1. If you use Vulkan builds, check that your GPU and driver support the Valve fp16 dot2 extension; the new code path only helps where it is available.
  2. If you maintain code that relied on the previous preprocessor branching for dot product calculations in the Vulkan implementation, review it, because that logic now sits behind a dot product abstraction.
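
For step 1, a support check usually comes down to looking for an extension name in the list the device reports. The sketch below is illustrative only: `has_extension` is a hypothetical helper, not part of llama.cpp, and the exact extension name is not given in these notes, so it is passed in as a parameter. In a real program the list would be filled from `vkEnumerateDeviceExtensionProperties` or read from `vulkaninfo` output.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns true if `wanted` appears among the extension
// names reported by the device. The caller supplies both the list and the
// name; nothing here talks to the Vulkan loader.
inline bool has_extension(const std::vector<std::string>& available,
                          const std::string& wanted) {
    return std::find(available.begin(), available.end(), wanted) != available.end();
}
```

If the check fails, the build should still work; only the `v_dot2_f32_f16` fast path would be unavailable.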

✨ New Features

  • Added support for `v_dot2_f32_f16` in Vulkan matrix-matrix multiplication and Flash Attention.
  • Introduced a dot product abstraction to reduce preprocessor branching in the Vulkan implementation.
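
To make the two items above concrete, here is a minimal CPU-side sketch of what a dot2 with fp32 accumulation computes, and of how a single helper can replace per-call-site `#if`/`#else` branches. It is not the shader code from this release: `half2`, `dot2_acc`, and `dot_pairs` are hypothetical names, and plain `float` stands in for fp16 so the example compiles with any C++17 compiler.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a packed pair of fp16 values (float used for portability).
using half2 = std::array<float, 2>;

// Reference semantics of a dot2 with fp32 accumulation:
// acc + a[0]*b[0] + a[1]*b[1].
// A backend with a native instruction would swap in its intrinsic here,
// in one place, instead of branching at every call site.
inline float dot2_acc(const half2& a, const half2& b, float acc) {
    return acc + a[0] * b[0] + a[1] * b[1];
}

// Callers accumulate over n pairs through the single entry point above.
inline float dot_pairs(const half2* a, const half2* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc = dot2_acc(a[i], b[i], acc);
    }
    return acc;
}
```

With this shape, matrix multiplication and attention inner loops both call `dot_pairs` (or `dot2_acc` directly), and the choice between a hardware dot2 and the fallback lives in one function body.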

Affected Symbols