b9558

📦 llama-cpp
2 features · 🐛 1 fix · 🔧 1 symbol

Summary

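This release speeds up the Vulkan mul_mat_id path by loading the B matrix through cm2 decode_vector, which allows vec4 loads of B elements, and by increasing BK to 64 when that optimization is enabled. It also ensures that the B matrix alignment and stride are multiples of 4 in ggml-vulkan.cpp, which the new load path requires.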

Migration Steps

  1. If compiling with Vulkan enabled, ensure that the B matrix alignment and stride are multiples of 4 in ggml-vulkan.cpp.

✨ New Features

  • Enabled Vulkan optimization using cm2 decode_vector for mul_mat_id B matrix loads, which allows vec4 loads of B elements.
  • Increased BK to 64 when the Vulkan optimization using cm2 decode_vector is enabled.
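
To make the vec4 idea concrete, here is a minimal CPU-side analogy, not shader code from llama.cpp. It assumes BK is the tile size along the K dimension (the usual meaning in tiled matrix multiplication) and that B holds float elements; the helper name `load_slice_as_vec4` and the `Vec4` alias are hypothetical. Reading four elements per load means a slice of 64 elements takes 16 loads rather than 64.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a shader vec4; the float element type is an assumption.
using Vec4 = std::array<float, 4>;

// Hypothetical helper: reads `bk` consecutive elements of `b`, starting at
// `offset`, in groups of four. Both values must be multiples of 4 so that
// every group starts on a 4-element boundary.
static std::vector<Vec4> load_slice_as_vec4(const std::vector<float>& b,
                                            std::size_t offset,
                                            std::size_t bk) {
    if (offset % 4 != 0 || bk % 4 != 0) {
        throw std::invalid_argument("offset and bk must be multiples of 4");
    }
    if (offset + bk > b.size()) {
        throw std::out_of_range("slice exceeds buffer");
    }
    std::vector<Vec4> out;
    out.reserve(bk / 4);
    for (std::size_t i = 0; i < bk; i += 4) {
        Vec4 v{b[offset + i], b[offset + i + 1], b[offset + i + 2], b[offset + i + 3]};
        out.push_back(v);
    }
    return out;
}
```

With `bk` set to 64, the returned vector holds 16 groups of four elements each.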

🐛 Bug Fixes

  • Ensured B matrix alignment and stride are multiples of 4 in ggml-vulkan.cpp when using the new Vulkan optimization.
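
The condition in this fix can be sketched as a simple predicate. This is an illustration only: `b_supports_vec4_loads` is a hypothetical name rather than a function in ggml-vulkan.cpp, and it assumes the alignment and stride are counted in elements, which the release notes do not specify.

```cpp
#include <cstdint>

// Hypothetical check: the vec4 load path is usable only when both the
// starting offset of B and its row stride are multiples of 4, so that each
// group of four elements begins on a 4-element boundary in every row.
static bool b_supports_vec4_loads(std::uint64_t b_offset, std::uint64_t b_stride) {
    return (b_offset % 4 == 0) && (b_stride % 4 == 0);
}
```

For example, an offset of 8 with a stride of 4096 passes the check, while a stride of 4098 does not.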

Affected Symbols