Change8

b8315

📦 llama-cpp
2 features · 🐛 1 fix · 🔧 1 symbol

Summary

This release focuses on Vulkan performance: it optimizes SSM_CONV workgroup dispatch for large ubatch sizes and fixes the prompt-processing (PP) scaling degradation that appeared at those sizes.

✨ New Features

  • Optimized SSM_CONV workgroup dispatch for large ubatch sizes by tiling tokens into 2D workgroups (32x16) to reduce launch overhead.
  • Added a vec4 fast path for nc=4 in SSM_CONV (common d_conv size).
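To make the nc=4 fast path concrete, here is a minimal CPU-side sketch in Python, not the Vulkan shader itself. It assumes SSM_CONV computes, per channel, a sliding dot product between a window of nc input samples and nc weights, so that nc=4 lets both be handled as a single 4-wide vector. The function names and the single-channel data layout are hypothetical and chosen only for illustration.

```python
def ssm_conv_generic(x, w, n_tokens):
    """Sliding dot product for one channel.

    x: input samples, length n_tokens + nc - 1 (assumed layout)
    w: nc convolution weights for this channel
    """
    nc = len(w)
    return [sum(x[t + k] * w[k] for k in range(nc)) for t in range(n_tokens)]


def ssm_conv_nc4(x, w, n_tokens):
    """Specialised path for nc == 4: window and weights are each one 4-vector,
    so every output is a single unrolled 4-element dot product."""
    assert len(w) == 4, "fast path only applies when nc == 4"
    w0, w1, w2, w3 = w
    out = []
    for t in range(n_tokens):
        x0, x1, x2, x3 = x[t:t + 4]
        out.append(x0 * w0 + x1 * w1 + x2 * w2 + x3 * w3)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # 3 tokens + (nc - 1) leading samples
w = [1.0, 0.0, -1.0, 2.0]
print(ssm_conv_generic(x, w, 3))      # [6.0, 8.0, 10.0]
print(ssm_conv_nc4(x, w, 3))          # [6.0, 8.0, 10.0]
```

Both paths produce the same values; the specialised one only removes the inner loop over the kernel width, which is what a vec4 load plus dot product would do on the GPU.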

🐛 Bug Fixes

  • Fixed SSM_CONV prompt-processing (PP) scaling degradation at ubatch sizes greater than 512.
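A rough way to see why a 32x16 tile helps at large ubatch sizes is to count workgroups. The sketch below rests on two assumptions that the release notes do not confirm: that 32 is the tile extent along channels (rows) and 16 the extent along tokens, and that the earlier dispatch launched one workgroup per 32 rows for every token. All function and parameter names are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def workgroups_per_token(n_rows, n_tokens, n_seqs, rows_per_wg=32):
    # Assumed baseline: each workgroup covers 32 rows of a single token,
    # so the count grows linearly with the ubatch size.
    return ceil_div(n_rows, rows_per_wg) * n_tokens * n_seqs


def workgroups_tiled(n_rows, n_tokens, n_seqs, rows_per_wg=32, tokens_per_wg=16):
    # Assumed tiled layout: each workgroup covers a 32-row by 16-token tile.
    return (ceil_div(n_rows, rows_per_wg)
            * ceil_div(n_tokens, tokens_per_wg)
            * n_seqs)


# Example sizes, chosen for illustration only.
print(workgroups_per_token(4096, 2048, 1))  # 262144
print(workgroups_tiled(4096, 2048, 1))      # 16384
```

Under these assumptions the tiled layout launches 16 times fewer workgroups for the same work, which matches the stated goal of reducing launch overhead as the ubatch grows.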

Affected Symbols