b8315

📅 Mar 13, 2026 · 📦 llama-cpp

✨ 2 features · 🐛 1 fix · 🔧 1 symbol

Summary

This release focuses on Vulkan performance: it optimizes SSM_CONV workgroup dispatch for large ubatch sizes and fixes the prompt-processing scaling degradation seen with ubatch sizes above 512.

✨ New Features

Optimized SSM_CONV workgroup dispatch for large ubatch sizes by tiling tokens into 2D workgroups (32x16) to reduce launch overhead.
Added a vec4 fast path in SSM_CONV for nc=4, a common d_conv size.
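
To illustrate why both changes can help, here is a minimal CPU-side C++ sketch. It is not the Vulkan shader or the llama.cpp dispatch code. The helper names, the choice of 32 along the channel axis and 16 along the token axis, and the "one token per workgroup" baseline are all assumptions made for illustration; the release notes only state the 32x16 tile shape and the nc=4 fast path.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tile shape: 32 channels (x) by 16 tokens (y) per workgroup.
// The notes give "32x16" but do not say which axis is which.
constexpr uint32_t TILE_X = 32;
constexpr uint32_t TILE_Y = 16;

constexpr uint32_t ceil_div(uint32_t a, uint32_t b) {
    return (a + b - 1) / b;
}

// Hypothetical baseline: each workgroup covers 32 channels of a single token,
// so the workgroup count grows linearly with the number of tokens.
constexpr uint64_t workgroups_1d(uint32_t n_channels, uint32_t n_tokens) {
    return uint64_t(ceil_div(n_channels, TILE_X)) * n_tokens;
}

// Tiled layout: each workgroup covers a 32x16 (channels x tokens) tile,
// so 16 tokens share one workgroup launch.
constexpr uint64_t workgroups_2d(uint32_t n_channels, uint32_t n_tokens) {
    return uint64_t(ceil_div(n_channels, TILE_X)) * ceil_div(n_tokens, TILE_Y);
}

// Generic path: dot product of nc input samples with nc filter taps,
// for one channel and one output token.
float conv_generic(const float * x, const float * w, int nc) {
    assert(nc > 0);
    float sum = 0.0f;
    for (int i = 0; i < nc; ++i) {
        sum += x[i] * w[i];
    }
    return sum;
}

// nc == 4 path: the loop is fully unrolled into one 4-wide dot product,
// which is the shape a shader can express as dot(vec4, vec4).
float conv_nc4(const float * x, const float * w) {
    return x[0] * w[0] + x[1] * w[1] + x[2] * w[2] + x[3] * w[3];
}

// Compile-time check on a hypothetical shape: 4096 channels, ubatch of 2048.
static_assert(workgroups_1d(4096, 2048) == 262144, "one token per workgroup");
static_assert(workgroups_2d(4096, 2048) == 16384,  "16 tokens per workgroup");
```

Under these assumptions, a ubatch of 2048 tokens over 4096 channels needs 262144 workgroups in the baseline layout and 16384 in the tiled one, a 16x reduction in launches. The unrolled `conv_nc4` returns the same value as `conv_generic` with `nc = 4`, without loop overhead.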

🐛 Bug Fixes

Fixed SSM_CONV prompt-processing (PP) scaling degradation when using ubatch sizes greater than 512.

Affected Symbols

vulkan:SSM_CONV