b8315
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 1 symbol
Summary
This release focuses on Vulkan performance: it optimizes SSM_CONV workgroup dispatch for large ubatch sizes and fixes the prompt-processing (PP) scaling degradation seen at ubatch sizes greater than 512.
✨ New Features
- Optimized SSM_CONV workgroup dispatch for large ubatch sizes by tiling tokens into 2D workgroups (32x16) to reduce launch overhead.
- Added a vec4 fast path in SSM_CONV for nc=4, a common d_conv size.
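
To make the two changes above concrete, here is a rough CPU-side sketch in Python, not the actual Vulkan shader or host code. It assumes the 32x16 tile means 32 channel rows by 16 tokens per workgroup (the release notes do not say which axis is which), and it compares against a hypothetical baseline that launches one workgroup per block of rows for every token. The function names, the baseline, and the data layout are all illustrative assumptions.

```python
import math

TILE_ROWS = 32    # assumption: 32 invocations along the channel (d_inner) axis
TILE_TOKENS = 16  # assumption: 16 invocations along the token axis


def workgroups_per_token(n_rows, n_tokens, rows_per_group=32):
    """Hypothetical baseline: one workgroup per block of rows, per token."""
    return math.ceil(n_rows / rows_per_group) * n_tokens


def workgroups_tiled(n_rows, n_tokens):
    """2D tiling: each workgroup covers TILE_ROWS rows x TILE_TOKENS tokens."""
    return math.ceil(n_rows / TILE_ROWS) * math.ceil(n_tokens / TILE_TOKENS)


def ssm_conv_row(x, w):
    """Causal 1D convolution for a single channel row.

    x: inputs of length n_tokens + nc - 1 (nc - 1 state values, then tokens)
    w: nc filter taps
    Returns one output per token.
    """
    nc = len(w)
    n_tokens = len(x) - nc + 1
    if nc == 4:
        # Fast path: a fixed four-tap dot product, the scalar analogue of
        # a single vec4 dot() in a shader.
        w0, w1, w2, w3 = w
        return [
            x[t] * w0 + x[t + 1] * w1 + x[t + 2] * w2 + x[t + 3] * w3
            for t in range(n_tokens)
        ]
    # Generic path: loop over however many taps there are.
    return [sum(x[t + k] * w[k] for k in range(nc)) for t in range(n_tokens)]
```

Under these assumptions, 4096 rows and a ubatch of 2048 tokens give `workgroups_per_token(4096, 2048) == 262144` versus `workgroups_tiled(4096, 2048) == 16384`, a 16x reduction in launched workgroups. The real speedup depends on the shader and the hardware.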
🐛 Bug Fixes
- Fixed degraded SSM_CONV prompt-processing (PP) scaling at ubatch sizes greater than 512.