b9279
📦 llama-cpp
✨ 1 feature · 🐛 5 fixes · 🔧 4 symbols
Summary
This release improves Vulkan backend performance by fusing the snake activation sequence into a single kernel. It also tightens the fusion logic with stricter type, layout, and dimension checks.
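The five fused operations compose into one expression, written out below. Reading `a` as the per-channel frequency and `inv_b` as a precomputed reciprocal of the magnitude parameter is an assumption based on the usual BigVGAN snake formulation. These notes only state the op sequence.

```latex
\begin{aligned}
t_0 &= a \cdot x                   && \text{(mul)} \\
t_1 &= \sin t_0                    && \text{(sin)} \\
t_2 &= t_1^2                       && \text{(sqr)} \\
t_3 &= \mathrm{inv\_b} \cdot t_2   && \text{(mul)} \\
y   &= x + t_3 = x + \mathrm{inv\_b}\,\sin^2(a\,x) && \text{(add)}
\end{aligned}
```

A fused kernel evaluates the last line directly per element, so the intermediates $t_0$ through $t_3$ never need to be written to memory.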
Migration Steps
- If you rely on Vulkan snake fusion, ensure the broadcast operands `a` and `inv_b` are `GGML_TYPE_F32`. The previous type check was more permissive, and non-F32 operands now prevent fusion.
- If you use snake activation patterns, note that fusion is now rejected when `ne[2]` or `ne[3]` is greater than 1.
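To check your own graphs against these conditions, the sketch below mirrors the rules in these notes: F32 broadcast operands, `ne[2]`/`ne[3]` no greater than 1, and contiguous `x` and `dst`. `DType`, `TensorDesc`, and `can_fuse_snake_sketch` are hypothetical stand-ins, not the real ggml structs or the actual `ggml_vk_can_fuse_snake` code. The tight-packing rule for `a`/`inv_b` is omitted because the notes do not pin down its exact layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for ggml tensor metadata (not the real ggml structs).
enum class DType { F32, F16 };

struct TensorDesc {
    DType type;
    std::array<int64_t, 4> ne;  // elements per dimension; ne[0] is innermost
    std::array<size_t, 4> nb;   // byte stride per dimension
};

size_t elem_size(DType t) { return t == DType::F32 ? 4 : 2; }

// Strict contiguity: each stride equals the packed size of all inner dims.
bool is_contiguous(const TensorDesc& t) {
    size_t expected = elem_size(t.type);
    for (int i = 0; i < 4; ++i) {
        if (t.nb[i] != expected) return false;
        expected *= static_cast<size_t>(t.ne[i]);
    }
    return true;
}

// Sketch of the eligibility rules described in these notes; the real
// checks in ggml_vk_can_fuse_snake may differ in detail.
bool can_fuse_snake_sketch(const TensorDesc& x, const TensorDesc& a,
                           const TensorDesc& inv_b, const TensorDesc& dst) {
    assert(x.ne == dst.ne && "x and dst are expected to share a shape");
    // Broadcast operands must be F32 to match the kernel's float bindings.
    if (a.type != DType::F32 || inv_b.type != DType::F32) return false;
    // The fused path only handles tensors with ne[2] == ne[3] == 1.
    if (x.ne[2] > 1 || x.ne[3] > 1) return false;
    // x and dst must be contiguous.
    return is_contiguous(x) && is_contiguous(dst);
}
```

A graph that fails this kind of check is simply not fused. Converting `a`/`inv_b` to F32 and keeping the activation input at most 2-D keeps it eligible.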
✨ New Features
- The Vulkan backend now recognizes the five-operation snake activation sequence (mul, sin, sqr, mul, add) used in audio decoders such as BigVGAN and Vocos, and fuses it into a single elementwise kernel for better performance.
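The CPU reference sketch below is in C++, not the Vulkan shader. It compares the five separate passes with a single fused loop. The layout is an assumption: `x` is `[ne0, ne1]` with `ne0` innermost, and `a`/`inv_b` hold one value per `ne1` row. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused reference: five elementwise passes (mul, sin, sqr, mul, add),
// each materializing a full intermediate buffer.
std::vector<float> snake_unfused(const std::vector<float>& x, const std::vector<float>& a,
                                 const std::vector<float>& inv_b, size_t ne0, size_t ne1) {
    const size_t n = ne0 * ne1;
    std::vector<float> t0(n), t1(n), t2(n), t3(n), y(n);
    for (size_t i = 0; i < n; ++i) t0[i] = x[i] * a[i / ne0];      // mul, a broadcast over ne0
    for (size_t i = 0; i < n; ++i) t1[i] = std::sin(t0[i]);         // sin
    for (size_t i = 0; i < n; ++i) t2[i] = t1[i] * t1[i];           // sqr
    for (size_t i = 0; i < n; ++i) t3[i] = t2[i] * inv_b[i / ne0];  // mul, inv_b broadcast over ne0
    for (size_t i = 0; i < n; ++i) y[i] = x[i] + t3[i];             // add
    return y;
}

// Fused version: one pass over the data, no intermediate buffers.
std::vector<float> snake_fused(const std::vector<float>& x, const std::vector<float>& a,
                               const std::vector<float>& inv_b, size_t ne0, size_t ne1) {
    assert(x.size() == ne0 * ne1 && a.size() == ne1 && inv_b.size() == ne1);
    std::vector<float> y(ne0 * ne1);
    for (size_t c = 0; c < ne1; ++c) {
        for (size_t t = 0; t < ne0; ++t) {
            const size_t i = c * ne0 + t;
            const float s = std::sin(x[i] * a[c]);
            y[i] = x[i] + s * s * inv_b[c];  // same operation order as the unfused passes
        }
    }
    return y;
}
```

Both functions produce the same values. The fused loop reads `x` once and writes `y` once instead of round-tripping four intermediate tensors.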
🐛 Bug Fixes
- Tightened the `ggml_vk_can_fuse_snake` requirements: the `x` and `dst` tensors must now be contiguous, and the broadcast operands `a` and `inv_b` must be tightly packed along the broadcast dimension.
- Snake fusion is now rejected when `ne[2]` or `ne[3]` is greater than 1.
- Renamed `T`/`C` to `ne0`/`ne1` in the Vulkan shader and updated the push constants to follow standard Vulkan backend naming.
- Refactored the C++ side: `ggml_vk_can_fuse_snake` now reuses the `snake_pattern` constant, and the broadcast operands `a` and `inv_b` are strictly required to be `GGML_TYPE_F32` to match the hardcoded float bindings.
- Replaced the silent f32 fallback in `ggml_vk_snake_dispatch_fused` with an explicit `GGML_TYPE_F32` case; any other type now reaches `GGML_ABORT` in the default branch.
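The dispatch change follows a fail-loudly pattern. This is a minimal sketch using a hypothetical `DType` enum and a hypothetical pipeline name, with `std::abort` standing in for `GGML_ABORT`. It is not the actual `ggml_vk_snake_dispatch_fused` code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical type tags standing in for ggml_type values.
enum class DType { F32, F16, Q8_0 };

// Hypothetical stand-in for GGML_ABORT: report the error and terminate.
[[noreturn]] void abort_with(const char* msg) {
    std::fprintf(stderr, "fatal: %s\n", msg);
    std::abort();
}

// Selects a (hypothetical) pipeline for the fused snake kernel. Only F32
// has a shader, so any other type indicates a bug in the fusion check and
// terminates instead of silently reinterpreting the data as f32.
std::string select_snake_pipeline(DType t) {
    switch (t) {
        case DType::F32:
            return "snake_fused_f32";
        default:
            abort_with("unsupported type for fused snake dispatch");
    }
}
```

`select_snake_pipeline(DType::F32)` returns the f32 pipeline name. Any other type stops the process at the dispatch site, which surfaces a mismatch between the fusion check and the shader bindings immediately instead of producing wrong output.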