Change8

b9279

📦 llama-cpp
1 feature · 🐛 5 fixes · 🔧 4 symbols

Summary

This release adds snake activation fusion to the Vulkan backend: the five-operation sequence (mul, sin, sqr, mul, add) now runs as a single elementwise kernel for improved performance. The fusion check was also tightened, with stricter type, layout, and dimension requirements.

Migration Steps

  1. If relying on Vulkan snake fusion, ensure broadcast operands (a and inv_b) are GGML_TYPE_F32. The fusion check now strictly requires this type to match the hardcoded float bindings.
  2. If using snake activation patterns, be aware that fusion is now rejected when ne[2] or ne[3] is greater than 1.
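
The two conditions above can be sketched as a small pre-check. This is only an illustration under stated assumptions: `DType`, `TensorInfo`, and `snake_operands_eligible` are hypothetical stand-ins, not ggml types or the real `ggml_vk_can_fuse_snake`, and applying the ne[2]/ne[3] limit to every operand is an assumption, since the notes do not say which tensor is inspected.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are NOT the ggml definitions.
enum class DType { F32, F16 };

struct TensorInfo {
    DType type;
    std::array<std::int64_t, 4> ne; // element count per dimension, ne[0] innermost
};

// True when a tensor has no extent beyond the first two dimensions.
inline bool is_at_most_2d(const TensorInfo & t) {
    return t.ne[2] <= 1 && t.ne[3] <= 1;
}

// Mirrors the two migration steps:
//   1. broadcast operands a and inv_b must be F32
//   2. ne[2] and ne[3] must not exceed 1 (assumed here to apply to all operands)
inline bool snake_operands_eligible(const TensorInfo & x,
                                    const TensorInfo & a,
                                    const TensorInfo & inv_b) {
    if (a.type != DType::F32 || inv_b.type != DType::F32) {
        return false;
    }
    return is_at_most_2d(x) && is_at_most_2d(a) && is_at_most_2d(inv_b);
}
```

A graph whose operands fail this kind of check would presumably run the unfused operations instead, so a pre-check like this is mainly useful for spotting why a model does not hit the fused path.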

✨ New Features

  • Vulkan backend now fuses the 5-operation snake activation sequence (mul, sin, sqr, mul, add) into a single elementwise kernel for improved performance. The pattern is recognized in audio decoders such as BigVGAN and Vocos.
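
To make the fused sequence concrete, here is a CPU-side sketch of what the five operations compute and how one pass can replace them. It assumes the usual snake form y = x + inv_b · sin²(a · x), with a and inv_b holding one value per ne1 row and broadcast along ne0. The function names and the row-major layout are illustrative assumptions, not the Vulkan shader or any ggml API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused reference: five elementwise passes, each materializing an intermediate.
// Layout assumption: x holds ne1 rows of ne0 contiguous floats; a and inv_b
// hold one float per row and are broadcast along ne0.
std::vector<float> snake_unfused(const std::vector<float> & x,
                                 const std::vector<float> & a,
                                 const std::vector<float> & inv_b,
                                 std::size_t ne0, std::size_t ne1) {
    assert(x.size() == ne0 * ne1 && a.size() == ne1 && inv_b.size() == ne1);
    const std::size_t n = x.size();
    std::vector<float> t0(n), t1(n), t2(n), t3(n), y(n);
    for (std::size_t i = 0; i < n; ++i) t0[i] = x[i] * a[i / ne0];      // mul
    for (std::size_t i = 0; i < n; ++i) t1[i] = std::sin(t0[i]);        // sin
    for (std::size_t i = 0; i < n; ++i) t2[i] = t1[i] * t1[i];          // sqr
    for (std::size_t i = 0; i < n; ++i) t3[i] = t2[i] * inv_b[i / ne0]; // mul
    for (std::size_t i = 0; i < n; ++i) y[i]  = x[i] + t3[i];           // add
    return y;
}

// Fused version: one pass, no intermediate buffers.
std::vector<float> snake_fused(const std::vector<float> & x,
                               const std::vector<float> & a,
                               const std::vector<float> & inv_b,
                               std::size_t ne0, std::size_t ne1) {
    assert(x.size() == ne0 * ne1 && a.size() == ne1 && inv_b.size() == ne1);
    std::vector<float> y(x.size());
    for (std::size_t c = 0; c < ne1; ++c) {
        for (std::size_t t = 0; t < ne0; ++t) {
            const std::size_t i = c * ne0 + t;
            const float s = std::sin(x[i] * a[c]);
            y[i] = x[i] + s * s * inv_b[c];
        }
    }
    return y;
}
```

The two functions should agree up to floating-point rounding. The fused form reads x once and writes y once, which is the kind of memory-traffic saving a single-kernel fusion is aiming for.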

🐛 Bug Fixes

  • Tightened `ggml_vk_can_fuse_snake` requirements: now mandates contiguous x and dst tensors, and requires broadcast operands a / inv_b to be tightly packed on the broadcast dim.
  • Snake fusion is now rejected when ne[2] or ne[3] is greater than 1.
  • Updated Vulkan shader naming conventions (T/C renamed to ne0/ne1) and push constants to align with standard Vulkan backend naming.
  • Refactored C++ side: `ggml_vk_can_fuse_snake` reuses the `snake_pattern` constant, and broadcast operands a and inv_b are now strictly required to be GGML_TYPE_F32 to match the hardcoded float bindings.
  • Replaced silent f32 fallback in `ggml_vk_snake_dispatch_fused` with an explicit GGML_TYPE_F32 case and GGML_ABORT on default.
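
The last fix follows a general pattern: handle each supported type explicitly and fail loudly on anything else, rather than silently falling back to f32. A minimal sketch of that pattern is below. `ElemType`, `select_snake_pipeline`, and the pipeline name string are hypothetical, and the sketch throws `std::logic_error` where the real code uses `GGML_ABORT`, purely so the failure path can be exercised in a test.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical element-type enum; NOT the ggml type enum.
enum class ElemType { F32, F16 };

// Explicit case per supported type, hard failure on everything else.
// The previous behavior described in the notes corresponds to a default
// branch that quietly returned the f32 pipeline.
std::string select_snake_pipeline(ElemType type) {
    switch (type) {
        case ElemType::F32:
            return "snake_f32"; // hypothetical pipeline name
        default:
            // Stand-in for GGML_ABORT: unsupported types are a programming error.
            throw std::logic_error("snake fusion: unsupported element type");
    }
}
```

Failing on the default branch turns a type mismatch that slipped past the fusion check into an immediate, visible error instead of a kernel that reads the data with the wrong element type.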

Affected Symbols