b9075
📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 3 symbols
Summary
This release improves performance on CUDA devices by fusing the five-operation snake activation sequence into a single elementwise kernel, reducing kernel-launch and global-memory-traffic overhead. Several minor code cleanups and review-feedback adjustments were also incorporated.
✨ New Features
- Implemented CUDA kernel fusion for the snake activation function (mul, sin, sqr, mul, add) to optimize performance for audio decoders like BigVGAN and Vocos.
- Added F32, F16, and BF16 template instantiations of the fused snake activation kernel.
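The snake activation commonly used by audio decoders such as BigVGAN is snake(x) = x + (1/α)·sin²(αx), which decomposes into exactly the mul, sin, sqr, mul, add chain named above. A minimal host-side C++ sketch of the fused elementwise computation (illustrative only; function and parameter names are assumptions, not the actual CUDA kernel in this release):

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical reference for the fused snake activation:
//   y = x + (1/alpha) * sin^2(alpha * x)
// i.e. the mul, sin, sqr, mul, add chain collapsed into a single
// elementwise pass, mirroring what a fused kernel computes per element.
template <typename T>
void snake_fused(const T * x, const T * alpha, T * y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        T a = alpha[i];               // per-channel alpha broadcast elided here
        T s = std::sin(a * x[i]);     // mul + sin
        y[i] = x[i] + (T(1) / a) * s * s; // sqr + mul + add
    }
}
```

Fusing these five ops means each element is read and written once instead of round-tripping through global memory between five separate kernel launches, which is where the performance gain comes from.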
🐛 Bug Fixes
- Addressed review feedback regarding type matching in the snake fusion check (ensuring `add->type` matches `x->type`).
- Renamed `kernel_snake` to `snake_kernel` to align with upstream naming conventions.
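The type-matching fix above guards fusion eligibility: the chain may only be collapsed when the final add produces the same type as the input tensor. A hedged C sketch of that guard (types and names are illustrative, not the actual ggml code):

```c
#include <stdbool.h>

// Illustrative tensor-type enum standing in for the real type system.
enum tensor_type { TYPE_F32, TYPE_F16, TYPE_BF16 };

struct tensor { enum tensor_type type; };

// Fusion is only safe when the final add's output type matches the
// input x's type; mixed-type chains fall back to separate kernels.
static bool snake_fusion_ok(const struct tensor * x, const struct tensor * add) {
    return add->type == x->type;
}
```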