Change8

b9075

📦 llama-cpp
2 features · 🐛 2 fixes · 🔧 3 symbols

Summary

This release fuses the five-operation snake activation sequence (mul, sin, sqr, mul, add) into a single elementwise kernel on CUDA devices, so the sequence runs as one kernel launch instead of five and no longer writes intermediate results between steps. It also incorporates minor code cleanups made in response to review feedback.

✨ New Features

  • Implemented CUDA kernel fusion for the snake activation function (mul, sin, sqr, mul, add) to optimize performance for audio decoders like BigVGAN and Vocos.
  • Added support for F32, F16, and BF16 templates for the fused snake activation kernel.
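The fused sequence above can be sketched on the CPU as follows. This is a minimal illustration in plain C++, not the CUDA kernel from the release: the function names `snake_unfused` and `snake_fused`, the per-row broadcast of `a` and `inv_b`, and the choice to compute in `float` regardless of the storage type `T` are all assumptions made for the example. It shows the five steps as separate passes and then as a single pass that produces the same result without intermediate buffers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: x holds a.size() rows of row_len elements each;
// a[r] and inv_b[r] are broadcast across row r.

// Unfused reference: five elementwise passes, each with its own buffer.
template <typename T>
std::vector<T> snake_unfused(const std::vector<T>& x, const std::vector<float>& a,
                             const std::vector<float>& inv_b, std::size_t row_len) {
    assert(row_len > 0 && a.size() == inv_b.size() && x.size() == a.size() * row_len);
    const std::size_t n = x.size();
    std::vector<float> t(n), s(n), q(n), m(n);
    std::vector<T> y(n);
    for (std::size_t i = 0; i < n; ++i) t[i] = static_cast<float>(x[i]) * a[i / row_len]; // mul
    for (std::size_t i = 0; i < n; ++i) s[i] = std::sin(t[i]);                            // sin
    for (std::size_t i = 0; i < n; ++i) q[i] = s[i] * s[i];                               // sqr
    for (std::size_t i = 0; i < n; ++i) m[i] = q[i] * inv_b[i / row_len];                 // mul
    for (std::size_t i = 0; i < n; ++i) y[i] = static_cast<T>(static_cast<float>(x[i]) + m[i]); // add
    return y;
}

// Fused version: one pass, y = x + sin(a * x)^2 * inv_b, no intermediates.
template <typename T>
std::vector<T> snake_fused(const std::vector<T>& x, const std::vector<float>& a,
                           const std::vector<float>& inv_b, std::size_t row_len) {
    assert(row_len > 0 && a.size() == inv_b.size() && x.size() == a.size() * row_len);
    std::vector<T> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t r = i / row_len;            // row index for the broadcast
        const float xi = static_cast<float>(x[i]);    // compute in float
        const float s  = std::sin(xi * a[r]);
        y[i] = static_cast<T>(xi + s * s * inv_b[r]);
    }
    return y;
}
```

Templating on the storage type `T` while doing the arithmetic in `float` is one plausible way to cover several element types with a single body; whether the released kernel handles F16 and BF16 this way is not stated in these notes.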

🐛 Bug Fixes

  • Tightened the snake fusion check in response to review feedback: fusion is now applied only when add->type matches x->type.
  • Renamed kernel_snake to snake_kernel to align with upstream conventions.
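A type guard of the kind described in the first fix might look like the sketch below. The `Op`, `DType`, and `Node` types and the function `can_fuse_snake` are hypothetical stand-ins invented for this example and do not reproduce the project's actual graph structures; the sketch only shows the idea of matching the five-op pattern and then refusing to fuse when the final add has a different type from the input `x`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for graph nodes and tensor types.
enum class Op { Mul, Sin, Sqr, Add, Other };
enum class DType { F32, F16, BF16 };

struct Node {
    Op    op;
    DType type;
};

// Returns true only if the first five nodes form mul, sin, sqr, mul, add
// and the final add produces the same type as the input x.
inline bool can_fuse_snake(const Node* nodes, std::size_t n, DType x_type) {
    assert(nodes != nullptr);
    static constexpr std::array<Op, 5> pattern = {Op::Mul, Op::Sin, Op::Sqr, Op::Mul, Op::Add};
    if (n < pattern.size()) {
        return false;
    }
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (nodes[i].op != pattern[i]) {
            return false;
        }
    }
    // The type guard: a mismatched add output falls back to the unfused path.
    return nodes[4].type == x_type;
}
```

Rejecting the mismatch rather than converting inside the kernel keeps the fused path simple, at the cost of leaving mixed-type graphs unfused.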

Affected Symbols