b8220
📦 llama-cpp
✨ 3 features · 🔧 1 symbol
Summary
This release introduces significant CUDA performance improvements by using shared memory in the `ssm_conv` kernel and fusing several operations, and adds fp16 support for these optimized paths.
✨ New Features
- The CUDA `ssm_conv` kernel now uses shared memory.
- The `silu` activation is now fused with `ssm_conv`.
- Unary operations are now fused with multiplication.