
b8220

📦 llama-cpp
3 features · 🔧 1 symbol

Summary

This release introduces significant CUDA performance improvements: the ssm_conv operation now uses shared memory, and several operations are fused into single kernels. fp16 support is also enabled for these optimized paths.

✨ New Features

  • The CUDA implementation of ssm_conv now uses shared memory.
  • The silu activation is now fused with ssm_conv.
  • Unary operations are now fused with multiplication.

Affected Symbols