b9856
📦 llama-cpp
✨ 1 feature
Summary
This release brings internal performance improvements to the CUDA backend by standardizing the use of __restrict__ and Programmatic Dependent Launch (PDL) in the FlashAttention (FA) kernels. It also provides pre-compiled binaries for a wide range of operating systems and hardware configurations.
✨ New Features
- The CUDA implementation of FlashAttention (FA) now uses __restrict__ and Programmatic Dependent Launch (PDL) consistently.
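
For readers unfamiliar with the two mechanisms named above, here is a minimal CUDA sketch of what they look like in general. It is not taken from the llama.cpp FA kernels: the kernel names (`scale_kernel`, `add_one_kernel`) and the helper `run_pdl_demo` are hypothetical. It assumes a CUDA toolkit recent enough to provide `cudaLaunchKernelEx` (12.x assumed here). PDL is only requested on devices of compute capability 9.0 or newer, and the device-side calls are compiled only for such targets.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

// Producer: y[i] = 2 * x[i]. __restrict__ promises the compiler that x and y
// never alias, which allows more aggressive load/store scheduling.
__global__ void scale_kernel(const float * __restrict__ x, float * __restrict__ y, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = 2.0f * x[i];
    }
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // Signal that the dependent kernel may begin launching.
    cudaTriggerProgrammaticLaunchCompletion();
#endif
}

// Consumer: z[i] = y[i] + 1.
__global__ void add_one_kernel(const float * __restrict__ y, float * __restrict__ z, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x; // prologue, independent of y
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // Block until the preceding kernel's results are visible in global memory.
    cudaGridDependencySynchronize();
#endif
    if (i < n) {
        z[i] = y[i] + 1.0f;
    }
}

// Hypothetical helper: runs both kernels and returns the maximum absolute error.
// Error checking of the CUDA calls is omitted for brevity.
float run_pdl_demo(const int n) {
    assert(n > 0);
    std::vector<float> h_x(n, 1.5f), h_z(n, 0.0f);
    const size_t bytes = n * sizeof(float);

    float * d_x = nullptr; float * d_y = nullptr; float * d_z = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_z, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    // Only request PDL where the hardware supports it (assumed: CC >= 9.0).
    int device = 0;
    cudaGetDevice(&device);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    const bool use_pdl = prop.major >= 9;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    const int block = 256;
    const int grid  = (n + block - 1) / block;

    scale_kernel<<<grid, block, 0, stream>>>(d_x, d_y, n);

    // Launch the consumer with the programmatic stream serialization attribute
    // so that its prologue may overlap with the tail of the producer.
    cudaLaunchAttribute attr = {};
    attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr.val.programmaticStreamSerializationAllowed = 1;

    cudaLaunchConfig_t cfg = {};
    cfg.gridDim          = dim3(grid);
    cfg.blockDim         = dim3(block);
    cfg.dynamicSmemBytes = 0;
    cfg.stream           = stream;
    cfg.attrs            = use_pdl ? &attr : nullptr;
    cfg.numAttrs         = use_pdl ? 1 : 0;
    cudaLaunchKernelEx(&cfg, add_one_kernel, (const float *) d_y, d_z, n);

    cudaStreamSynchronize(stream);
    cudaMemcpy(h_z.data(), d_z, bytes, cudaMemcpyDeviceToHost);

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = std::fmax(max_err, std::fabs(h_z[i] - 4.0f)); // 2 * 1.5 + 1
    }

    cudaStreamDestroy(stream);
    cudaFree(d_x); cudaFree(d_y); cudaFree(d_z);
    return max_err;
}
```

Calling `run_pdl_demo(1 << 16)` from a `main` and checking that the result is zero is enough to exercise the sketch. On pre-9.0 devices it falls back to an ordinary stream-ordered launch, so the result is the same and only the potential overlap is lost.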