
b9856

📦 llama-cpp
1 feature

Summary

This release makes internal, performance-oriented changes to the CUDA backend: the FlashAttention (FA) kernels now consistently use the __restrict__ pointer qualifier and PDL (Programmatic Dependent Launch). Pre-compiled binaries are also provided for a range of operating systems and hardware configurations.

✨ New Features

  • The CUDA backend now consistently uses __restrict__ and PDL (Programmatic Dependent Launch) in the FA (FlashAttention) kernels.
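For readers unfamiliar with the two techniques, here is a generic sketch on a toy pair of kernels. It is not the llama.cpp FA code, which is not reproduced in these notes. __restrict__ promises the compiler that pointer arguments do not alias. PDL lets a dependent kernel in the same stream begin launching before the preceding kernel has finished; the dependent kernel then calls cudaGridDependencySynchronize() before reading the producer's output. Assumptions: a CUDA 12.x toolkit; PDL only takes effect on compute capability 9.0+ GPUs with code built for sm_90 or newer (e.g. nvcc -arch=sm_90); the names scale_kernel, bias_kernel and run_pdl_demo are hypothetical.

```cuda
#include <cuda_runtime.h>

// Primary kernel: y = s * x. __restrict__ promises that x and y never alias,
// so the compiler need not conservatively re-load x after writing y.
__global__ void scale_kernel(const float * __restrict__ x, float * __restrict__ y,
                             float s, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = s * x[i];
    }
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // PDL (sm_90+): allow the dependent grid to start launching early.
    cudaTriggerProgrammaticLaunchCompletion();
#endif
}

// Secondary kernel: z = y + b, where y is produced by scale_kernel.
__global__ void bias_kernel(const float * __restrict__ y, float * __restrict__ z,
                            float b, int n) {
    // Work that does not depend on the primary grid may run before the wait.
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // PDL (sm_90+): wait until the primary grid has completed and its
    // global-memory writes are visible.
    cudaGridDependencySynchronize();
#endif
    if (i < n) {
        z[i] = y[i] + b;
    }
}

// Hypothetical helper: runs both kernels and returns the number of wrong
// results (0 on success), or -1 if any CUDA call fails.
int run_pdl_demo(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *x = nullptr, *y = nullptr, *z = nullptr;
    cudaStream_t stream = nullptr;
    int result = -1;

    if (cudaMallocManaged(&x, bytes) == cudaSuccess &&
        cudaMallocManaged(&y, bytes) == cudaSuccess &&
        cudaMallocManaged(&z, bytes) == cudaSuccess &&
        cudaStreamCreate(&stream) == cudaSuccess) {
        for (int i = 0; i < n; ++i) x[i] = 1.0f;

        const dim3 block(256);
        const dim3 grid((n + block.x - 1) / block.x);
        scale_kernel<<<grid, block, 0, stream>>>(x, y, 2.0f, n);
        cudaError_t err = cudaGetLastError();

        // Request PDL only if bias_kernel was compiled for sm_90+, i.e. it
        // really contains the cudaGridDependencySynchronize() call.
        cudaFuncAttributes fa{};
        if (err == cudaSuccess) err = cudaFuncGetAttributes(&fa, bias_kernel);

        cudaLaunchAttribute attr{};
        attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
        attr.val.programmaticStreamSerializationAllowed = 1;

        cudaLaunchConfig_t cfg{};
        cfg.gridDim = grid;
        cfg.blockDim = block;
        cfg.dynamicSmemBytes = 0;
        cfg.stream = stream;
        cfg.attrs = &attr;
        cfg.numAttrs = (fa.ptxVersion >= 90) ? 1 : 0;

        if (err == cudaSuccess) err = cudaLaunchKernelEx(&cfg, bias_kernel, y, z, 0.5f, n);
        if (err == cudaSuccess) err = cudaDeviceSynchronize();

        if (err == cudaSuccess) {
            result = 0;  // count mismatches against 2.0f * 1.0f + 0.5f
            for (int i = 0; i < n; ++i) {
                if (z[i] != 2.5f) ++result;
            }
        }
    }
    if (stream) cudaStreamDestroy(stream);
    cudaFree(x);  // cudaFree(nullptr) is a no-op
    cudaFree(y);
    cudaFree(z);
    return result;
}
```

On a CUDA-capable machine, run_pdl_demo(1 << 20) is expected to return 0. When the kernels are not built for sm_90 or newer, the PDL attribute is left off and the two launches fall back to ordinary stream ordering, so the result is the same, just without the overlap.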