b9856

📅 Jul 1, 2026 · 📦 llama-cpp

✨ 1 feature

Summary

This release makes the CUDA backend's Flash Attention (FA) code use __restrict__ pointer qualifiers and Programmatic Dependent Launch (PDL) consistently, an internal change aimed at performance. Pre-compiled binaries are also provided for a range of operating systems and hardware configurations.

✨ New Features

The CUDA implementation now consistently uses __restrict__ and PDL for FA (Flash Attention).
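
For readers unfamiliar with __restrict__, the following is a minimal host-side C++ sketch of what the qualifier promises. It is not taken from llama.cpp: the function name scale_into and its parameters are hypothetical, and the real change applies to CUDA kernels. PDL is not shown here, since it needs the CUDA runtime and a GPU that supports it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not a llama.cpp function):
// writes dst[i] = scale * src[i] for i in [0, n).
//
// __restrict__ is a compiler extension (GCC, Clang, NVCC) that promises
// dst and src do not overlap. With that promise the compiler may keep
// values loaded from src in registers or vectorize the loop, instead of
// assuming each store to dst could have modified src.
void scale_into(float * __restrict__ dst, const float * __restrict__ src,
                float scale, std::size_t n) {
    // Debug-only sanity check; it catches identical pointers but not
    // partially overlapping ranges, which would also break the promise.
    assert(n == 0 || dst != src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = scale * src[i];
    }
}
```

Calling scale_into with two separate buffers is fine. Passing overlapping buffers would violate the no-alias promise and is undefined behavior, which is why applying the qualifier consistently only makes sense where the callers can guarantee distinct buffers.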