Migrating to llama.cpp b9158
Version b9158 introduces 1 breaking change. This guide details how to update your code.
Released: 5/14/2026
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
ggml_cuda_mma::data_layout
Breaking Changes
Issue #1
The data layout of accumulators along the attention head dimension is scrambled when the RDNA3/RDNA4-optimized tile kernel is active (it uses 32 logical units for FP16 accumulation); the scrambling enables more efficient transposition. Any code that relies on a specific accumulator layout must account for this change.
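The sketch below illustrates the practical consequence: values along the head dimension should be read through a layout mapping rather than assumed to be in natural order. This is a minimal standalone example; `physical_slot` and the permutation it applies are hypothetical placeholders, since the real layout is an internal detail of ggml's CUDA/HIP backend.

```cpp
// Minimal sketch (not llama.cpp code): why downstream code must not assume
// the accumulator elements along the head dimension are stored in natural
// order when the RDNA3/RDNA4 tile kernel is active.
#include <cstdio>
#include <vector>

// Hypothetical index mapping: logical head-dim index -> physical slot.
// Example only: swap the two 16-element halves of a 32-wide fragment.
static int physical_slot(int logical_idx) {
    return (logical_idx + 16) % 32;
}

int main() {
    // Pretend these are 32 accumulator values produced by the tile kernel,
    // stored in the (scrambled) physical order.
    std::vector<float> phys(32);
    for (int logical = 0; logical < 32; ++logical) {
        phys[physical_slot(logical)] = float(logical); // value == logical index
    }

    // Wrong: reading linearly recovers the values out of order.
    printf("linear read : %.0f %.0f %.0f ...\n", phys[0], phys[1], phys[2]);

    // Right: go through the layout mapping to recover the logical order.
    printf("mapped read : %.0f %.0f %.0f ...\n",
           phys[physical_slot(0)], phys[physical_slot(1)], phys[physical_slot(2)]);
    return 0;
}
```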
Migration Steps
1. If you use RDNA3/RDNA4 with FP16 accumulation and a head size that is not divisible by 32 (e.g., 80 or 112), the kernel falls back to the regular tile length of 16 with FP32 accumulation; see the sketch after this list.
2. When the RDNA3/RDNA4 tile kernel is active, treat the accumulator data layout as scrambled, since the scrambling is part of the performance optimization.
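The following sketch mirrors the selection logic described in step 1. The struct and function names are assumptions for illustration, not the actual ggml dispatch code.

```cpp
// Minimal sketch (assumed names): which tile configuration applies for a
// given attention head size on an RDNA3/RDNA4 device, per the steps above.
#include <cstdio>

struct KernelConfig {
    int  tile_len;   // logical units along the head dimension
    bool fp16_acc;   // FP16 accumulation (scrambled layout) vs FP32
};

// Pick the tile configuration for a given attention head size.
static KernelConfig pick_fattn_tile_config(int head_size) {
    if (head_size % 32 == 0) {
        // Optimized RDNA3/RDNA4 path: 32 logical units, FP16 accumulation,
        // accumulator layout along the head dimension is scrambled.
        return {32, true};
    }
    // Head sizes such as 80 or 112 fall back to the regular tile length
    // of 16 with FP32 accumulation.
    return {16, false};
}

int main() {
    for (int hs : {64, 80, 112, 128}) {
        KernelConfig cfg = pick_fattn_tile_config(hs);
        printf("head_size=%3d -> tile_len=%d, %s accumulation\n",
               hs, cfg.tile_len, cfg.fp16_acc ? "FP16" : "FP32");
    }
    return 0;
}
```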
Release Summary
This release introduces RDNA3 support for the CUDA mma FA kernel and includes performance tuning for RDNA3, RDNA4, and CDNA architectures, while noting a change in accumulator data layout for RDNA3/4 optimizations.
Need More Details?
View the full release notes and all changes for llama.cpp b9158.