
Migrating to llama.cpp b9158

Version b9158 introduces 1 breaking change. This guide details how to update your code.

Released: 5/14/2026

1 Breaking Change · 2 Migration Steps · 1 Affected Symbol

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

ggml_cuda_mma::data_layout

Breaking Changes

Issue #1

The data layout of accumulators along the attention head dimension is scrambled when using the RDNA3/RDNA4 optimized tile kernel (which uses 32 logical units for FP16 accumulation). The scrambling enables more efficient transposition. Users who rely on a specific accumulator layout must account for this change.
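
To make the layout change concrete, the sketch below shows what a scrambled layout means for code that reads the accumulator tile: the logical head-dimension column no longer maps one-to-one to its storage slot, so such code must consult the layout descriptor (ggml_cuda_mma::data_layout) instead of assuming the identity mapping. The permutation shown here is purely illustrative and is not the actual layout used by the kernel.

    // Illustrative only: the real permutation is defined by
    // ggml_cuda_mma::data_layout in the CUDA sources, not by this function.
    constexpr int ACC_COLS = 32; // FP16 accumulator width on RDNA3/RDNA4

    // Hypothetical mapping from a logical head-dimension column to its
    // storage slot inside the accumulator tile. Before this change the
    // mapping was the identity; afterwards it is a permutation chosen to
    // make transposition cheaper.
    int acc_storage_col(int logical_col) {
        return (logical_col + ACC_COLS / 2) % ACC_COLS; // example permutation
    }

    // Example: logical column 0 is stored at slot 16, column 16 at slot 0.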

Migration Steps

  1. If you use RDNA3/RDNA4 with FP16 accumulation and head sizes that are not divisible by 32 (e.g., 80 or 112), the kernel falls back to the regular tile size of 16 with FP32 accumulation (see the sketch after this list).
  2. Be aware that the accumulator data layout is scrambled whenever the RDNA3/RDNA4 tile kernel is active, as a consequence of these performance optimizations.

Release Summary

This release introduces RDNA3 support for the CUDA mma FA kernel and includes performance tuning for RDNA3, RDNA4, and CDNA architectures, while noting a change in accumulator data layout for RDNA3/4 optimizations.

Need More Details?

View the full release notes and all changes for llama.cpp b9158.
