b8739

📅 Apr 9, 2026 · 📦 llama-cpp

✨ 3 features · 🔧 4 symbols

Summary

This release adds HIP support for the AMD CDNA4 architecture (gfx950), used by the Instinct MI350X and MI355X accelerators, and adjusts the f32 matrix multiplication and MMQ stream-k paths for it. Pre-compiled binaries for multiple operating systems and hardware configurations are also provided.

Migration Steps

When compiling for gfx950, f32 matmul uses mfma_f32_16x16x4f32 rather than the xf32 variant, which is unavailable on this architecture.
On Linux x64 with ROCm 7.2, use the provided binaries built against ROCm 7.2.

✨ New Features

Added support for AMD Instinct MI350X/MI355X (gfx950, CDNA4) architecture via HIP.
CDNA4 f32 matmul now routes to mfma_f32_16x16x4f32, because the xf32 variant is unavailable on gfx950.
Added CDNA4 to the stream-k kernel dispatch for MMQ operations.
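For readers unfamiliar with stream-k, the general idea is to flatten the work of all output tiles along the k dimension and hand each workgroup a contiguous, near-equal share, so that no compute unit sits idle on a partial wave of tiles. The following is a minimal sketch of that partitioning under stated assumptions: `IterRange` and `stream_k_range` are hypothetical names, and the actual MMQ kernels may split work differently.

```cpp
#include <cstdint>

// Hypothetical half-open range [begin, end) of flattened (tile, k-step)
// iterations assigned to one workgroup.
struct IterRange {
    int64_t begin;
    int64_t end;
};

// Stream-k style split: treat the ntiles * k_steps iterations as one flat
// sequence and give workgroup `block` (0-based, out of nblocks) a contiguous
// share. Shares differ by at most one iteration and together cover the whole
// sequence exactly once.
inline IterRange stream_k_range(int64_t ntiles, int64_t k_steps,
                                int64_t nblocks, int64_t block) {
    const int64_t total = ntiles * k_steps;
    IterRange r;
    r.begin = total * block / nblocks;
    r.end   = total * (block + 1) / nblocks;
    return r;
}

// Tile that a flattened iteration index belongs to.
inline int64_t tile_of(int64_t iter, int64_t k_steps) {
    return iter / k_steps;
}
```

With 3 tiles, 4 k-steps per tile, and 5 workgroups, workgroup 3 receives iterations 7 and 8, which belong to two different tiles. A range that starts or ends mid-tile produces a partial sum, so a stream-k scheme needs a follow-up step that combines partial results per tile.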

Affected Symbols

vendors/hip.h
common.cuh
mma.cuh
mmq.cuh