b8739

📦 llama-cpp
3 features · 🔧 4 symbols

Summary

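This release adds HIP support for the AMD CDNA4 architecture (gfx950), used by the Instinct MI350X/MI355X accelerators. On gfx950, f32 matrix multiplication routes to mfma_f32_16x16x4f32 because the xf32 variant is unavailable there, and CDNA4 is now included in stream-k kernel dispatch for MMQ operations. Pre-compiled binaries for several operating systems and hardware configurations are also provided.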

Migration Steps

  1. When compiling for gfx950, note that f32 matmul uses mfma_f32_16x16x4f32 rather than the xf32 variant, which is unavailable on this architecture.
  2. For ROCm 7.2 on Linux x64, use the provided binaries built against ROCm 7.2.

✨ New Features

  • Added support for AMD Instinct MI350X/MI355X (gfx950, CDNA4) architecture via HIP.
  • CDNA4 f32 matmul now routes to mfma_f32_16x16x4f32, because the xf32 variant is unavailable on gfx950.
  • Included CDNA4 in stream-k kernel dispatch for MMQ operations.
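
The routing described in the features above can be pictured as a small architecture-keyed dispatch table. The following is a minimal host-side sketch in C++, not the code from the release. The names Arch, F32MfmaPath, arch_from_gfx, f32_mfma_path and mmq_uses_stream_k are hypothetical and exist only for illustration. The gfx942 to CDNA3 mapping and the CDNA3 branches are assumptions added for contrast and are not stated in these notes.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical architecture tags for this sketch (not llama.cpp identifiers).
enum class Arch { CDNA3, CDNA4, Other };

// Hypothetical labels for the f32 matmul instruction path.
enum class F32MfmaPath { Xf32, F32_16x16x4, NotModeled };

// Map a gfx target name to an architecture tag.
// gfx950 -> CDNA4 comes from the release notes; gfx942 -> CDNA3 is an
// assumption included only to have something to contrast against.
constexpr Arch arch_from_gfx(std::string_view gfx) {
    if (gfx == "gfx950") return Arch::CDNA4;
    if (gfx == "gfx942") return Arch::CDNA3;
    return Arch::Other;
}

// Pick the f32 matmul path. Per the notes, gfx950 has no xf32 variant,
// so CDNA4 routes to mfma_f32_16x16x4f32.
constexpr F32MfmaPath f32_mfma_path(Arch arch) {
    switch (arch) {
        case Arch::CDNA4: return F32MfmaPath::F32_16x16x4;
        case Arch::CDNA3: return F32MfmaPath::Xf32;  // assumption, see above
        case Arch::Other: return F32MfmaPath::NotModeled;
    }
    return F32MfmaPath::NotModeled;
}

// Whether MMQ dispatch takes the stream-k kernel in this sketch.
// Only the CDNA4 case is taken from the notes; every other architecture
// returns false here as a placeholder, not as a claim about real behavior.
constexpr bool mmq_uses_stream_k(Arch arch) {
    return arch == Arch::CDNA4;
}
```

For example, `f32_mfma_path(arch_from_gfx("gfx950"))` evaluates to `F32MfmaPath::F32_16x16x4` in this sketch. Keying both decisions on a single architecture tag keeps the per-architecture differences in one place, which is one plausible way to organize such a change.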

Affected Symbols