b8739
📦 llama-cpp
Summary
This release adds HIP support for the AMD CDNA4 architecture (gfx950), which is used by the Instinct MI350X/MI355X accelerators, and adjusts the matrix multiplication paths for it. f32 matmul is routed to a different MFMA instruction, and CDNA4 is added to the stream-k dispatch for MMQ. Pre-compiled binaries for various operating systems and hardware configurations are also provided.
Migration Steps
- When compiling for gfx950, note that f32 matmul uses mfma_f32_16x16x4f32 instead of the xf32 variant.
- Users targeting ROCm 7.2 on Linux x64 should use the provided binaries compiled for ROCm 7.2.
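The f32 matmul note for gfx950 amounts to a per-target capability check. The sketch below models that check and is hypothetical throughout. The table `HAS_XF32_MFMA`, the function `select_f32_mfma`, and the assumption that gfx942 (CDNA3) takes the xf32 path are illustrative only and are not taken from llama.cpp's source. The only behaviour that comes from this release is the gfx950 entry, which routes to mfma_f32_16x16x4f32 because that target has no xf32 variant.

```python
# Hypothetical sketch: arch-based choice of the MFMA instruction for f32 matmul.
# Only the gfx950 entry reflects this release; the gfx942 entry is an assumption.

# Assumed capability table: does the target expose an xf32 MFMA variant?
HAS_XF32_MFMA = {
    "gfx942": True,   # CDNA3 (assumed here for illustration)
    "gfx950": False,  # CDNA4: per this release, the xf32 variant is unavailable
}

F32_MFMA = "mfma_f32_16x16x4f32"  # plain f32 instruction used on gfx950
XF32_MFMA = "xf32 variant"        # placeholder label, not a real instruction name


def select_f32_mfma(arch: str) -> str:
    """Return the illustrative MFMA choice for f32 matmul on `arch`.

    Only targets listed in HAS_XF32_MFMA are handled; others raise KeyError.
    """
    if HAS_XF32_MFMA[arch]:
        return XF32_MFMA
    # No xf32 variant on this target, so fall back to the plain f32 MFMA.
    return F32_MFMA


if __name__ == "__main__":
    for target in ("gfx942", "gfx950"):
        print(target, "->", select_f32_mfma(target))
```

Keying the choice on a per-target capability, rather than on the CDNA generation alone, keeps a new target from silently inheriting an instruction it lacks.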
✨ New Features
- Added HIP support for the AMD Instinct MI350X/MI355X accelerators (gfx950, CDNA4 architecture).
- CDNA4 f32 matmul now routes to mfma_f32_16x16x4f32 as the xf32 variant is unavailable on gfx950.
- Included CDNA4 in the stream-k kernel dispatch for MMQ (quantized matrix multiplication) operations.
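Stream-k is a GEMM work-decomposition scheme. It does not hand each workgroup whole output tiles. Instead, it flattens the K-loop iterations of every tile into one range and splits that range evenly across a fixed number of workers, typically one per compute unit. As a result, a tile can be shared between workers, and its partial sums have to be combined in a fixup step. The sketch below illustrates only the general technique. `stream_k_partition`, `shared_tiles`, and their parameters are hypothetical names, and this is not llama.cpp's MMQ implementation.

```python
# Hypothetical sketch of stream-k work partitioning (general technique only,
# not llama.cpp's MMQ code).


def stream_k_partition(num_tiles: int, iters_per_tile: int, num_workers: int):
    """Split num_tiles * iters_per_tile K-iterations evenly across workers.

    Returns (worker, tile, k_begin, k_end) segments, where [k_begin, k_end)
    is the slice of that tile's K-loop handled by that worker.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    total = num_tiles * iters_per_tile
    base, extra = divmod(total, num_workers)
    segments = []
    for w in range(num_workers):
        # The first `extra` workers take one extra iteration each.
        start = w * base + min(w, extra)
        end = start + base + (1 if w < extra else 0)
        it = start
        while it < end:
            tile = it // iters_per_tile
            k_begin = it % iters_per_tile
            # Stop at the tile boundary or at the end of this worker's range.
            k_end = min(iters_per_tile, k_begin + (end - it))
            segments.append((w, tile, k_begin, k_end))
            it += k_end - k_begin
    return segments


def shared_tiles(segments):
    """Tiles touched by more than one worker; these need a fixup pass."""
    owners = {}
    for worker, tile, _, _ in segments:
        owners.setdefault(tile, set()).add(worker)
    return sorted(t for t, ws in owners.items() if len(ws) > 1)


if __name__ == "__main__":
    segs = stream_k_partition(num_tiles=3, iters_per_tile=4, num_workers=5)
    for seg in segs:
        print(seg)
    print("tiles needing fixup:", shared_tiles(segs))
```

With 3 tiles of 4 iterations on 5 workers, each worker gets 2 or 3 iterations, and every tile ends up split across two workers. A plain tile-per-workgroup scheme would instead leave 2 of the 5 workers idle.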