b8197
📦 llama-cpp
✨ 2 features · 🔧 1 symbol
Summary
This release switches ggml's AMX builds from OpenMP to std::thread, yielding significant inference speedups at the cost of slightly slower model loading. Numerous pre-built binaries for a wide range of platforms are also provided.
✨ New Features
- ggml's AMX builds now spawn plain std::thread workers instead of relying on OpenMP, which generally improves inference performance.
- New pre-built binaries are available for macOS (Apple Silicon and Intel), Linux (various configurations including Vulkan and ROCm 7.2), Windows (CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL, HIP), and openEuler.