b8197
📦 llama-cpp
✨ 2 features · 🔧 1 symbol
Summary
This release switches ggml's AMX builds from OpenMP to std::thread, yielding significant inference speedups at the cost of slightly slower model loading. Numerous pre-built binaries for a wide range of platforms are also provided.
✨ New Features
- ggml's AMX builds now spawn plain std::thread workers instead of relying on OpenMP, which generally improves inference performance.
- New pre-built binaries are available for macOS (Apple Silicon and Intel), Linux (various configurations including Vulkan and ROCm 7.2), Windows (CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL, HIP), and openEuler.