
b8197

📦 llama-cpp
✨ 2 features · 🔧 1 symbol

Summary

This release updates ggml to use std::thread instead of OpenMP in AMX builds, yielding a significant inference speedup at the cost of slightly slower model loading. Pre-built binaries for numerous platforms are also provided.

✨ New Features

  • ggml now uses plain std::thread in AMX builds instead of OpenMP, which generally improves inference performance.
  • New pre-built binaries are available for macOS (Apple Silicon and Intel), Linux (various configurations, including Vulkan and ROCm 7.2), Windows (CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL, HIP), and openEuler.

Affected Symbols