b8057
📦 llama-cpp
✨ 2 features · 🐛 7 fixes · 🔧 2 symbols
Summary
This release adds a new GEMM microkernel to ggml-cpu for improved CPU performance, cleans up several low-level implementation details and compiler warnings, and expands the set of pre-compiled binaries across platforms and accelerators.
✨ New Features
- Added a new GEMM microkernel implementation for ggml-cpu acceleration.
- Introduced new pre-compiled binary distributions, including CUDA (12.4, 13.1), Vulkan, SYCL, and HIP builds for Windows, and specialized builds for openEuler.
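The release notes don't include the actual kernel source, but the core idea of a GEMM microkernel is register blocking: a small fixed-size tile of the output (here 4×4) is accumulated in local variables across the shared K dimension, so each loaded element of A and B is reused four times. The sketch below is an illustrative plain-C version under that assumption, not the ggml-cpu implementation; the function names and the self-check helper are hypothetical.

```c
#include <stddef.h>

/* 4x4 register-blocked GEMM micro-kernel: C[0..3][0..3] += A[0..3][0..K-1] * B[0..K-1][0..3].
 * A, B, C are row-major with leading dimensions lda, ldb, ldc.
 * The c[4][4] accumulator stays in registers across the whole K loop,
 * so each A and B element loaded is reused four times. */
static void gemm_micro_4x4(size_t K,
                           const float *A, size_t lda,
                           const float *B, size_t ldb,
                           float *C, size_t ldc) {
    float c[4][4] = {{0}};
    for (size_t k = 0; k < K; ++k) {
        /* one row of B, broadcast across the four rows of A */
        const float b0 = B[k*ldb + 0], b1 = B[k*ldb + 1];
        const float b2 = B[k*ldb + 2], b3 = B[k*ldb + 3];
        for (size_t i = 0; i < 4; ++i) {
            const float a = A[i*lda + k];
            c[i][0] += a * b0;
            c[i][1] += a * b1;
            c[i][2] += a * b2;
            c[i][3] += a * b3;
        }
    }
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            C[i*ldc + j] += c[i][j];
}

/* Hypothetical self-check: compare the micro-kernel against a naive
 * triple loop on a small deterministic input; returns the max abs diff. */
static float gemm_selfcheck(void) {
    enum { K = 8 };
    float A[4*K], B[K*4], C[16] = {0}, R[16] = {0};
    for (size_t i = 0; i < 4*K; ++i) A[i] = (float)(i % 7) - 3.0f;
    for (size_t i = 0; i < K*4; ++i) B[i] = (float)(i % 5) - 2.0f;
    gemm_micro_4x4(K, A, K, B, 4, C, 4);
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            for (size_t k = 0; k < K; ++k)
                R[i*4 + j] += A[i*K + k] * B[k*4 + j];
    float m = 0.0f;
    for (size_t i = 0; i < 16; ++i) {
        float d = C[i] - R[i];
        if (d < 0) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

A production kernel would additionally vectorize the inner updates with SIMD intrinsics and tile the full matrix into these 4×4 (or larger) blocks; this sketch only shows the register-blocking structure.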
🐛 Bug Fixes
- Added a guard for sizeless vector types.
- Fixed incorrect handling of cases where DV % GGML_F32_EPR is not zero.
- Hoisted memset operations out of loops to avoid redundant re-zeroing.
- Used RM=4 blocking on the Arm architecture.
- Converted elements in simd_gemm to int.
- Converted types to size_t to resolve compiler warnings.
- Added a pragma to silence aggressive-loop-optimization warnings.
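The memset-hoisting fix above follows a common pattern: when an accumulator buffer only needs to be zeroed once, clearing it inside the hot loop wastes memory bandwidth every iteration. The function below is a minimal illustrative sketch of that pattern (the name `accumulate_rows` and the shapes are hypothetical, not from the actual commit); it also uses `size_t` loop variables, the same kind of change the release made to resolve signedness warnings.

```c
#include <string.h>
#include <stddef.h>

/* Sum k consecutive row-slices of length n from a into acc.
 *
 * Wasteful variant (what the fix removes):
 *   for (s = 0; s < k; ++s) { memset(acc, 0, n * sizeof(float)); ... }
 * which re-zeroes the whole buffer on every iteration and discards
 * the partial sums.  Hoisting the memset zeroes acc exactly once. */
static void accumulate_rows(const float *a, float *acc, size_t n, size_t k) {
    memset(acc, 0, n * sizeof(float));   /* hoisted out of the loop below */
    for (size_t s = 0; s < k; ++s)       /* size_t indices: no -Wsign-compare */
        for (size_t i = 0; i < n; ++i)
            acc[i] += a[s * n + i];
}
```

Beyond the bandwidth saved, hoisting also lets the compiler keep `acc` values in registers across iterations instead of reloading them after each clear.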