b8057
📦 llama-cpp
✨ 2 features · 🐛 7 fixes · 🔧 2 symbols
Summary
This release adds a new GEMM microkernel to ggml-cpu for improved CPU performance, cleans up several low-level implementation details and compiler warnings, and expands the set of pre-compiled binaries across platforms and accelerators.
✨ New Features
- Added a new GEMM microkernel implementation for ggml-cpu acceleration.
- Introduced new pre-compiled binary distributions, including CUDA (12.4, 13.1), Vulkan, SYCL, and HIP builds for Windows, and specialized builds for openEuler.
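The release notes don't include the actual kernel source, but the core idea of a GEMM microkernel is register blocking: a small fixed-size tile of the output (here 4×4) is accumulated in local variables across the shared K dimension, so each loaded element of A and B is reused four times. The sketch below is an illustrative plain-C version under that assumption, not the ggml-cpu implementation; the function names and the self-check helper are hypothetical.

```c
#include <stddef.h>

/* 4x4 register-blocked GEMM micro-kernel: C[0..3][0..3] += A[0..3][0..K-1] * B[0..K-1][0..3].
 * A, B, C are row-major with leading dimensions lda, ldb, ldc.
 * The c[4][4] accumulator stays in registers across the whole K loop,
 * so each A and B element loaded is reused four times. */
static void gemm_micro_4x4(size_t K,
                           const float *A, size_t lda,
                           const float *B, size_t ldb,
                           float *C, size_t ldc) {
    float c[4][4] = {{0}};
    for (size_t k = 0; k < K; ++k) {
        /* one row of B, broadcast across the four rows of A */
        const float b0 = B[k*ldb + 0], b1 = B[k*ldb + 1];
        const float b2 = B[k*ldb + 2], b3 = B[k*ldb + 3];
        for (size_t i = 0; i < 4; ++i) {
            const float a = A[i*lda + k];
            c[i][0] += a * b0;
            c[i][1] += a * b1;
            c[i][2] += a * b2;
            c[i][3] += a * b3;
        }
    }
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            C[i*ldc + j] += c[i][j];
}

/* Hypothetical self-check: compare the micro-kernel against a naive
 * triple loop on a small deterministic input; returns the max abs diff. */
static float gemm_selfcheck(void) {
    enum { K = 8 };
    float A[4*K], B[K*4], C[16] = {0}, R[16] = {0};
    for (size_t i = 0; i < 4*K; ++i) A[i] = (float)(i % 7) - 3.0f;
    for (size_t i = 0; i < K*4; ++i) B[i] = (float)(i % 5) - 2.0f;
    gemm_micro_4x4(K, A, K, B, 4, C, 4);
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            for (size_t k = 0; k < K; ++k)
                R[i*4 + j] += A[i*K + k] * B[k*4 + j];
    float m = 0.0f;
    for (size_t i = 0; i < 16; ++i) {
        float d = C[i] - R[i];
        if (d < 0) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

A production kernel would additionally vectorize the inner updates with SIMD intrinsics and tile the full matrix into these 4×4 (or larger) blocks; this sketch only shows the register-blocking structure.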
🐛 Bug Fixes
- Added a guard for sizeless vector types.
- Fixed incorrect handling of cases where DV % GGML_F32_EPR is not zero.
- Hoisted memset operations out of loops to avoid redundant re-zeroing.
- Used RM=4 blocking on the Arm architecture.
- Converted elements in simd_gemm to int.
- Converted types to size_t to resolve compiler warnings.
- Added a pragma to silence aggressive-loop-optimization warnings.
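The memset-hoisting fix above follows a common pattern: when an accumulator buffer only needs to be zeroed once, clearing it inside the hot loop wastes memory bandwidth every iteration. The function below is a minimal illustrative sketch of that pattern (the name `accumulate_rows` and the shapes are hypothetical, not from the actual commit); it also uses `size_t` loop variables, the same kind of change the release made to resolve signedness warnings.

```c
#include <string.h>
#include <stddef.h>

/* Sum k consecutive row-slices of length n from a into acc.
 *
 * Wasteful variant (what the fix removes):
 *   for (s = 0; s < k; ++s) { memset(acc, 0, n * sizeof(float)); ... }
 * which re-zeroes the whole buffer on every iteration and discards
 * the partial sums.  Hoisting the memset zeroes acc exactly once. */
static void accumulate_rows(const float *a, float *acc, size_t n, size_t k) {
    memset(acc, 0, n * sizeof(float));   /* hoisted out of the loop below */
    for (size_t s = 0; s < k; ++s)       /* size_t indices: no -Wsign-compare */
        for (size_t i = 0; i < n; ++i)
            acc[i] += a[s * n + i];
}
```

Beyond the bandwidth saved, hoisting also lets the compiler keep `acc` values in registers across iterations instead of reloading them after each clear.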