b9735
📦 llama-cpp
Summary
This release focuses on ggml performance. It optimizes the AMX (Intel Advanced Matrix Extensions) code path, which speeds up quantization benchmarks on Intel Xeon CPUs. It also provides updated pre-built binaries for numerous platforms.
✨ New Features
- ggml: Improved AMX performance by flattening the work partition over n_batch * M so that every thread participates in quantization.
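The flattened partition can be sketched in C. This is a hypothetical illustration, not ggml's actual AMX kernel: the helpers thread_range, busy_threads and rows_quantized_by are invented for this example, and the sketch assumes the earlier scheme split rows across threads within a single batch entry only. Splitting the combined index range [0, n_batch * M) evenly means every thread receives at least one row whenever n_batch * M is at least the thread count, even when M alone is smaller than that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: give thread `ith` of `nth` a contiguous, balanced
 * slice [*start, *end) of the flattened index space [0, total). */
static void thread_range(size_t total, int ith, int nth, size_t *start, size_t *end) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    *start = total * (size_t)ith / (size_t)nth;
    *end   = total * (size_t)(ith + 1) / (size_t)nth;
}

/* Count how many of the `nth` threads receive at least one unit of work. */
static int busy_threads(size_t total, int nth) {
    int busy = 0;
    for (int ith = 0; ith < nth; ++ith) {
        size_t s, e;
        thread_range(total, ith, nth, &s, &e);
        if (e > s) ++busy;
    }
    return busy;
}

/* Each thread walks its slice of the flattened n_batch * M space and maps
 * each flat index back to (batch, row). Returns the number of rows handled. */
static size_t rows_quantized_by(int ith, int nth, size_t n_batch, size_t M) {
    size_t s, e, count = 0;
    thread_range(n_batch * M, ith, nth, &s, &e);
    for (size_t i = s; i < e; ++i) {
        size_t batch = i / M; /* which batch entry */
        size_t row   = i % M; /* which row within that batch entry */
        (void)batch;
        (void)row;            /* a real kernel would quantize this row here */
        ++count;
    }
    return count;
}
```

For example, with M = 2 and 8 threads, splitting over M alone (busy_threads(2, 8)) keeps only 2 threads busy, while flattening over n_batch = 4 (busy_threads(8, 8)) gives all 8 threads a row. The bounds total * ith / nth hand each thread either the floor or the ceiling of total / nth rows, so no thread sits idle once total reaches nth.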