b9767
📦 llama-cpp
Summary
This release speeds up MTP inference for small batches in the ggml-webgpu backend and ships pre-compiled binaries for a range of operating systems and hardware configurations.
✨ New Features
- Improved MTP inference performance for small batches in ggml-webgpu by routing them through the matrix-vector (mat-vec) path.
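The idea behind a mat-vec path for small batches can be illustrated with a CPU-side dispatch sketch. When the right-hand matrix has only a few columns (a small batch), each column can be handled as an independent matrix-vector product instead of going through a general tiled mat-mul. This is a minimal illustration under stated assumptions, not the ggml-webgpu implementation. It uses plain row-major lists and the conventional C = A·B, not ggml's tensor layout. The names `mat_vec`, `mat_mul_tiled`, `mul_mat` and the cutoff `MAT_VEC_MAX_COLS` are hypothetical, and the cutoff value is made up because the release notes do not state one.

```python
from typing import List

Matrix = List[List[float]]  # row-major: Matrix[row][col]

# Hypothetical cutoff: batches with at most this many columns take the mat-vec path.
# The actual ggml-webgpu threshold is not given in the release notes.
MAT_VEC_MAX_COLS = 8


def mat_vec(a: Matrix, x: List[float]) -> List[float]:
    """y = A @ x for a single vector x: one dot product per row of A."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


def mat_mul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """General C = A @ B computed tile by tile (stand-in for the batched mat-mul path)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def mul_mat(a: Matrix, b: Matrix) -> Matrix:
    """Dispatch on batch size: few columns in B -> one mat-vec per column."""
    n_cols = len(b[0])
    if n_cols <= MAT_VEC_MAX_COLS:
        # extract each column of B and multiply it by A independently
        columns = [mat_vec(a, [row[j] for row in b]) for j in range(n_cols)]
        # transpose the list of result columns back into row-major C
        return [[columns[j][i] for j in range(n_cols)] for i in range(len(a))]
    return mat_mul_tiled(a, b)
```

For example, `mul_mat([[1, 2], [3, 4]], [[5, 6], [7, 8]])` takes the mat-vec branch because B has only two columns. A wider B falls through to the tiled path. Both branches produce the same result, so the choice only affects how the work is scheduled.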
🐛 Bug Fixes
- Added a barrier inside the NUM_COLS loop of the mul-mat-vec shader in ggml-webgpu.
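Why a loop that reuses shared scratch memory needs a barrier on every iteration can be shown with a small CPU simulation. Python threads stand in for the invocations of one GPU workgroup, and a list stands in for workgroup shared memory. This sketch rests on two assumptions. First, NUM_COLS is taken to count the vectors (batch columns) each workgroup processes in a loop. Second, the scratch buffer is taken to be reused across iterations. The real shader is written in WGSL, where a built-in such as `workgroupBarrier()` plays the role of `threading.Barrier` here. The function name `workgroup_mul_mat_vec` and `WORKGROUP_SIZE` are hypothetical.

```python
import threading
from typing import List

WORKGROUP_SIZE = 4  # hypothetical number of cooperating invocations


def workgroup_mul_mat_vec(row: List[float], cols: List[List[float]]) -> List[float]:
    """Dot `row` with each vector in `cols` using one simulated workgroup.

    The shared scratch buffer is reused on every iteration of the NUM_COLS
    loop, so the threads must synchronize before the reduction reads it and
    again before the next iteration overwrites it.
    """
    num_cols = len(cols)
    k = len(row)
    shared = [0.0] * WORKGROUP_SIZE  # stand-in for workgroup shared memory
    out = [0.0] * num_cols
    barrier = threading.Barrier(WORKGROUP_SIZE)

    def worker(tid: int) -> None:
        for c in range(num_cols):  # the NUM_COLS loop
            # each thread accumulates a strided slice of the dot product
            partial = 0.0
            for i in range(tid, k, WORKGROUP_SIZE):
                partial += row[i] * cols[c][i]
            shared[tid] = partial
            barrier.wait()  # every partial sum is written before the reduction
            if tid == 0:
                out[c] = sum(shared)
            barrier.wait()  # reduction finished before `shared` is overwritten

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(WORKGROUP_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Dropping the second `barrier.wait()` would let a fast thread start the next column and overwrite its `shared` slot before thread 0 has summed the current column. That is the kind of cross-iteration race a barrier inside the loop prevents.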