
b9767

📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 2 symbols

Summary

This release speeds up MTP inference on the ggml-webgpu backend for small batches. It also ships pre-compiled binaries for a range of operating systems and hardware configurations.

✨ New Features

  • Improved MTP inference performance for small batches in ggml-webgpu by routing them through the matrix-vector (mat-vec) multiplication path.
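The idea behind a small-batch mat-vec path can be sketched as a dispatch decision: when the right-hand matrix has only a few columns, as when a handful of tokens are processed at once, each column is multiplied as a separate matrix-vector product instead of going through a general matrix-matrix kernel. This is a minimal Python illustration under stated assumptions, not the ggml-webgpu implementation. The cutoff `MAX_MATVEC_COLS` and the functions `choose_path`, `mat_vec`, `mat_mat`, and `mul_mat` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical cutoff: products whose right-hand side has at most this many
# columns (i.e. small batches) take the mat-vec path.
MAX_MATVEC_COLS = 8


def choose_path(n_cols: int) -> str:
    """Pick a kernel path based on the number of columns (batch size)."""
    return "mat-vec" if n_cols <= MAX_MATVEC_COLS else "mat-mat"


def mat_vec(a: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product: one dot product per row of A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mat_mat(a: Matrix, b: Matrix) -> Matrix:
    """General matrix-matrix product (stand-in for a tiled GEMM kernel)."""
    cols = list(zip(*b))
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in cols] for row in a]


def mul_mat(a: Matrix, b: Matrix) -> Matrix:
    """Compute A @ B, using repeated mat-vec products for small batches."""
    if choose_path(len(b[0])) == "mat-vec":
        # Treat each column of B as its own vector, then reassemble the columns.
        out_cols = [mat_vec(a, list(col)) for col in zip(*b)]
        return [list(row) for row in zip(*out_cols)]
    return mat_mat(a, b)


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(choose_path(len(b[0])), mul_mat(a, b))
```

Both paths compute the same result; the choice only affects how the work is scheduled. A dedicated mat-vec kernel can avoid the tiling and setup overhead a general matrix-matrix kernel pays, which is why small batches tend to benefit.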

🐛 Bug Fixes

  • Added a barrier inside the NUM_COLS loop of the mul-mat-vec kernel in ggml-webgpu.
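The release note does not describe the exact shader change. The sketch below shows one common reason a per-column loop in a cooperative mat-vec kernel needs a barrier. The threads of a workgroup write partial dot products into shared scratch memory, one thread reduces them, and the loop then moves on to the next column and reuses the same scratch. Without a barrier at the end of each iteration, a fast thread could overwrite the scratch slot for column `c + 1` before the reduction for column `c` has read it. This is a Python simulation that uses `threading.Barrier` as a stand-in for a GPU workgroup barrier. `WORKGROUP_SIZE` and `row_dot_cols` are hypothetical names.

```python
import threading
from typing import List

# Hypothetical number of threads cooperating on one row (a "workgroup").
WORKGROUP_SIZE = 4


def row_dot_cols(row: List[float], cols: List[List[float]]) -> List[float]:
    """Dot one matrix row with each of NUM_COLS vectors, cooperatively."""
    num_cols = len(cols)
    scratch = [0.0] * WORKGROUP_SIZE  # stands in for workgroup shared memory
    results = [0.0] * num_cols
    barrier = threading.Barrier(WORKGROUP_SIZE)

    def worker(tid: int) -> None:
        for c in range(num_cols):  # the NUM_COLS loop
            # Each thread sums a strided slice of the row for column c.
            partial = 0.0
            for k in range(tid, len(row), WORKGROUP_SIZE):
                partial += row[k] * cols[c][k]
            scratch[tid] = partial
            barrier.wait()  # every partial is written before the reduction
            if tid == 0:
                results[c] = sum(scratch)
            # Without this wait, a thread could start column c + 1 and
            # overwrite scratch before thread 0 finishes reducing column c.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(WORKGROUP_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(row_dot_cols([1, 2, 3, 4, 5], [[1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]))
```

The first barrier guards the reduction within one column. The second guards reuse of the shared scratch across loop iterations, which is the hazard specific to looping over columns.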

Affected Symbols