b9767

📅 Jun 23, 2026 📦 llama-cpp

✨ 1 feature 🐛 1 fix 🔧 2 symbols

Summary

This release speeds up MTP (multi-token prediction) inference in the ggml-webgpu backend for small batches. It also ships pre-compiled binaries for a range of operating systems and hardware configurations.

✨ New Features

Improved MTP inference performance for small batches in ggml-webgpu by routing them through the matrix-vector (mat-vec) multiplication path.
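The note does not say how the backend decides when to take the mat-vec path. Here is a minimal sketch of the general idea, assuming the choice depends on the number of token columns N in the batch. When N is small, for example when only a few MTP draft tokens are evaluated per step, each column is computed as its own matrix-vector product instead of going through a general matrix-matrix kernel. The names `mul_mat`, `mat_vec`, `mat_mat` and the cutoff `MAX_MATVEC_COLS` are hypothetical and are not ggml APIs.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major: Matrix[row][k]

# Hypothetical cutoff for a "small batch"; the real backend's rule may differ.
MAX_MATVEC_COLS = 8


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """y = A @ x, one dot product per row of A."""
    return [sum(a_rk * x_k for a_rk, x_k in zip(row, x)) for row in a]


def mat_mat(a: Matrix, cols: List[Vector]) -> List[Vector]:
    """General A @ B with B given as columns; stands in for a tiled matmul kernel."""
    return [[sum(row[k] * col[k] for k in range(len(col))) for row in a] for col in cols]


def mul_mat(a: Matrix, cols: List[Vector]) -> Tuple[List[Vector], str]:
    """Pick a path by batch size (number of token columns); also report which path ran."""
    if len(cols) <= MAX_MATVEC_COLS:
        # Small batch (e.g. a few draft tokens): run mat-vec once per column.
        return [mat_vec(a, c) for c in cols], "mat-vec"
    # Large batch (e.g. prompt processing): fall back to the general matmul path.
    return mat_mat(a, cols), "mat-mat"
```

Both paths produce the same numbers. Only the kernel choice changes. A mat-vec kernel is typically memory-bound and skips the tiling setup that a matmul kernel only amortizes at larger N, which is why it can win for small batches.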

🐛 Bug Fixes

Added a missing barrier inside the NUM_COLS loop of the mul-mat-vec kernel in ggml-webgpu.
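The note does not describe the race that the barrier fixes. One common reason such a barrier is needed is sketched below as an assumption. Suppose the kernel reuses a single workgroup-shared scratch buffer on every iteration of the NUM_COLS loop. In that case, threads must wait until the reduction has read the buffer before any of them overwrites it for the next column. The sketch simulates one workgroup on the CPU with Python threads and `threading.Barrier`. `matvec_workgroup` and its parameters are hypothetical and do not mirror the actual WGSL shader.

```python
import threading
from typing import List


def matvec_workgroup(a: List[List[float]], cols: List[List[float]], num_threads: int = 4) -> List[List[float]]:
    """Simulate one workgroup computing A @ x for each of NUM_COLS columns.

    All threads share one scratch array (standing in for workgroup memory)
    that is reused on every row and column iteration.
    """
    rows, k, num_cols = len(a), len(a[0]), len(cols)
    scratch = [0.0] * num_threads  # shared partial sums, one slot per thread
    out = [[0.0] * rows for _ in range(num_cols)]
    barrier = threading.Barrier(num_threads)

    def worker(tid: int) -> None:
        for c in range(num_cols):  # the NUM_COLS loop
            for r in range(rows):
                # Each thread sums a strided slice of the dot product.
                scratch[tid] = sum(a[r][j] * cols[c][j] for j in range(tid, k, num_threads))
                barrier.wait()  # every partial sum is written before the reduction
                if tid == 0:
                    out[c][r] = sum(scratch)
                # Without this second wait, a fast thread could overwrite its
                # scratch slot for the next iteration before thread 0 reads it.
                barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The first barrier is the usual "write, then sync, then reduce" step. The second barrier is the kind that is easy to omit when a shared buffer is reused across loop iterations.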

Affected Symbols

ggml-webgpu
mul-mat-vec