b9510

📅 Jun 4, 2026 · 📦 llama-cpp

✨ 2 features · 🔧 2 symbols

Summary

This release speeds up GGML on WebAssembly by vectorizing the inner loop of ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 intrinsics, for a reported 3.42x speedup in benchmarks.

Migration Steps

The WASM SIMD128 implementation of ggml_vec_dot_q4_1_q8_1 now lives in ggml/src/ggml-cpu/arch/wasm/quants.c. Out-of-tree patches that touch this function may need to be rebased onto the new file.

✨ New Features

Vectorized the inner loop of ggml_vec_dot_q4_1_q8_1 using WASM SIMD128 intrinsics, resulting in a 3.42x speedup on benchmarks.
Relocated the WASM SIMD128 implementation of ggml_vec_dot_q4_1_q8_1 to ggml/src/ggml-cpu/arch/wasm/quants.c.
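
For context, here is a minimal scalar sketch of the kind of computation a Q4_1 x Q8_1 dot product performs, written to show which inner loop a SIMD version would target. It is not the code from this release. The struct and function names (sketch_q4_1, sketch_q8_1, sketch_vec_dot_q4_1_q8_1) are hypothetical, and the layout is simplified: it assumes plain float scales where ggml stores half-precision values, and it assumes that low nibbles hold the first 16 elements of a block and high nibbles the last 16.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* elements per quantized block (assumed block size) */

/* Hypothetical, simplified stand-in for a Q4_1 block:
 * each element dequantizes to d * q + m, with q in [0, 15]. */
typedef struct {
    float   d;          /* scale (assumption: float, not fp16) */
    float   m;          /* minimum / offset */
    uint8_t qs[QK / 2]; /* 32 4-bit quants, two per byte */
} sketch_q4_1;

/* Hypothetical, simplified stand-in for a Q8_1 block:
 * each element dequantizes to d * q; s caches d * sum(qs). */
typedef struct {
    float  d;      /* scale */
    float  s;      /* d * sum of all qs in the block */
    int8_t qs[QK]; /* 32 8-bit quants */
} sketch_q8_1;

/* Scalar dot product over n elements (n must be a multiple of QK). */
static float sketch_vec_dot_q4_1_q8_1(int n, const sketch_q4_1 *x,
                                      const sketch_q8_1 *y) {
    assert(n % QK == 0);
    float sum = 0.0f;
    for (int ib = 0; ib < n / QK; ++ib) {
        int32_t sumi = 0;
        /* Inner loop: integer multiply-accumulate over unpacked nibbles.
         * This is the part a SIMD version would process many lanes at a time. */
        for (int j = 0; j < QK / 2; ++j) {
            const int lo = x[ib].qs[j] & 0x0F; /* element j */
            const int hi = x[ib].qs[j] >> 4;   /* element j + QK/2 */
            sumi += lo * y[ib].qs[j] + hi * y[ib].qs[j + QK / 2];
        }
        /* sum((d_x*q + m) * (d_y*p)) = d_x*d_y*sum(q*p) + m * (d_y*sum(p)) */
        sum += x[ib].d * y[ib].d * (float) sumi + x[ib].m * y[ib].s;
    }
    return sum;
}
```

The per-block floating-point work is a single multiply-add, so nearly all of the cost sits in the integer inner loop. That is presumably why vectorizing that loop yields the reported gain.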

Affected Symbols

ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q4_1_q8_1_generic