Change8

b9510

📦 llama-cpp
2 features · 🔧 2 symbols

Summary

This release vectorizes the inner loop of ggml_vec_dot_q4_1_q8_1 in GGML using WASM SIMD128 intrinsics, for a reported 3.42x speedup in benchmarks. The WASM SIMD128 implementation of this function now lives in ggml/src/ggml-cpu/arch/wasm/quants.c.

Migration Steps

  1. The WASM SIMD128 implementation of ggml_vec_dot_q4_1_q8_1 now lives in ggml/src/ggml-cpu/arch/wasm/quants.c. If you maintain out-of-tree patches or build scripts that reference its previous location, point them at the new path.

✨ New Features

  • Vectorized the inner loop of ggml_vec_dot_q4_1_q8_1 using WASM SIMD128 intrinsics, resulting in a 3.42x speedup on benchmarks.
  • Relocated the WASM SIMD128 implementation of ggml_vec_dot_q4_1_q8_1 to ggml/src/ggml-cpu/arch/wasm/quants.c.
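For readers unfamiliar with the operation, the following is a minimal sketch of what a Q4_1 by Q8_1 block dot product computes, not the code from this release. It assumes simplified block layouts (the hypothetical `blk_q4_1` and `blk_q8_1` structs store scales as `float`, whereas ggml uses fp16), and the helper names `block_sumi` and `vec_dot_q4_1_q8_1_sketch` are illustrative only. The SIMD path is compiled only when the toolchain defines `__wasm_simd128__`; otherwise the scalar loop runs, and both are intended to produce the same integer sum.

```c
#include <stdint.h>

#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

#define QK 32 /* quantized values per block */

/* Simplified, hypothetical layouts (assumption: ggml stores d, m, s as fp16). */
typedef struct { float d, m; uint8_t qs[QK / 2]; } blk_q4_1; /* x = d*q + m, two 4-bit q per byte */
typedef struct { float d, s; int8_t  qs[QK];     } blk_q8_1; /* y = d*p, s = d * sum(p)          */

/* Integer inner product sum(q[j] * p[j]) for one block pair.
 * Byte j holds element j in its low nibble and element j + 16 in its high nibble. */
static int32_t block_sumi(const blk_q4_1 *x, const blk_q8_1 *y) {
#if defined(__wasm_simd128__)
    v128_t packed = wasm_v128_load(x->qs);
    v128_t lo = wasm_v128_and(packed, wasm_i8x16_splat(0x0F)); /* elements 0..15  */
    v128_t hi = wasm_u8x16_shr(packed, 4);                      /* elements 16..31 */
    v128_t y0 = wasm_v128_load(y->qs);
    v128_t y1 = wasm_v128_load(y->qs + 16);

    /* Widen i8 -> i16 (nibbles are 0..15, so signed widening is safe),
     * then multiply and add adjacent pairs into four i32 lanes. */
    v128_t acc = wasm_i32x4_dot_i16x8(wasm_i16x8_extend_low_i8x16(lo),
                                      wasm_i16x8_extend_low_i8x16(y0));
    acc = wasm_i32x4_add(acc, wasm_i32x4_dot_i16x8(wasm_i16x8_extend_high_i8x16(lo),
                                                   wasm_i16x8_extend_high_i8x16(y0)));
    acc = wasm_i32x4_add(acc, wasm_i32x4_dot_i16x8(wasm_i16x8_extend_low_i8x16(hi),
                                                   wasm_i16x8_extend_low_i8x16(y1)));
    acc = wasm_i32x4_add(acc, wasm_i32x4_dot_i16x8(wasm_i16x8_extend_high_i8x16(hi),
                                                   wasm_i16x8_extend_high_i8x16(y1)));

    /* Horizontal sum of the four lanes. */
    return wasm_i32x4_extract_lane(acc, 0) + wasm_i32x4_extract_lane(acc, 1) +
           wasm_i32x4_extract_lane(acc, 2) + wasm_i32x4_extract_lane(acc, 3);
#else
    int32_t sumi = 0;
    for (int j = 0; j < QK / 2; ++j) {
        sumi += (x->qs[j] & 0x0F) * y->qs[j];          /* low nibble  */
        sumi += (x->qs[j] >> 4)   * y->qs[j + QK / 2]; /* high nibble */
    }
    return sumi;
#endif
}

/* sum(x*y) over nb blocks. Per block:
 *   sum((d_x*q + m_x) * (d_y*p)) = d_x*d_y*sum(q*p) + m_x*s_y */
static float vec_dot_q4_1_q8_1_sketch(int nb, const blk_q4_1 *x, const blk_q8_1 *y) {
    float sum = 0.0f;
    for (int i = 0; i < nb; ++i) {
        sum += x[i].d * y[i].d * (float)block_sumi(&x[i], &y[i]) + x[i].m * y[i].s;
    }
    return sum;
}
```

The integer loop in `block_sumi` is the kind of inner loop that benefits from vectorization: one 128-bit load covers all 16 packed bytes of a block, so the 32 multiply-adds collapse into four widening dot-product instructions. Building the SIMD path requires a WebAssembly toolchain with SIMD enabled (for example, the `-msimd128` flag in Emscripten or clang).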

Affected Symbols