b8099
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 2 symbols
Summary
This release adds an FP16 MMA path for Q4/Q8 matrix multiplications in llamafile on PowerPC, yielding a 1.5x to 2x speedup on the affected workloads.
✨ New Features
- Added an FP16 MMA path for Q4/Q8 matmul in llamafile on PowerPC.
- Dequantized Q4/Q8 inputs to FP16 so the FP16xFP16->FP32 MMA units can be used, removing integer post-processing overhead.
🐛 Bug Fixes
- Avoided the xvi8ger4pp signed->unsigned bias correction by taking the FP16 MMA path.