Changes

b8099

📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 2 symbols

Summary

This release adds an FP16 MMA path for Q4/Q8 matrix multiplications in llamafile on PowerPC, yielding a 1.5x to 2x speedup on affected workloads.

✨ New Features

  • Added an FP16 MMA path for Q4/Q8 matmul on PowerPC in llamafile.
  • Dequantized Q4/Q8 inputs to FP16 so the FP16xFP16->FP32 MMA instructions can be used directly, removing post-processing overhead.

🐛 Bug Fixes

  • Avoided the xvi8ger4pp signed->unsigned bias correction by using the FP16 MMA path.

Affected Symbols