b8099
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 2 symbols
Summary
This release adds an FP16 MMA path for Q4/Q8 matrix multiplications in llamafile on PowerPC, yielding a 1.5x to 2x speedup on the affected workloads.
✨ New Features
- Added an FP16 MMA path for Q4/Q8 matmul in llamafile on PowerPC.
- Dequantized Q4/Q8 inputs to FP16 so the FP16xFP16->FP32 MMA units can be used, removing integer post-processing overhead.
🐛 Bug Fixes
- Avoided the xvi8ger4pp signed->unsigned bias correction by taking the FP16 MMA path.