b9670
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 2 symbols
Summary
This release fixes critical edge cases in NVFP4 quantization handling in llama-graph: the post-GEMM multiplication (MUL) required for dequantization now runs before the LoRA and bias-add steps. It also restricts build_ffn for NVFP4 to supported combinations.
Migration Steps
- Review LoRA implementation details regarding residuals: the literature suggests that LoRA is applied after the multiplication (post-mul) but before the bias add.
✨ New Features
- Restricted build_ffn for NVFP4 to supported combinations.
🐛 Bug Fixes
- Fixed and restricted NVFP4 edge cases in llama-graph by moving the post-GEMM MUL required for dequantization so that it runs before LoRA and bias add.
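
To illustrate why the placement of the post-GEMM MUL matters, here is a minimal Python sketch of the operation order described in this fix. It is not llama.cpp code: the function names are hypothetical, and it assumes the dequantization step can be modelled as a single scalar scale, which is a simplification of how NVFP4 scaling works in practice.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# llama.cpp APIs. Assumption: dequantization is modelled as one scalar scale.

def matvec(w, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def linear_scale_first(w_q, scale, x, lora_a, lora_b, bias):
    """Order described in the fix: GEMM -> scale MUL -> LoRA -> bias add."""
    y = matvec(w_q, x)                         # GEMM on quantized-domain weights
    y = [scale * v for v in y]                 # post-GEMM MUL (dequant scale)
    delta = matvec(lora_b, matvec(lora_a, x))  # LoRA delta: B (A x)
    y = [v + d for v, d in zip(y, delta)]      # add LoRA after the scale
    return [v + b for v, b in zip(y, bias)]    # bias add comes last

def linear_scale_last(w_q, scale, x, lora_a, lora_b, bias):
    """Problematic order: the scale is also applied to LoRA and bias."""
    y = matvec(w_q, x)
    delta = matvec(lora_b, matvec(lora_a, x))
    y = [v + d + b for v, d, b in zip(y, delta, bias)]
    return [scale * v for v in y]              # scale wrongly hits every term

w_q = [[2.0, 4.0], [6.0, 8.0]]   # stand-in for quantized weights
scale = 0.5                      # assumed scalar dequant scale
x = [1.0, 1.0]
lora_a = [[1.0, 1.0]]            # rank-1 adapter, shape 1x2
lora_b = [[1.0], [2.0]]          # shape 2x1
bias = [0.5, -0.5]

print(linear_scale_first(w_q, scale, x, lora_a, lora_b, bias))  # [5.5, 10.5]
print(linear_scale_last(w_q, scale, x, lora_a, lora_b, bias))   # [4.25, 8.75]
```

The two orders give different results because the LoRA delta and the bias are not expressed in the quantized domain, so multiplying them by the dequantization scale distorts them.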