Change8

b9670

📦 llama-cpp
✨ 1 feature   🐛 1 fix   🔧 2 symbols

Summary

This release fixes edge cases in NVFP4 quantization handling in llama-graph. The post-GEMM MUL required for dequantization now runs before the LoRA and bias additions, and build_ffn is restricted to the NVFP4 configurations that are actually supported.

Migration Steps

  1. Review any LoRA implementation details that depend on where the LoRA residual (delta) is added. The literature suggests LoRA is applied after the dequantization MUL but before the bias add.
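For a single linear layer, the order described in this step can be written as below. The notation is ours, not the release's: W_q is the quantized weight, s is the factor applied by the post-GEMM MUL (simplified here to a single per-tensor scale), A and B are the LoRA matrices with rank r and scaling alpha, and b is the bias.

```latex
y = \underbrace{s\,(W_q x)}_{\text{GEMM, then MUL}}
  \;+\; \underbrace{\tfrac{\alpha}{r}\,B\,(A x)}_{\text{LoRA delta}}
  \;+\; \underbrace{b}_{\text{bias}}
```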

✨ New Features

  • Restricted build_ffn for NVFP4 to supported combinations.

🐛 Bug Fixes

  • Fixed and restricted NVFP4 edge cases in llama-graph by moving the post-GEMM MUL required for dequantization so that it runs before the LoRA and bias additions.
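The following is a rough sketch of why the MUL has to come first. It assumes, for simplicity, one per-tensor dequantization factor `w_scale` applied by the post-GEMM MUL (real NVFP4 also carries per-block scales), and every function and variable name in it is hypothetical rather than a llama.cpp symbol. It compares the fixed order (MUL, then LoRA, then bias) with applying the MUL last.

```python
# Hypothetical sketch, not llama.cpp code. Assumes one per-tensor
# dequantization factor (w_scale) applied by the post-GEMM MUL.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def scale(v, s):
    """Multiply every element of v by the scalar s."""
    return [a * s for a in v]

def linear_fixed(w_q, w_scale, lora_a, lora_b, bias, x):
    """Order after the fix: GEMM, dequant MUL, LoRA delta, bias add."""
    y = scale(matvec(w_q, x), w_scale)             # dequantize the GEMM output first
    y = add(y, matvec(lora_b, matvec(lora_a, x)))  # LoRA delta is not rescaled
    return add(y, bias)                            # bias add comes last

def linear_mul_last(w_q, w_scale, lora_a, lora_b, bias, x):
    """Faulty order: the MUL also rescales the LoRA delta and the bias."""
    y = matvec(w_q, x)
    y = add(y, matvec(lora_b, matvec(lora_a, x)))
    y = add(y, bias)
    return scale(y, w_scale)

w_q = [[1, 2], [3, 4]]   # quantized weight values (FP4-representable)
w_scale = 0.5            # dequantization factor
lora_a = [[1, 0]]        # rank-1 LoRA down-projection (1x2)
lora_b = [[1], [2]]      # LoRA up-projection (2x1)
bias = [10, 20]
x = [1, 1]

print(linear_fixed(w_q, w_scale, lora_a, lora_b, bias, x))     # [12.5, 25.5]
print(linear_mul_last(w_q, w_scale, lora_a, lora_b, bias, x))  # [7.0, 14.5]
```

Dequantizing the weights up front (0.5 × w_q) gives the reference output [1.5, 3.5] + [1, 2] + [10, 20] = [12.5, 25.5], which matches the fixed order; applying the MUL last halves the LoRA delta and the bias as well.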

Affected Symbols