b9670

📅 Jun 16, 2026 · 📦 llama-cpp

✨ 1 feature · 🐛 1 fix · 🔧 2 symbols

Summary

This release fixes critical edge cases in NVFP4 quantization by moving the post-GEMM multiply (MUL) required for dequantization so that it runs before the LoRA and bias additions. It also restricts build_ffn for NVFP4 to supported configurations.

Migration Steps

Review your LoRA implementation, particularly how it handles residuals: the literature suggests that the LoRA term is applied after the multiplication but before the bias addition.

✨ New Features

Restricted build_ffn for NVFP4 to supported combinations.

🐛 Bug Fixes

Fixed and restricted NVFP4 edge cases in llama-graph by moving the post-GEMM MUL required for dequantization before the LoRA and bias additions.
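
To illustrate why the position of the dequantization multiply matters, here is a minimal Python sketch. It is not llama.cpp code: the helper names are hypothetical, the single per-tensor scale is an assumption made for simplicity, and the LoRA contribution is taken as already computed. It only shows that scaling after the LoRA and bias additions also rescales those terms, which changes the result.

```python
# Illustrative only: these helpers are hypothetical and are not llama.cpp APIs.

def matvec(w, x):
    """Plain matrix-vector product (stands in for the GEMM)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def linear_scale_first(w_q, scale, x, lora_delta, bias):
    """Scale the raw GEMM output first, then add the LoRA term, then the bias."""
    y = [scale * v for v in matvec(w_q, x)]
    y = [v + d for v, d in zip(y, lora_delta)]
    return [v + b for v, b in zip(y, bias)]

def linear_scale_last(w_q, scale, x, lora_delta, bias):
    """Misordered variant: the scale also multiplies the LoRA and bias terms."""
    y = matvec(w_q, x)
    y = [v + d for v, d in zip(y, lora_delta)]
    y = [v + b for v, b in zip(y, bias)]
    return [scale * v for v in y]

w_q = [[2.0, 4.0], [6.0, 8.0]]  # weights stored without their scale (assumed layout)
scale = 0.5                     # assumed single per-tensor dequantization scale
x = [1.0, 1.0]
lora_delta = [1.0, 1.0]         # precomputed LoRA contribution, assumed given
bias = [2.0, 2.0]

good = linear_scale_first(w_q, scale, x, lora_delta, bias)
bad = linear_scale_last(w_q, scale, x, lora_delta, bias)
print(good, bad)  # [6.0, 10.0] [4.5, 8.5]
```

In the misordered variant the LoRA and bias terms are halved along with the GEMM output, so the two functions disagree whenever the scale is not 1.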

Affected Symbols

build_ffn
NVFP4