b9330

📅 May 26, 2026 📦 llama-cpp

🐛 1 fix 🔧 6 symbols

Summary

This release corrects the tensor operation tag for ffn_latent in Nemotron models. The wrong tag caused weights to be placed on the wrong device (GPU vs CPU) at load time, which degraded performance. Pre-compiled binaries for multiple platforms are also provided.

🐛 Bug Fixes

Fixed ffn_latent being tagged as MUL_MAT instead of GGML_OP_MUL for Nemotron models. The incorrect tag led to wrong weight placement (GPU vs CPU) during loading, which in turn degraded performance.
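
To illustrate why an operation tag can change where a weight ends up, here is a minimal toy sketch. It is not llama.cpp code: `Op`, `Device`, `Weight`, `gpu_supports`, and `place_weight` are hypothetical names, and both the capability rule and the tensor shape are invented for the example. The only idea it borrows from the note above is that a loader decides placement from the op a tensor is tagged with, so a wrong tag can push a weight onto the CPU even though the op actually executed would run on the GPU.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; not the real ggml/llama.cpp types.
enum class Op { Mul, MulMat };
enum class Device { CPU, GPU };

struct Weight {
    std::string name;
    int rows;
    int cols;
};

// Invented capability rule for an imaginary GPU backend:
// element-wise Mul is accepted for any shape, MulMat only for
// weights with more than one row.
bool gpu_supports(Op op, const Weight& w) {
    if (op == Op::Mul) {
        return true;
    }
    return w.rows > 1;
}

// The loader asks whether the GPU supports the op the weight is
// tagged with, and falls back to the CPU otherwise.
Device place_weight(const Weight& w, Op tagged_op) {
    assert(w.rows > 0 && w.cols > 0);
    return gpu_supports(tagged_op, w) ? Device::GPU : Device::CPU;
}
```

With a weight such as `Weight{"ffn_latent", 1, 4096}` (the shape is an assumption), tagging it `Op::MulMat` yields `Device::CPU`, while tagging it `Op::Mul` yields `Device::GPU`. In this toy model the graph would compute the same result either way, but the CPU-resident weight is the kind of placement that can slow inference.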

Affected Symbols

ffn_latent ffn_latent_down ffn_latent_up GGML_OP_MUL MUL_MAT ggml_mul_mat