b7638

📦 llama-cpp
🐛 1 fix

Summary

This release fixes a critical FP16 accumulator overflow in FlashAttention (FA) on CUDA that occurs when running Granite models. It also provides updated binary distributions for macOS, Linux, Windows, and openEuler.

🐛 Bug Fixes

  • Fixed an FP16 accumulator overflow in FlashAttention (FA) on CUDA builds when running Granite models.
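
For readers unfamiliar with this class of bug, the sketch below illustrates in general terms how an FP16 accumulator can overflow. It is not the llama.cpp CUDA kernel and does not reproduce the fix shipped in this release. The helper names (`to_fp16`, `dot_fp16_accumulator`, `dot_wide_accumulator`) and the input values are hypothetical, chosen only so that the running sum exceeds 65504, the largest finite half-precision value.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value (hypothetical helper).

    struct's "e" format raises OverflowError for out-of-range values,
    so we map that case to +/-inf, which is what FP16 arithmetic yields.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def dot_fp16_accumulator(a, b):
    """Dot product whose running sum is rounded to FP16 after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_wide_accumulator(a, b):
    """Same dot product, but the running sum is kept in a wider float."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


# Illustrative inputs only: 128 products of 1024 each sum to 131072,
# which is well above FP16_MAX.
q = [32.0] * 128
k = [32.0] * 128

narrow = dot_fp16_accumulator(q, k)
wide = dot_wide_accumulator(q, k)
print("FP16 accumulator:", narrow)
print("wide accumulator:", wide)
```

With these inputs the FP16 running sum becomes infinite once it passes 65504, while the wider accumulator returns 131072.0. Models that produce large activation magnitudes are more likely to hit this limit in attention score computation, which is one plausible reason a bug like this would surface on specific model families.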