b7638
📦 llama-cpp
🐛 1 fix
Summary
This release fixes a critical FP16 accumulator overflow in flash attention (FA) on CUDA that affected Granite models, and ships updated binaries for macOS, Linux, Windows, and openEuler.
🐛 Bug Fixes
- Fixed an FP16 accumulator overflow in flash attention (FA) on CUDA builds when running Granite models.
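
For readers unfamiliar with this failure mode, here is a minimal sketch of how an FP16 accumulator can overflow. It is an illustration only and is not the llama.cpp CUDA kernel or the actual fix: the helper names (`to_fp16`, `dot_fp16_acc`, `dot_wide_acc`) are hypothetical, FP16 rounding is emulated with Python's `struct` half-precision format, and a Python float stands in for a wider (e.g. FP32) accumulator.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value; overflow becomes +/-inf (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the FP16 range,
        # whereas hardware would produce infinity, so emulate that here.
        return math.copysign(math.inf, x)


def dot_fp16_acc(a, b):
    """Dot product where every product and the running sum are rounded to FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_wide_acc(a, b):
    """Same dot product with a wider accumulator (Python float as a stand-in)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


# Each product (40000) fits in FP16, but the running sum exceeds FP16_MAX.
a = [200.0] * 4
b = [200.0] * 4
print(dot_fp16_acc(a, b))
print(dot_wide_acc(a, b))
```

Every individual product here is representable in FP16, yet the running sum passes 65504 on the second step and becomes infinite. With the wider accumulator the result stays finite.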