b7613
📦 llama-cpp
🐛 1 fix · 🔧 2 symbols
Summary
This release optimizes the Metal backend by adjusting the Flash Attention (FA) buffer size to prevent unnecessary memory reallocations.
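The gist of the change, in a minimal sketch (the `fa_buffer` type, `fa_buffer_reserve` helper, and the ~12.5% padding policy are illustrative assumptions, not the actual ggml-metal code): reserve the FA scratch buffer with some extra headroom so that small fluctuations in the required size between calls do not trigger a reallocation each time.

```c
// Illustrative sketch only, not the actual ggml-metal implementation:
// reserve the FA scratch buffer with extra headroom so that small increases
// in the required size do not force a reallocation on every call.
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void   *data;
    size_t  size;  // currently allocated size in bytes
} fa_buffer;       // hypothetical type name, for illustration

// Grow only when the request exceeds the current allocation, and over-allocate
// by a small padding factor so nearby requests are absorbed without realloc.
static int fa_buffer_reserve(fa_buffer *buf, size_t needed) {
    if (needed <= buf->size) {
        return 0;  // existing allocation is large enough: no reallocation
    }
    size_t padded = needed + needed / 8;  // ~12.5% extra (illustrative policy)
    void *p = realloc(buf->data, padded);
    if (!p) {
        return -1;
    }
    buf->data = p;
    buf->size = padded;
    return 0;
}

int main(void) {
    fa_buffer buf = {0};
    fa_buffer_reserve(&buf, 1024);  // first call allocates 1152 bytes
    fa_buffer_reserve(&buf, 1100);  // fits within the headroom: no realloc
    printf("allocated: %zu bytes\n", buf.size);
    free(buf.data);
    return 0;
}
```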
🐛 Bug Fixes
- metal: adjust extra size for FA buffer to avoid reallocations (#18545)
🔧 Affected Symbols
- ggml-metal
- FA buffer