Change 8

b7613

📦 llama-cpp · View on GitHub →
🐛 1 fix · 🔧 2 symbols

Summary

This release optimizes the Metal backend by adjusting the Flash Attention (FA) buffer size to prevent unnecessary memory reallocations.
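The idea behind the change is a common one: size the scratch buffer with some extra headroom so that slightly larger Flash Attention workloads can reuse the existing allocation instead of forcing a new one. The sketch below illustrates that general pattern in plain C++; the struct, function names, and headroom constant are hypothetical and are not taken from the actual ggml-metal code, which applies the same idea to its Metal FA buffer.

```cpp
// Illustrative sketch of a "grow with headroom" buffer policy.
// All names and the padding constant here are hypothetical.
#include <cstddef>
#include <cstdlib>

struct scratch_buffer {
    void  *data     = nullptr;
    size_t capacity = 0;      // bytes actually allocated
};

// Extra slack reserved on top of the requested size (hypothetical value).
static constexpr size_t FA_EXTRA_BYTES = 1u << 20; // 1 MiB of headroom

// Ensure the buffer can hold `needed` bytes. Only reallocate when the
// request exceeds the current capacity; when reallocating, reserve extra
// headroom so the next slightly larger request is served without a realloc.
static void scratch_reserve(scratch_buffer &buf, size_t needed) {
    if (needed <= buf.capacity) {
        return; // current allocation is already large enough
    }
    const size_t new_capacity = needed + FA_EXTRA_BYTES;
    void *p = std::realloc(buf.data, new_capacity);
    if (p == nullptr) {
        return; // allocation failed; keep the old buffer untouched
    }
    buf.data     = p;
    buf.capacity = new_capacity;
}
```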

🐛 Bug Fixes

  • metal: adjust extra size for FA buffer to avoid reallocations (#18545)

🔧 Affected Symbols

  • ggml-metal
  • FA buffer