b7613
📦 llama-cpp
🐛 1 fix · 🔧 2 symbols
Summary
This release optimizes the Metal backend by adjusting the Flash Attention (FA) buffer size to prevent unnecessary memory reallocations.
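The gist of the change, in a minimal sketch (the `fa_buffer` type, `fa_buffer_reserve` helper, and the ~12.5% padding policy are illustrative assumptions, not the actual ggml-metal code): reserve the FA scratch buffer with some extra headroom so that small fluctuations in the required size between calls do not trigger a reallocation each time.

```c
// Illustrative sketch only, not the actual ggml-metal implementation:
// reserve the FA scratch buffer with extra headroom so that small increases
// in the required size do not force a reallocation on every call.
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void   *data;
    size_t  size;  // currently allocated size in bytes
} fa_buffer;       // hypothetical type name, for illustration

// Grow only when the request exceeds the current allocation, and over-allocate
// by a small padding factor so nearby requests are absorbed without realloc.
static int fa_buffer_reserve(fa_buffer *buf, size_t needed) {
    if (needed <= buf->size) {
        return 0;  // existing allocation is large enough: no reallocation
    }
    size_t padded = needed + needed / 8;  // ~12.5% extra (illustrative policy)
    void *p = realloc(buf->data, padded);
    if (!p) {
        return -1;
    }
    buf->data = p;
    buf->size = padded;
    return 0;
}

int main(void) {
    fa_buffer buf = {0};
    fa_buffer_reserve(&buf, 1024);  // first call allocates 1152 bytes
    fa_buffer_reserve(&buf, 1100);  // fits within the headroom: no realloc
    printf("allocated: %zu bytes\n", buf.size);
    free(buf.data);
    return 0;
}
```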
🐛 Bug Fixes
- metal: adjust extra size for FA buffer to avoid reallocations (#18545)
🔧 Affected Symbols
- ggml-metal
- FA buffer