b9266
📦 llama-cpp
🐛 2 fixes · 🔧 3 symbols
Summary
This release fixes a critical null-buffer crash in graph input processing for models with unusual sliding-window attention (SWA) layer configurations: SWA-only models, or models with zero SWA layers. The fixes add buffer checks before input tensors are set and prevent null dereferences during tensor reuse checks.
🐛 Bug Fixes
- Fixed a null-buffer crash in llm_graph_input_attn_kv_iswa for models with zero non-SWA attention layers (e.g., SWA-only slices of Gemma 4) by adding null/buffer checks before setting input tensors.
- Fixed a potential null-dereference in can_reuse() within llm_graph_input_attn_kv_iswa by skipping ne[0] and kq_mask checks when tensors are unallocated.
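
The guard pattern described in both fixes can be sketched as follows. This is a minimal illustration, not the actual llama.cpp code: `Tensor`, `AttnInput`, `fill_if_allocated`, `set_input`, and `can_reuse` are hypothetical stand-ins that only mirror the idea of skipping tensors that are missing or have no buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a graph tensor (NOT the real ggml type).
struct Tensor {
    std::vector<float> * buffer = nullptr; // null until backing storage is allocated
    int64_t ne0 = 0;                       // size of the first dimension
};

// Hypothetical stand-in for an attention input with two optional masks.
struct AttnInput {
    Tensor * kq_mask     = nullptr; // may be absent when there are no non-SWA layers
    Tensor * kq_mask_swa = nullptr; // may be absent when there are no SWA layers
};

// Write to a tensor only if it exists and has a buffer; report whether it was written.
bool fill_if_allocated(Tensor * t, float value) {
    if (t == nullptr || t->buffer == nullptr) {
        return false; // unallocated: skip instead of dereferencing
    }
    assert(t->ne0 >= 0);
    t->buffer->assign(static_cast<std::size_t>(t->ne0), value);
    return true;
}

// Set both masks, returning how many tensors were actually written.
int set_input(AttnInput & in) {
    int written = 0;
    written += fill_if_allocated(in.kq_mask, 0.0f) ? 1 : 0;
    written += fill_if_allocated(in.kq_mask_swa, 0.0f) ? 1 : 0;
    return written;
}

// Shape check for reuse: unallocated tensors are skipped rather than inspected.
bool can_reuse(const AttnInput & in, int64_t expected_ne0) {
    bool ok = true;
    if (in.kq_mask != nullptr && in.kq_mask->buffer != nullptr) {
        ok = ok && in.kq_mask->ne0 == expected_ne0;
    }
    if (in.kq_mask_swa != nullptr && in.kq_mask_swa->buffer != nullptr) {
        ok = ok && in.kq_mask_swa->ne0 == expected_ne0;
    }
    return ok;
}
```

In this sketch, an SWA-only configuration leaves `kq_mask` null or unallocated, so `set_input` writes only the SWA mask and `can_reuse` compares only the tensors that actually have storage.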