b9266

📅 May 21, 2026 📦 llama-cpp

🐛 2 fixes 🔧 3 symbols

Summary

This release fixes a critical null-buffer crash in graph input processing for models with unusual sliding-window attention (SWA) layer configurations (SWA-only or zero SWA layers). The fixes add buffer checks before input tensors are set and skip tensor reuse checks that would otherwise dereference unallocated tensors.

🐛 Bug Fixes

Fixed a null-buffer crash in llm_graph_input_attn_kv_iswa for models with zero non-SWA attention layers (e.g., SWA-only slices of Gemma 4) by adding null/buffer checks before setting input tensors.
Fixed a potential null-dereference in can_reuse() within llm_graph_input_attn_kv_iswa by skipping ne[0] and kq_mask checks when tensors are unallocated.
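
The guard pattern described in the two fixes above can be sketched roughly as follows. This is a minimal illustration, not the actual llama.cpp code: `toy_tensor`, `toy_attn_input`, `is_allocated`, `fill_if_allocated`, and the simplified `set_input` / `can_reuse` signatures are all hypothetical stand-ins, and only the first dimension is checked. The assumption is that in an SWA-only model the non-SWA input tensors are either missing or have no backing buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the real
// ggml / llama.cpp types.
struct toy_tensor {
    int32_t * buffer = nullptr; // backing storage; stays null if never allocated
    int64_t   ne0    = 0;       // size of the first dimension
};

struct toy_attn_input {
    toy_tensor * base = nullptr; // input for non-SWA layers; absent in SWA-only models
    toy_tensor * swa  = nullptr; // input for SWA layers; absent when there are none
};

// A tensor is usable only if it exists and has storage behind it.
bool is_allocated(const toy_tensor * t) {
    return t != nullptr && t->buffer != nullptr;
}

// Fill one tensor with a value; returns false (and writes nothing) if unallocated.
bool fill_if_allocated(toy_tensor * t, int32_t value) {
    if (!is_allocated(t)) {
        return false;
    }
    assert(t->ne0 >= 0);
    for (int64_t i = 0; i < t->ne0; ++i) {
        t->buffer[i] = value;
    }
    return true;
}

// Sets whichever inputs exist; returns how many tensors were written.
int set_input(toy_attn_input & in, int32_t value) {
    int written = 0;
    written += fill_if_allocated(in.base, value) ? 1 : 0;
    written += fill_if_allocated(in.swa,  value) ? 1 : 0;
    return written;
}

// Shape checks run only for allocated tensors; unallocated ones are skipped.
bool can_reuse(const toy_attn_input & in, int64_t n_base, int64_t n_swa) {
    bool ok = true;
    if (is_allocated(in.base)) { ok = ok && in.base->ne0 == n_base; }
    if (is_allocated(in.swa))  { ok = ok && in.swa->ne0  == n_swa;  }
    return ok;
}
```

With only `swa` populated, `set_input` in this sketch writes one tensor and `can_reuse` ignores the expected size of the missing `base` input, so neither path touches a null pointer.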

Affected Symbols

llm_graph_input_attn_kv_iswa
ggml-backend.cpp:194
llm_graph_input_mem_hybrid_iswa::set_input