b8053
📦 llama-cpp
✨ 4 features · 🔧 3 symbols
Summary
This release focuses on internal optimizations, primarily to the Qwen3Next model's graph execution: redundant chunking logic was removed, and masks are no longer passed around during processing.
Migration Steps
- If downstream code references the renamed internal symbols, update those names; CUDA graphs can be disabled (via the relevant build option or environment setting) if the optimized graph path causes issues.
✨ New Features
- Optimized Qwen3Next graph execution.
- Removed redundant chunking logic for the q and g tensors.
- Avoided passing masks around during processing.
- Avoided concatenations during chunking.