b8053
📦 llama-cpp
✨ 4 features · 🔧 3 symbols
Summary
This release focuses on internal optimizations, primarily to the Qwen3Next model's graph execution: redundant chunking logic was removed, and masks are no longer passed around during processing.
Migration Steps
- If downstream code references the renamed internal symbols, update those names; CUDA graphs can be disabled (via the relevant build option or environment setting) if the optimized graph path causes issues.
✨ New Features
- Optimized Qwen3Next graph execution.
- Removed redundant chunking logic for the q and g tensors.
- Avoided passing masks around during processing.
- Avoided concatenations during chunking.