
b8053

📦 llama-cpp

Summary

This release focuses on internal optimizations, primarily targeting Qwen3Next model graph execution: chunking logic is refined by removing redundancy, avoiding concatenations, and no longer passing masks between processing steps.

Migration Steps

  1. Update any renamed symbols in downstream code and, if the new graph path causes problems, disable CUDA graphs via the relevant environment-variable prefix.
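As a sketch of the second part of this step: the ggml CUDA backend checks an environment variable to opt out of CUDA graph capture. The variable name below is an assumption based on the backend's conventions; verify it against your build before relying on it.

```shell
# Assumed knob (check your ggml/llama.cpp build): any value disables
# CUDA graph capture in the CUDA backend.
export GGML_CUDA_DISABLE_GRAPHS=1
# Then launch as usual, e.g.:
#   ./llama-cli -m model.gguf -p "Hello"
echo "GGML_CUDA_DISABLE_GRAPHS=$GGML_CUDA_DISABLE_GRAPHS"
```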

✨ New Features

  • Optimized Qwen3Next graph execution.
  • Removed redundant `q`/`g` chunking logic.
  • Implemented changes to avoid passing masks around during processing.
  • Implemented changes to avoid concatenations during chunking.
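To illustrate the last point: concatenating per-chunk results forces an extra allocation and copy, which writing each chunk directly into a preallocated output buffer avoids. The sketch below (Python/NumPy, purely illustrative; the actual change lives in the C++ graph-building code, and `f` stands in for whatever per-chunk computation is performed) contrasts the two patterns.

```python
import numpy as np

def process_chunked_concat(x, chunk, f):
    # Naive pattern: process each chunk, then concatenate the pieces.
    # np.concatenate allocates a fresh buffer and copies every chunk.
    outs = [f(x[i:i + chunk]) for i in range(0, len(x), chunk)]
    return np.concatenate(outs)

def process_chunked_prealloc(x, chunk, f):
    # Optimized pattern: write each chunk's result into a preallocated
    # buffer, so no concatenation (extra allocation + copy) is needed.
    out = np.empty_like(x)
    for i in range(0, len(x), chunk):
        out[i:i + chunk] = f(x[i:i + chunk])
    return out

x = np.arange(10.0)
double = lambda v: v * 2.0
print(np.array_equal(process_chunked_concat(x, 4, double),
                     process_chunked_prealloc(x, 4, double)))  # prints True
```

Both functions produce identical results; the second simply trades the list-then-concatenate dance for in-place writes, which is the same shape of saving the release claims for the chunked path.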

Affected Symbols