b9840

📅 Jun 29, 2026 · 📦 llama-cpp

✨ 11 features · 🐛 5 fixes · 🔧 8 symbols

Summary

This release adds support for the DeepSeek V4 model, including model conversion, basic setup, and support for the V4 Pro model. It also includes internal cleanups, optimizations such as enabling graph reuse and Flash Attention, and fixes for GGUF compatibility and sequence handling.

Migration Steps

Rename usages of 'deepseek-v4-flash' to 'deepseek4'.
Replace usage of 'moe.score_func' with 'expert_gating_func'.
Replace 'ggml_view_3d()' with 'ggml_reshape_3d()'.
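The notes do not explain why ggml_view_3d() calls were replaced with ggml_reshape_3d(). The pure-Python sketch below illustrates when the two describe the same memory layout. view_3d() and reshape_3d() are hypothetical stand-ins, not the ggml API. ggml_reshape_3d() requires a contiguous source tensor, so it matches a view only when that view used contiguous strides and a zero offset.

```python
# Pure-Python model of ggml-style 3-D addressing (hypothetical helpers, not
# the ggml API). ne0 is the innermost (fastest-varying) dimension; strides
# are counted in elements here, whereas ggml's nb* strides are in bytes.

def view_3d(buf, ne0, ne1, ne2, nb1, nb2, offset=0, nb0=1):
    """Read a 3-D view of the flat buffer `buf` through explicit strides."""
    return [[[buf[offset + i2 * nb2 + i1 * nb1 + i0 * nb0]
              for i0 in range(ne0)]
             for i1 in range(ne1)]
            for i2 in range(ne2)]


def reshape_3d(buf, ne0, ne1, ne2):
    """Reinterpret a contiguous buffer with a new shape; strides are implied."""
    assert len(buf) == ne0 * ne1 * ne2, "reshape must keep the element count"
    return view_3d(buf, ne0, ne1, ne2, nb1=ne0, nb2=ne0 * ne1)


buf = list(range(24))  # contiguous source with 4 * 3 * 2 elements

# Contiguous strides and zero offset: the view and the reshape agree.
view = view_3d(buf, 4, 3, 2, nb1=4, nb2=12)
print(view == reshape_3d(buf, 4, 3, 2))  # True

# Padded rows (nb1=5 instead of 4): the view reads different elements,
# so a plain reshape would not be a drop-in replacement here.
padded = view_3d(list(range(40)), 4, 3, 2, nb1=5, nb2=20)
print(padded == reshape_3d(buf, 4, 3, 2))  # False
```

In other words, the swap is safe only where the original view was already a contiguous reinterpretation of its source.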

✨ New Features

Added support for DeepSeek V4 model conversion.
Added basic setup for DeepSeek V4.
Added llm_graph_input_dsv4 support.
Added save/load state support.
Added support for the DeepSeek V4 Pro model.
Added a mechanism for inlining templates based on the model architecture.
Enabled graph reuse.
Enabled Flash Attention (FA).
Added padding to enable FA.
Added support for multi-sequence generation.
Enabled partial checkpointing.
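The notes do not describe the padding added for Flash Attention. A common scheme, assumed here, rounds the KV length up to a multiple of the kernel's block size and masks the extra slots with -inf so that softmax gives them zero weight. In this sketch, round_up(), attend() and pad_kv() are hypothetical helpers, and pad=4 is an arbitrary illustration value, not llama.cpp's actual padding. It checks that padded, masked attention reproduces the unpadded result.

```python
import math


def round_up(n, pad):
    """Round n up to the next multiple of pad."""
    return (n + pad - 1) // pad * pad


def attend(q, keys, values, mask):
    """Single-query scaled dot-product attention with an additive mask."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale + m
              for k, m in zip(keys, mask)]
    top = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def pad_kv(keys, values, pad):
    """Append zero K/V rows up to a multiple of `pad`; mask them with -inf."""
    n = len(keys)
    extra = round_up(n, pad) - n
    keys = keys + [[0.0] * len(keys[0]) for _ in range(extra)]
    values = values + [[0.0] * len(values[0]) for _ in range(extra)]
    mask = [0.0] * n + [-math.inf] * extra
    return keys, values, mask


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

ref = attend(q, keys, values, [0.0] * len(keys))  # unpadded reference
pk, pv, pm = pad_kv(keys, values, pad=4)           # 3 rows -> 4 rows
out = attend(q, pk, pv, pm)
print(all(abs(a - b) < 1e-12 for a, b in zip(ref, out)))  # True
```

Masking, rather than padding alone, is what keeps the result unchanged: an unmasked zero key would still receive weight exp(0 - top) and dilute the output.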

🐛 Bug Fixes

Fixed the Sinkhorn epsilon calculation.
Fixed RoPE implementation.
Fixed llama architecture tests.
Fixed CI pipeline.
Fixed indentation issues.
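The notes do not say what the Sinkhorn epsilon fix changed. For reference, here is a generic Sinkhorn-Knopp normalization, which alternately normalizes rows and columns until the matrix is approximately doubly stochastic. sinkhorn() is a hypothetical helper, and adding eps to every row and column sum is an assumption, not necessarily where llama.cpp applies it.

```python
import math


def sinkhorn(logits, n_iters=30, eps=1e-6):
    """Sinkhorn-Knopp: push a square matrix towards a doubly stochastic one
    by alternately normalizing its rows and columns. `eps` is added to each
    sum so that an all-zero row or column cannot cause a division by zero."""
    top = max(max(row) for row in logits)
    # exp() makes every entry non-negative; shifting by the max avoids overflow.
    m = [[math.exp(x - top) for x in row] for row in logits]
    n = len(m)
    for _ in range(n_iters):
        for i in range(n):  # row normalization
            s = sum(m[i]) + eps
            m[i] = [x / s for x in m[i]]
        for j in range(n):  # column normalization
            s = sum(m[i][j] for i in range(n)) + eps
            for i in range(n):
                m[i][j] /= s
    return m


p = sinkhorn([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]])
print([round(sum(row), 4) for row in p])                       # row sums
print([round(sum(row[j] for row in p), 4) for j in range(3)])  # column sums
```

After enough iterations, every row and column sum should be close to 1. The epsilon trades a tiny bias (each sum lands at roughly s / (s + eps)) for robustness when a row underflows to zero.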

Affected Symbols

deepseek-v4-flash
deepseek4
set_gguf_parameters()
moe.score_func
expert_gating_func
ggml_view_3d()
ggml_reshape_3d()
llama_model_n_swa
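The rename from moe.score_func to expert_gating_func suggests that the router's scoring function is now recorded under the gating-function key llama.cpp already uses for other mixture-of-experts models, presumably written from the converter's set_gguf_parameters() hook. The notes give no further detail, so the following is only a sketch. GatingFunc, gating_func_from_config() and route() are hypothetical names, and the enum values are an assumption to check against gguf-py. The sketch shows what such a key selects: how router logits become expert weights before top-k selection.

```python
import math
from enum import IntEnum


class GatingFunc(IntEnum):
    # Values assumed to mirror llama.cpp's expert gating enum
    # (softmax = 1, sigmoid = 2); verify against gguf-py before relying on them.
    SOFTMAX = 1
    SIGMOID = 2


def gating_func_from_config(scoring_func):
    """Hypothetical converter helper: map a `scoring_func` string from a model
    config to the value that would be stored under `expert_gating_func`."""
    table = {"softmax": GatingFunc.SOFTMAX, "sigmoid": GatingFunc.SIGMOID}
    if scoring_func not in table:
        raise ValueError(f"unsupported scoring_func: {scoring_func!r}")
    return table[scoring_func]


def route(logits, func, k, normalize=True):
    """Score every expert with the selected gating function, keep the top-k,
    and optionally renormalize the kept weights to sum to 1."""
    if func == GatingFunc.SOFTMAX:
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        scores = [e / total for e in exps]
    elif func == GatingFunc.SIGMOID:
        scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    else:
        raise ValueError(f"unknown gating function: {func!r}")
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    s = sum(scores[i] for i in chosen) if normalize else 1.0
    return [(i, scores[i] / s) for i in chosen]


func = gating_func_from_config("sigmoid")
print(func.name, route([2.0, -1.0, 0.5, 1.0], func, k=2))
```

The renormalization step is also an assumption. Not every model applies it, which is why it is exposed as the `normalize` flag.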