Change8

b9840

📦 llama-cpp
11 features · 🐛 5 fixes · 🔧 8 symbols

Summary

This release adds support for the DeepSeek V4 model, covering model conversion, basic setup, and the pro variant. It also brings internal cleanups, performance work such as graph reuse and Flash Attention, and fixes for GGUF compatibility and sequence handling.

Migration Steps

  1. Rename usages of 'deepseek-v4-flash' to 'deepseek4'.
  2. Replace usage of 'moe.score_func' with 'expert_gating_func'.
  3. Replace 'ggml_view_3d()' with 'ggml_reshape_3d()'.
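
The renames above can be applied mechanically to downstream scripts or configs. The following is a minimal sketch, not part of llama.cpp: the `migrate` helper and its `RENAMES` and `REVIEW` tables are hypothetical names introduced here for illustration. It rewrites the two string identifiers and only flags `ggml_view_3d()` call sites, on the assumption that those need a manual look, since `ggml_view_3d()` takes stride and offset arguments that `ggml_reshape_3d()` does not.

```python
import re

# Identifier renames taken from migration steps 1 and 2.
RENAMES = {
    "deepseek-v4-flash": "deepseek4",
    "moe.score_func": "expert_gating_func",
}

# Step 3 is only flagged, not rewritten: the two ggml calls have
# different argument lists, so a blind rename would not compile.
REVIEW = ("ggml_view_3d",)


def migrate(text):
    """Return (rewritten_text, review_notes) for one file's contents."""
    for old, new in RENAMES.items():
        # The lookarounds skip longer identifiers that merely contain `old`
        # (for example "deepseek-v4-flash-x"), while still matching inside
        # dotted keys such as "<arch>.moe.score_func".
        pattern = r"(?<![\w-])" + re.escape(old) + r"(?![\w-])"
        text = re.sub(pattern, new, text)

    notes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in REVIEW:
            if re.search(r"\b" + re.escape(name) + r"\s*\(", line):
                notes.append((lineno, name))
    return text, notes


if __name__ == "__main__":
    sample = (
        'arch = "deepseek-v4-flash"\n'
        'key = "deepseek-v4-flash.moe.score_func"\n'
        "t = ggml_view_3d(ctx, a, n0, n1, n2, nb1, nb2, 0);\n"
    )
    new_text, todo = migrate(sample)
    print(new_text)
    for lineno, name in todo:
        print(f"line {lineno}: review call to {name}()")
```

Each entry in the returned notes is a `(line_number, function_name)` pair, so the flagged call sites can be revisited by hand after the string renames are in place.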

✨ New Features

  • Added support for DeepSeek V4 model conversion.
  • Added basic setup for DeepSeek V4.
  • Added llm_graph_input_dsv4 support.
  • Added save-load state functionality.
  • Added support for DeepSeek V4 pro model.
  • Added mechanism for inlining templates based on architecture.
  • Enabled graph reuse.
  • Enabled Flash Attention (FA).
  • Added padding to enable FA.
  • Added support for multi-sequence generation.
  • Enabled partial checkpointing.

🐛 Bug Fixes

  • Fixed sinkhorn epsilon calculation.
  • Fixed RoPE implementation.
  • Fixed llama architecture tests.
  • Fixed CI pipeline.
  • Fixed indentation issues.

Affected Symbols