b8530

📦 llama-cpp
6 features · 🐛 12 fixes · 3 deprecations · 🔧 11 symbols

Summary

This release adds DeepSeek-OCR support to the mtmd framework, along with numerous fixes to vision-model processing, RoPE types, and tensor handling. It also removes the `--dsocr-mode` CLI argument, the redundant `RESIZE_ALGO_BICUBIC_PILLOW` resize algorithm, and the `clip_is_deepseekocr` flag.

Migration Steps

  1. If you were using `--dsocr-mode`, drop it: the argument has been removed, and resolution is now handled by the simplified dynamic-resolution preprocessing.
  2. Update model conversion configurations to correct `n_patches` and other settings for DeepSeek-OCR.
  3. If you relied on a specific resize algorithm, note that `RESIZE_ALGO_BICUBIC_PILLOW` has been removed.

✨ New Features

  • Added DeepSeek-OCR support to `mtmd` (llama.cpp integration).
  • Implemented DeepSeek-OCR CLIP-ViT model support.
  • Added support for DeepSeek-OCR LM with standard attention.
  • Added native resolution support for DeepSeek-OCR.
  • Added support for combined QKV projection in `build_vit`.
  • Corrected a code branch so that the `--flash-attn` option can be used when flash attention is disabled.
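
A combined QKV projection replaces three attention input projections with a single matrix multiplication whose output is then sliced into query, key, and value. The notes do not show llama.cpp's `build_vit` code, so the following is only a minimal pure-Python sketch. It assumes the fused weight stacks its rows in Q, K, V order, which is a common layout but is not confirmed here for DeepSeek-OCR. `split_qkv_weight` and `fused_qkv` are hypothetical helpers, not llama.cpp functions.

```python
"""Sketch: one fused QKV matmul vs. three separate projections.

Assumption (not from the release notes): the fused weight has 3 * n_embd rows
stacked in Q, K, V order.
"""
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    # y[i] = sum_j w[i][j] * x[j]
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def split_qkv_weight(w_qkv: Matrix, n_embd: int) -> Tuple[Matrix, Matrix, Matrix]:
    # Conversion-time split: rows [0, n), [n, 2n), [2n, 3n) -> Q, K, V weights.
    assert len(w_qkv) == 3 * n_embd, "fused weight must have 3 * n_embd rows"
    return w_qkv[:n_embd], w_qkv[n_embd:2 * n_embd], w_qkv[2 * n_embd:]


def fused_qkv(w_qkv: Matrix, x: Vector, n_embd: int) -> Tuple[Vector, Vector, Vector]:
    # Graph-time alternative: one matmul, then slice the output into three parts.
    y = matvec(w_qkv, x)
    return y[:n_embd], y[n_embd:2 * n_embd], y[2 * n_embd:]


if __name__ == "__main__":
    n_embd = 2
    # 3 * n_embd rows, n_embd columns.
    w_qkv = [[1, 0], [0, 1],   # Q rows
             [2, 0], [0, 2],   # K rows
             [1, 1], [1, -1]]  # V rows
    x = [3.0, 4.0]
    q, k, v = fused_qkv(w_qkv, x, n_embd)
    wq, wk, wv = split_qkv_weight(w_qkv, n_embd)
    # Both routes must agree; a wrong row split would break this equality.
    assert (q, k, v) == (matvec(wq, x), matvec(wk, x), matvec(wv, x))
    print(q, k, v)
```

Running it prints `[3.0, 4.0] [6.0, 8.0] [7.0, -1.0]`. The equality check is the property that a fused-QKV graph and a split-at-conversion checkpoint must both preserve: if the row boundaries are off by even one block, Q, K, and V are silently swapped.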

🐛 Bug Fixes

  • Fixed RoPE type for DeepSeek-OCR LM.
  • Corrected `cls_embd` concatenation in clip-vit.
  • Corrected the `qkv_proj` split in the clip-vit model conversion.
  • Corrected how the image encoders' results are combined.
  • Fixed the update callback for `ffn_moe_weighted` and added a callback for `attn_out` in the deepseek2 model.
  • Fixed token ordering in `mtmd`.
  • Fixed a dangling pointer in `mtmd`.
  • Fixed the `get_rel_pos` implementation and its scaler.
  • Fixed the tensor names for image newlines and the view separator.
  • Fixed a bad OCR check in the Deepseek2 LM.
  • Fixed an OCR issue with test-1.jpg at the small (640) resolution by setting the dynamic-resolution minimum to base (1024) and the maximum to large (1280).
  • Fixed instability issues by reintroducing `resize_bicubic_pillow`.
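
The `get_rel_pos` fix is easier to follow with the index arithmetic in view. The notes do not say what was wrong in llama.cpp's version, so this is only a sketch under stated assumptions: it assumes the DeepSeek-OCR vision encoder uses SAM-style relative-position embeddings, and it reproduces in pure Python the index computation of the Segment Anything reference `get_rel_pos`, leaving out the linear interpolation of the embedding table. `rel_pos_indices` and `gather_rel_pos` are hypothetical helpers; `q_scale` and `k_scale` are the scale factors that matter when the query and key grids differ in size.

```python
"""Sketch of the relative-position index table used by SAM-style ViT attention.

Mirrors the index arithmetic of the Segment Anything reference `get_rel_pos`.
Assumption: the embedding table already has 2 * max(q_size, k_size) - 1 rows,
so the reference implementation's linear interpolation step is omitted.
"""
from typing import List


def rel_pos_indices(q_size: int, k_size: int) -> List[List[int]]:
    # Scale factors: coordinates of the shorter axis are stretched so that
    # query and key positions share one grid when q_size != k_size.
    q_scale = max(k_size / q_size, 1.0)
    k_scale = max(q_size / k_size, 1.0)
    # Offset that shifts the most negative relative distance to index 0.
    offset = (k_size - 1) * k_scale
    return [
        [int(qi * q_scale - kj * k_scale + offset) for kj in range(k_size)]
        for qi in range(q_size)
    ]


def gather_rel_pos(q_size: int, k_size: int,
                   table: List[List[float]]) -> List[List[List[float]]]:
    # Gather one embedding row per (query, key) pair.
    max_rel_dist = 2 * max(q_size, k_size) - 1
    assert len(table) == max_rel_dist, "interpolation is not implemented in this sketch"
    return [[table[i] for i in row] for row in rel_pos_indices(q_size, k_size)]


if __name__ == "__main__":
    for row in rel_pos_indices(3, 3):
        print(row)
```

For `q_size = k_size = 3` the table is `[[2, 1, 0], [3, 2, 1], [4, 3, 2]]`: index `2` (relative distance zero) runs along the diagonal, and all five rows of a 5-entry embedding table are used.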

⚡ Deprecations

  • Removed `--dsocr-mode` CLI argument for DeepSeek-OCR resolution control.
  • Removed redundant `RESIZE_ALGO_BICUBIC_PILLOW` resize algorithm.
  • Removed `clip_is_deepseekocr` flag.