
v0.30.11

📦 ollama
6 features · 🐛 9 fixes · 🔧 9 symbols

Summary

This release adds auto-installation of tools such as Claude Code and opencode via launch, along with performance and stability fixes for Vulkan, speculative decoding, and model offloading. It also updates the underlying llama.cpp dependency.

Migration Steps

  1. Upgrade to v0.30.11, which bundles the updated llama.cpp dependency.
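
To confirm that an installation is at least v0.30.11, and so includes the updated llama.cpp, a version check can be sketched as below. This is a minimal sketch, not part of the release: the helper names (`parse_version`, `is_at_least`, `installed_version_output`) are hypothetical, and it assumes the `ollama` binary is on PATH and that `ollama --version` prints a string containing an X.Y.Z version number.

```python
import re
import subprocess

# Minimum version taken from these release notes.
MIN_VERSION = (0, 30, 11)


def parse_version(text):
    """Return the first X.Y.Z found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_at_least(text, minimum=MIN_VERSION):
    """True if text contains a version that is >= minimum."""
    version = parse_version(text)
    # Tuples compare element by element, so (0, 9, 99) < (0, 30, 11),
    # which a plain string comparison would get wrong.
    return version is not None and version >= minimum


def installed_version_output():
    """Run `ollama --version` and return its combined output.

    Assumption: `ollama` is on PATH and reports its version this way.
    """
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr
```

Calling `is_at_least(installed_version_output())` on a machine with ollama installed reports whether the upgrade is in place. Comparing integer tuples rather than raw strings keeps a version such as 0.9.99 from sorting above 0.30.11.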

✨ New Features

  • Added thinking capability detection to opencode via launch.
  • Enabled auto-installation of Claude Code via launch.
  • Enabled auto-installation of opencode when missing via launch.
  • Added sm_86 architecture to cuda_v13_windows preset for llama.
  • Set default values for qwen2.5vl window attention metadata.
  • Aligned server generate endpoint with native chat templates.

🐛 Bug Fixes

  • Fixed inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics.
  • Unified and tuned speculative decoding in mlxrunner.
  • Detected model drift when Codex App UI switches.
  • Sized mmproj offload correctly by projector memory.
  • Preserved generation headroom for shifted prompts.
  • Used host Vulkan loader on Windows.
  • Fixed CUDA JIT packaging in mlx.
  • Fixed ollama ps double-counting mmap'd weights on partial offload.
  • Added CC 87 support for CUDA v13 on Jetson.

Affected Symbols