
v0.30.11-rc1

📦 ollama
6 features · 🐛 8 fixes · 🔧 11 symbols

Summary

This release adds auto-installation of the coding tools Claude Code and opencode through the launch command, along with stability and performance fixes for GPU handling (Vulkan device classification, CUDA presets) and for model loading and generation.

Migration Steps

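  1. Users on Windows with hybrid graphics should verify that Vulkan now classifies the integrated GPU (iGPU) and discrete GPU (dGPU) correctly; the classification was previously inverted.
  2. Users relying on specific mmproj offload behavior may need to re-evaluate memory settings, because mmproj offload is now sized by projector memory.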
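
One way to carry out the check in step 1 is to compare what the Vulkan driver itself reports for each device. The sketch below is not part of Ollama: it assumes the `vulkaninfo` tool from the Vulkan SDK is installed and that its `--summary` output lists each device as a `GPUn:` block containing `deviceType` and `deviceName` fields. The function names and the sample text are illustrative only.

```python
import re
import subprocess


def parse_vulkan_summary(text):
    """Collect deviceType/deviceName fields for each 'GPUn:' block."""
    devices = []
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"GPU\d+:", stripped):
            devices.append({})  # start a new device block
            continue
        m = re.fullmatch(r"(deviceType|deviceName)\s*=\s*(.+)", stripped)
        if m and devices:
            devices[-1][m.group(1)] = m.group(2).strip()
    return devices


def classify(device):
    """Map the Vulkan device type string to a short label."""
    device_type = device.get("deviceType", "")
    if "INTEGRATED_GPU" in device_type:
        return "integrated"
    if "DISCRETE_GPU" in device_type:
        return "discrete"
    return "other"


def read_summary():
    """Run vulkaninfo (assumed to be on PATH) and return its summary text."""
    result = subprocess.run(
        ["vulkaninfo", "--summary"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative sample in the assumed layout; device names are made up.
SAMPLE = """\
Devices:
========
GPU0:
    deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
    deviceName         = Example iGPU
GPU1:
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName         = Example dGPU
"""

for dev in parse_vulkan_summary(SAMPLE):
    print(dev.get("deviceName"), "->", classify(dev))
```

On a real machine, passing `read_summary()` to `parse_vulkan_summary` gives the driver's view, which can then be compared with the device Ollama actually selects.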

✨ New Features

  • Added thinking capability detection to opencode via launch command.
  • Enabled auto-installation of Claude Code via launch command.
  • Enabled auto-installation of opencode when missing via launch command.
  • Added sm_86 architecture to cuda_v13_windows preset for llama.
  • Defaulted qwen2.5vl window attention metadata.
  • Aligned server generate endpoint with native chat templates.
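
The "install when missing" behavior described for Claude Code and opencode follows a common pattern: look for the executable, install it only if it is absent, then look again. The sketch below illustrates that pattern in general terms and does not reflect how the launch command is implemented. `ensure_tool` and its `installer` callback are hypothetical names, and the actual install step is left to the caller.

```python
import shutil


def ensure_tool(name, installer, which=shutil.which):
    """Return the path to `name`, running `installer(name)` first if it is missing.

    `installer` is a hypothetical callback supplied by the caller; this sketch
    makes no assumption about how a given tool is actually installed.
    """
    path = which(name)
    if path is not None:
        return path  # already installed, nothing to do

    installer(name)

    path = which(name)
    if path is None:
        raise RuntimeError(f"{name} was not found after the install attempt")
    return path
```

Passing the lookup function as a parameter keeps the sketch easy to exercise without touching the real `PATH`.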

🐛 Bug Fixes

  • Fixed inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics.
  • Unified and tuned speculative decoding in mlxrunner.
  • Added detection of model drift when the Codex App UI switches.
  • Fixed sizing of mmproj offload by projector memory.
  • Preserved generation headroom for shifted prompts.
  • Fixed ollama ps double-counting mmap'd weights on partial offload.
  • Updated mlx and fixed CUDA JIT packaging.
  • Added compute capability 8.7 (CC 87) support for CUDA v13 on Jetson.
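
These notes name GPU architectures in two ways: `sm_86` in the preset entry and CC 87 in the Jetson entry. Both follow the CUDA convention that compute capability major.minor corresponds to the architecture name `sm_<major><minor>`. A minimal sketch of that mapping, with `sm_arch` as an illustrative helper name:

```python
def sm_arch(major, minor):
    """Build the CUDA architecture name for compute capability major.minor."""
    return f"sm_{major}{minor}"


print(sm_arch(8, 6))  # the architecture added to the cuda_v13_windows preset
print(sm_arch(8, 7))  # compute capability 8.7, the Jetson entry
```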

Affected Symbols