v0.30.11-rc0
📦 ollama
✨ 5 features · 🐛 11 fixes · 🔧 6 symbols
Summary
This release adds auto-installation via launch for tools such as Claude Code and opencode, alongside stability and performance improvements across Vulkan, MLX, and model loading. Key fixes include correcting the inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics and unifying and tuning speculative decoding in mlxrunner.
Migration Steps
- Update llama.cpp dependency to the latest version included in this release.
✨ New Features
- Added thinking capability detection to opencode via launch.
- Added auto-installation of Claude Code via launch.
- Added auto-installation of opencode via launch when it is missing.
- Added defaults for Qwen2.5VL window attention metadata.
- Redesigned documentation landing and integrations overview.
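
The notes do not say how launch decides whether a tool is missing or how it installs it. As a rough illustration of the "install only when missing" pattern, here is a minimal Python sketch. The names `ensure_installed`, `install`, and `which` are hypothetical and not part of ollama; the only real API used is `shutil.which` from the standard library.

```python
import shutil
from typing import Callable, Optional


def ensure_installed(
    binary: str,
    install: Callable[[], None],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Run `install` only if `binary` is not found on PATH.

    Returns True if an install was triggered, False if the binary
    was already present. `install` is a caller-supplied hook
    (hypothetical; the real installer used by launch is not described
    in these notes).
    """
    if which(binary) is not None:
        return False  # already installed, nothing to do
    install()
    # Verify that the install actually made the binary discoverable.
    if which(binary) is None:
        raise RuntimeError(f"{binary} still not found after install")
    return True
```

Passing the lookup function as a parameter keeps the check testable without touching the real PATH.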
🐛 Bug Fixes
- Fixed inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics.
- Unified and tuned speculative decoding in mlxrunner.
- Detected model drift when Codex App UI switches.
- Added sm_86 architecture to cuda_v13_windows preset.
- Sized mmproj offload by projector memory.
- Preserved generation headroom for shifted prompts.
- Used host Vulkan loader on Windows.
- Fixed CUDA JIT packaging in mlx.
- Fixed ollama ps double-counting mmap'd weights on partial offload.
- Aligned server generate endpoint with native chat templates.
- Added CC 87 support for CUDA v13 on Jetson.
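
For readers unfamiliar with the iGPU/dGPU distinction in the first fix: Vulkan reports a `deviceType` for each physical device, and the enum values below come from the Vulkan specification. Whether ollama classifies devices from this field alone is an assumption; the notes only say the classification was inverted on Windows hybrid graphics. The `classify` helper and the sample device list are hypothetical.

```python
from enum import IntEnum


class VkPhysicalDeviceType(IntEnum):
    """Values of VkPhysicalDeviceType as defined by the Vulkan spec."""
    OTHER = 0
    INTEGRATED_GPU = 1
    DISCRETE_GPU = 2
    VIRTUAL_GPU = 3
    CPU = 4


def classify(device_type: int) -> str:
    """Map a raw deviceType value to an iGPU/dGPU label (hypothetical helper)."""
    kind = VkPhysicalDeviceType(device_type)
    if kind is VkPhysicalDeviceType.INTEGRATED_GPU:
        return "iGPU"
    if kind is VkPhysicalDeviceType.DISCRETE_GPU:
        return "dGPU"
    return "other"


# Hypothetical hybrid-graphics machine with one adapter of each kind.
devices = [("integrated adapter", 1), ("discrete adapter", 2)]
labels = {name: classify(raw) for name, raw in devices}
```

An inverted classification would amount to swapping the two labels, which matters on hybrid laptops because the scheduler would then prefer the wrong adapter.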