v0.30.11-rc1
📦 ollama
✨ 6 features · 🐛 8 fixes · 🔧 11 symbols
Summary
This release adds auto-installation of coding tools such as Claude Code and opencode through the launch command, alongside stability and performance improvements to GPU handling (Vulkan device classification, CUDA presets) and to model loading and generation.
Migration Steps
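- Users on Windows with hybrid graphics (integrated plus discrete GPU) should verify Vulkan device classification after upgrading, since this release fixes an inverted iGPU/dGPU classification.
- Users who rely on specific mmproj offload behavior may need to re-evaluate memory settings, because mmproj offload is now sized by projector memory.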
✨ New Features
- Added thinking capability detection for opencode via the launch command.
- Enabled auto-installation of Claude Code via the launch command.
- Enabled auto-installation of opencode, when it is missing, via the launch command.
- Added the sm_86 architecture to the cuda_v13_windows preset for llama.
- Added default values for qwen2.5vl window attention metadata.
- Aligned the server's generate endpoint with native chat templates.
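For readers who want to exercise the generate endpoint after upgrading, the following is a minimal sketch of a non-streaming request. It assumes a local server on the default address `http://localhost:11434`. The helper names `build_generate_request` and `generate` are hypothetical and used only for illustration. This release note does not describe any change to the request format, so the sketch only shows the basic call shape.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on the default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running and a model already pulled, `generate("qwen2.5vl", "Say hello.")` would return the completion text. The model name here is a placeholder.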
🐛 Bug Fixes
- Fixed inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics.
- Unified and tuned speculative decoding in mlxrunner.
- Added detection of model drift when the Codex App UI switches.
- Fixed mmproj offload sizing so that it is based on projector memory.
- Preserved generation headroom for shifted prompts.
- Fixed ollama ps double-counting mmap'd weights on partial offload.
- Updated mlx and fixed CUDA JIT packaging.
- Added compute capability 8.7 (CC 87) support for CUDA v13 on Jetson.
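The notes above use two spellings for CUDA targets: an architecture name (sm_86) and a compute capability (CC 87). As a small sketch of how the two relate, the helpers below convert between them under the usual convention that the last digit is the minor version. The function names are hypothetical and are not part of ollama or CUDA, and suffixed names such as `sm_90a` are deliberately rejected.

```python
def sm_to_compute_capability(arch: str) -> str:
    """Convert an 'sm_XY' architecture name to an 'X.Y' compute capability."""
    prefix, _, digits = arch.partition("_")
    if prefix != "sm" or not digits.isdigit() or len(digits) < 2:
        raise ValueError(f"not a plain sm_XY architecture name: {arch!r}")
    # The last digit is the minor version; the digits before it are the major version.
    return f"{digits[:-1]}.{digits[-1]}"


def compute_capability_to_sm(cc: str) -> str:
    """Convert an 'X.Y' compute capability to an 'sm_XY' architecture name."""
    major, minor = cc.split(".")
    return f"sm_{int(major)}{int(minor)}"
```

For example, `sm_to_compute_capability("sm_86")` gives `"8.6"`, and `compute_capability_to_sm("8.7")` gives `"sm_87"`.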