Ollama
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Release History
v0.14.0-rc2 / v0.13.5 (1 fix, 3 features): This release introduces support for Google's FunctionGemma model and migrates BERT-architecture models to the Ollama engine. It also improves tool parsing for DeepSeek-V3.1 and fixes bugs related to nested tool properties.
v0.13.4 (2 fixes, 3 features): This release introduces support for the Nemotron 3 Nano and Olmo 3 models, enables Flash Attention by default, and provides critical fixes for Gemma 3 model architectures.
v0.13.3 (1 fix, 5 features): This release introduces support for the Devstral-Small-2, rnj-1, and nomic-embed-text-v2 models, while improving embedding truncation logic and fixing image-input issues for qwen2.5vl.
v0.13.2 (2 fixes, 2 features): This release introduces support for the Qwen3-Next model series and enables Flash Attention by default for vision models. It also includes critical fixes for multi-GPU CUDA detection and DeepSeek-V3.1 thinking behavior.
v0.13.1 (4 fixes, 6 features): This release introduces support for the Ministral-3 and Mistral-Large-3 models, adds tool calling for cogito-v2.1, and includes several fixes for CUDA detection and error reporting.
v0.13.0 (2 fixes, 7 features): This release introduces support for DeepSeek-OCR, Cogito-V2.1, and the DeepSeek-V3.1 architecture, alongside a new performance benchmarking tool and significant engine optimizations for KV caching and GPU detection.
v0.12.11 (3 fixes, 6 features): Ollama 0.12.11 introduces logprobs support in API responses and adds opt-in Vulkan acceleration for expanded GPU compatibility.
v0.12.10 (3 fixes, 5 features): This release enables embedding model support via the CLI, adds tool call IDs to the chat API, and improves Vulkan performance and hardware detection.
v0.12.9 (1 fix): This release addresses a performance regression specifically impacting users running Ollama on CPU-only hardware.
v0.12.8 (4 fixes, 3 features): This release focuses on performance optimizations for qwen3-vl, including default Flash Attention support, and fixes several issues related to model thinking modes and image processing.
v0.12.7 (7 fixes, 8 features): This release introduces support for the Qwen3-VL and MiniMax-M2 models, adds file attachments and thinking-level adjustment to the app, and provides updated API documentation alongside several embedding and backend bug fixes.
v0.12.6 (5 fixes, 3 features): This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility, alongside several model-specific bug fixes.
v0.12.5 (Breaking; 2 fixes, 2 features): This release introduces structured output support for thinking models and improves app startup behavior, while removing support for older macOS versions and specific AMD GPU architectures.
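Structured outputs are requested by passing a JSON schema in the chat API's "format" field, and this release allows that to be combined with thinking. A minimal sketch of such a request body; the model name "qwen3" and the schema are illustrative assumptions, not output from the release itself:

```python
import json

# Sketch of an /api/chat request body combining structured output
# ("format" carries a JSON schema) with thinking ("think": True).
# Model name "qwen3" and the schema fields are illustrative assumptions.
schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer"],
}
payload = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "think": True,      # reasoning is returned separately from the answer
    "format": schema,   # constrains the final answer, not the thinking
    "stream": False,
}
body = json.dumps(payload)
```

With this shape, the server would be expected to return the model's reasoning in a separate field while the final message content conforms to the schema.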
v0.12.4 (Breaking; 5 fixes, 3 features): This release enables Flash Attention by default for Qwen 3 models and improves VRAM detection, while dropping support for older macOS versions and specific AMD GPU architectures.
v0.12.3 (3 fixes, 3 features): This release adds support for the DeepSeek-V3.1-Terminus and Kimi-K2-Instruct-0905 models while fixing critical bugs related to tool call parsing, Unicode rendering, and model-loading crashes.
v0.12.2 (1 fix, 4 features): This release introduces a new Web Search API for real-time information retrieval and expands the new engine's capabilities to support Qwen3 architectures and multi-regex pretokenizers.
v0.12.1 (4 fixes, 2 features): This release adds support for Qwen3 Embedding and tool calling for Qwen3-Coder, alongside several bug fixes for Gemma 3 models, Linux sign-in, and function-call parsing.
v0.12.0 (3 fixes, 3 features): This release introduces cloud models in preview, expanding hardware support for larger models, and adds native support for the BERT and Qwen 3 architectures in Ollama's engine.
v0.11.11 (Breaking; 5 fixes, 6 features): This release adds CUDA 13 support, introduces a dimensions field for embeddings, and improves memory estimation and the app UI. It also removes support for loading split vision models in the Ollama engine.
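The dimensions field applies to embedding requests, letting models trained with truncatable (Matryoshka-style) representations return shorter vectors. A minimal sketch of an embeddings request body; the model name "embeddinggemma" and the size 256 are illustrative assumptions:

```python
import json

# Sketch of an /api/embed request body using the "dimensions" field.
# Model name "embeddinggemma" and the size 256 are assumptions; any
# embedding model that supports truncation could be substituted.
payload = {
    "model": "embeddinggemma",
    "input": ["Why is the sky blue?", "What causes tides?"],
    "dimensions": 256,  # ask for 256-dimensional vectors
}
body = json.dumps(payload)
```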
v0.11.10 (1 feature): This release introduces support for the EmbeddingGemma model, providing a high-performance open embedding model for Ollama users.
v0.11.9 (2 fixes, 1 feature): This release focuses on performance optimizations through CPU/GPU overlapping and stability fixes for AMD GPUs and Unix-based installations.
v0.11.8 (2 features): This release enables Flash Attention by default for gpt-oss models and improves their overall loading performance.
v0.11.7 (4 fixes, 4 features): This release introduces the DeepSeek-V3.1 model and a preview of Turbo mode for running large models. Several bugs related to model loading, thinking tags, and tool call parsing have also been resolved.
v0.11.6 (2 fixes, 3 features): This release focuses on UI improvements for the Ollama app, including faster chat switching and better layouts, alongside performance optimizations for Flash Attention and BPE encoding.
v0.11.5 (2 fixes, 6 features): This release introduces significant memory-management improvements for GPU scheduling and multi-GPU setups, alongside performance optimizations for gpt-oss models and reduced installation sizes.
v0.11.4 (1 fix, 2 features): This release improves OpenAI API compatibility by supporting simultaneous content and tool calls, ensuring tool-name propagation, and consistently providing reasoning in responses.
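Supporting simultaneous content and tool calls means an OpenAI-compatible assistant message can carry both text and tool invocations at once, so clients should read both fields rather than treating them as mutually exclusive. A sketch of such handling; the response body here is hand-written for illustration, not real server output:

```python
import json

# Sketch: parsing an OpenAI-compatible chat response whose assistant
# message carries BOTH text content and tool calls. The response dict
# is a hand-written illustration, not actual Ollama output.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Let me check the weather for you.",
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }],
        }
    }]
}

msg = response["choices"][0]["message"]
text = msg.get("content") or ""        # may be non-empty even with tool calls
calls = msg.get("tool_calls", [])      # may be non-empty even with content
args = json.loads(calls[0]["function"]["arguments"])
```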
v0.11.3 (1 fix, 1 feature): This release fixes a VRAM leak in gpt-oss during multi-device execution and improves Windows stability by statically linking C++ libraries.
v0.11.2 (2 fixes): This patch release focuses on stability improvements for gpt-oss, specifically fixing crashes related to KV cache quantization and a missing variable definition.
v0.11.0 (1 fix, 8 features): Ollama v0.11.0 introduces support for OpenAI's gpt-oss models (20B and 120B), featuring native MXFP4 quantization, agentic capabilities, and configurable reasoning effort.
v0.10.1 (2 fixes): This patch release focuses on bug fixes for international character input and log-output accuracy.
v0.10.0 (Breaking; 3 fixes, 5 features): Ollama v0.10.0 introduces a new desktop app, significant performance optimizations for gemma3n and multi-GPU setups, and critical fixes for tool calling and API image support.
v0.9.6 (1 fix, 1 feature): This release introduces the ability to specify tool names in chat messages and includes a UI fix for the launch screen.
v0.9.5 (Breaking; 2 fixes, 4 features): Ollama 0.9.5 introduces a native macOS app with faster startup, network exposure capabilities, and customizable model storage directories. It also raises the minimum macOS requirement to version 12.
v0.9.4 (Breaking; 3 fixes, 3 features): This release introduces network exposure and custom model directories, while significantly optimizing the macOS application as a native app with a smaller footprint. It also includes fixes for tool calling and Gemma 3n model quantization.
v0.9.3 (1 fix, 2 features): This release adds support for the multilingual Gemma 3n model family and introduces automatic context-length limiting to improve model stability.
v0.9.2 (3 fixes): This patch release focuses on bug fixes for tool calling, generation errors, and tokenization issues across specific model architectures.
v0.9.1 (3 fixes, 7 features): This release introduces tool calling for DeepSeek-R1 and Magistral, alongside a major preview of native macOS and Windows applications featuring network exposure and custom model directories.
v0.9.0 (5 features): Ollama v0.9.0 introduces 'thinking' support, allowing models like DeepSeek-R1 and Qwen 3 to separate reasoning from output via a new API field and CLI toggles.
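The thinking support is driven by a think field on chat requests, with the model's reasoning returned separately from the final answer (the CLI exposes the same toggle per session). A minimal sketch of the request body; the model name "deepseek-r1" assumes that model has been pulled locally:

```python
import json

# Sketch of an /api/chat request body with thinking enabled.
# Model name "deepseek-r1" is an assumption; any thinking-capable
# model available locally could be used instead.
payload = {
    "model": "deepseek-r1",
    "messages": [
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
    "think": True,   # ask the model to emit reasoning separately
    "stream": False,
}
body = json.dumps(payload)
```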
v0.8.0 (2 features): Ollama v0.8.0 introduces streaming support for tool calls and improves engine logging with more detailed memory-estimation data.
v0.7.1 (2 fixes, 4 features): This release introduces support for the Qwen 3 and Qwen 2 architectures while providing critical stability fixes for multimodal models and memory management.
v0.7.0 (Breaking; 6 fixes, 5 features): Ollama v0.7.0 introduces a new multimodal engine supporting vision models like Llama 4 and Gemma 3, along with WebP support and various performance improvements and bug fixes.
v0.6.8 (3 fixes, 3 features): This release focuses on performance optimizations for Qwen 3 MoE models and critical stability fixes, including memory-leak resolutions and improved OOM handling.
v0.6.7 (4 fixes, 4 features): Ollama v0.6.7 introduces support for the Llama 4, Qwen 3, and Phi 4 reasoning models while increasing the default context window and fixing several inference and path-handling bugs.
v0.6.6 (4 fixes, 7 features): This release adds support for the IBM Granite 3.3 and DeepCoder models, introduces an experimental high-performance downloader, and fixes critical memory leaks for Gemma 3 and Mistral Small 3.1.
v0.6.5 (2 features): This release introduces support for the Mistral Small 3.1 vision model and optimizes loading performance for Gemma 3 on network-backed filesystems.
v0.6.4 (4 fixes, 2 features): Ollama v0.6.4 focuses on stability improvements for Gemma 3 and DeepSeek models, adds vision-capability metadata to the API, and introduces AMD RDNA4 support for Linux users.
v0.6.3 (2 fixes, 5 features): This release introduces performance optimizations and improved loading for Gemma 3, alongside critical bug fixes for model execution errors and enhancements to the CLI tools.
v0.6.2 (2 fixes, 3 features): This release introduces multi-image support and memory optimizations for Gemma 3, adds support for AMD Strix Halo GPUs, and fixes issues with model quantization and saving.
v0.6.1 (2 fixes, 4 features): This release introduces support for the Command A 111B model, improves memory management for Gemma 3, and adds new CLI features including verbose model information and navigation hotkeys.
v0.6.0 (1 fix, 1 feature): This release introduces support for Google's Gemma 3 model family across various sizes and resolves execution errors for Snowflake Arctic embedding models.
v0.5.13 (2 fixes, 6 features): This release adds support for the Phi-4-Mini, Granite-3.2-Vision, and Command R7B Arabic models, introduces a global context-length environment variable, and adds NVIDIA Blackwell compatibility.
v0.5.12 (5 fixes, 3 features): This release introduces the Perplexity R1 1776 model and improves the OpenAI-compatible API with tool calling support. It also includes several Linux-specific bug fixes and performance restorations for Intel Xeon processors.
v0.5.11 (2 fixes): Ollama v0.5.11 is a patch release focusing on bug fixes for Windows path errors and Intel Mac CPU acceleration.
v0.5.10 (1 fix): This release fixes a bug in multi-GPU memory estimation on Windows and Linux systems.
v0.5.9 (1 fix, 2 features): This release introduces support for the DeepScaleR and OpenThinker reasoning models and resolves a critical llama runner termination bug on Windows.
v0.5.8 (Breaking; 2 fixes, 4 features): This release introduces significant CPU and GPU acceleration optimizations, including AVX-512 support and improved compatibility for non-AVX systems. It also updates the macOS distribution format and fixes critical model-download bugs.
v0.5.7 (1 fix, 1 feature): This release adds support for importing Command R/R+ architectures from safetensors and fixes a bug involving multiple FROM commands in Modelfiles.
v0.5.6 (2 fixes): This patch release addresses issues with the 'ollama create' command, specifically fixing errors related to Windows environments and absolute-path handling.