Ollama
AI & LLMs
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Release History
v0.30.5-rc0 (1 fix, 1 feature): This release introduces documentation for Cline CLI integration and fixes an issue with Hermes installation on Windows, alongside an update to the underlying llama.cpp version.
v0.30.5 (1 fix): This release addresses a critical crash related to the gemma4:12b model and includes an integration fix for Hermes on Windows.
v0.30.4 (1 fix): This release includes an update to the underlying llama.cpp version and fixes an issue related to cleaning up the llama-server process on Windows.
v0.30.4-rc0 (1 fix): This release updates the underlying llama.cpp version and includes a fix for properly terminating the llama-server process on Windows during cleanup.
v0.30.3 (1 feature): This release introduces support for the new gemma4-12b model within the models module.
v0.30.2-rc0 (7 fixes, 3 features): This release introduces support for Cline CLI and Qwen integration, alongside various stability improvements and fixes related to llama-server and model loading.
v0.30.2 (7 fixes, 3 features): This release introduces support for Cline CLI and Qwen integration, alongside numerous stability improvements and fixes related to llama-server and model loading.
v0.30.1-rc0 (4 fixes, 3 features): This release introduces new features like Cline CLI support and Qwen code integration, alongside several bug fixes related to model limits, server counts, and markdown handling. It also updates the underlying llama.cpp version.
v0.24.0-rc1 (2 features): This release introduces memory trace logging for MLX and integrates Codex application support via the launch mechanism.
v0.24.0-rc0 (2 features): This release introduces memory trace logging for MLX and integrates Codex application support into the launch mechanism.
v0.24.0 (1 fix, 4 features): The OpenAI Codex App is now available on Ollama, integrating local and cloud models for coding workflows. This release also introduces a built-in browser and code review mode, alongside MLX sampler improvements for Apple Silicon.
v0.23.4-rc0 (1 fix, 1 feature): This release introduces support for vision models with image inputs when launching opencode via ollama and fixes an issue with Claude tool result formatting for local image paths.
v0.23.4 (1 fix, 1 feature): This release introduces support for vision models with image inputs when launching opencode via ollama and fixes an issue with Claude tool result formatting for local image paths.
v0.30.0-rc15 (Breaking, 3 features): This pre-release updates the architecture to directly support llama.cpp, enabling GGUF compatibility and leveraging MLX for Apple Silicon acceleration. Users are encouraged to test performance and stability.
v0.30.0 (Breaking, 5 features): Ollama 0.30 introduces significant performance and compatibility improvements by integrating llama.cpp, broadening hardware support, and adding new model capabilities.
v0.30.0-rc32 (Breaking, 1 fix, 3 features): This release transitions Ollama's architecture to directly support llama.cpp, enabling GGUF compatibility and MLX acceleration on Apple Silicon. A known issue is that `nomic-embed-text` now enforces lowercase input.
v0.30.0-rc31 (Breaking, 1 fix, 3 features): This release shifts the core architecture to directly support llama.cpp and the GGUF format, introduces MLX acceleration for Apple Silicon, and fixes case handling for the nomic-embed-text model.
v0.30.0-rc27 (3 features): This pre-release updates the architecture to directly support llama.cpp and the GGUF format, while introducing MLX acceleration for Apple Silicon inference. Feedback is requested on performance and stability.
v0.30.0-rc23 (Breaking, 3 features): This pre-release version overhauls the architecture to directly support llama.cpp, enabling GGUF compatibility and leveraging MLX for Apple Silicon acceleration. Users are encouraged to provide feedback on performance and stability.
v0.30.0-rc22 (Breaking, 3 features): This pre-release updates the core architecture to directly support llama.cpp, enabling GGUF compatibility and leveraging MLX for Apple Silicon acceleration. Users are encouraged to provide feedback on performance and stability.
v0.30.0-rc21 (Breaking, 3 features): This pre-release version overhauls the architecture to use llama.cpp directly, enabling GGUF support and leveraging MLX for Apple Silicon acceleration. Users are encouraged to provide feedback on performance and stability.
v0.30.0-rc20 (3 features): This pre-release updates the architecture to directly support llama.cpp and the GGUF format, while introducing MLX acceleration for Apple Silicon inference.
v0.30.0-rc17 (3 features): This pre-release updates the architecture to directly support llama.cpp, enabling GGUF compatibility and utilizing MLX for Apple Silicon acceleration. Feedback is requested on performance and memory utilization.
v0.23.3-rc1: This release focuses on hardening update flows and refining model push behavior within the MLX integration.
v0.23.3 (2 fixes): This release focuses on stability and improvements within the MLX backend, including refined model pushing and fixes for inference timeouts and metallib leakage.
v0.23.2-rc0 (2 features): This release focuses on improvements to the Ollama server caching and the desktop launch experience, including plan-aware model gating and disabling Claude Desktop launch.
v0.23.2 (Breaking, 3 features): This release introduces significant performance improvements via API response caching and refines the launch integration management workflow. The default behavior of `ollama launch` has been updated regarding Claude Desktop integration.
v0.23.1 (1 fix, 1 feature): This release introduces Gemma 4 MTP speculative decoding support for Macs, significantly boosting performance for the Gemma 4 31B model, alongside general threading fixes and a Go version bump.
v0.23.1-rc0 (1 fix, 1 feature): This release introduces Gemma 4 MTP speculative decoding support for Macs, significantly boosting performance for the Gemma 4 31B model on coding tasks, alongside underlying threading fixes and a Go version bump.
v0.23.0-rc0 (1 fix, 2 features): This release introduces new features like sourcing featured models from an experimental endpoint and adds launch support for the Claude application. It also includes a fix for OpenClaw gateway timeouts on Windows.
v0.23.0 (2 fixes, 3 features): This release introduces support for Claude Desktop integration with Ollama and enhances stability by fixing Windows gateway timeouts and hardening Metal initialization.
v0.22.1 (2 fixes, 2 features): This release introduces model batching support and TensorRT Model Optimizer import for the mlx backend. It also includes several bug fixes related to tokenization and desktop application startup behavior.
v0.22.1-rc0 (2 fixes, 2 features): This release introduces model batching support and fixes several issues related to tokenization and desktop application startup behavior. It also includes support for NVIDIA TensorRT Model Optimizer import.
v0.22.1-rc1 (2 fixes, 2 features): This release introduces model batching support and adds NVIDIA TensorRT Model Optimizer import capability. Several minor bugs related to tokenization and desktop app session handling were also resolved.
v0.22.0-rc1 (1 fix, 1 feature): This release introduces support for NVIDIA TensorRT Model Optimizer import within mlx and fixes an issue related to multi-regex BPE offset handling in the tokenizer. It also includes performance improvements by batching the sampler across multiple sequences in mlxrunner.
v0.22.0 (2 features): This release introduces two new models: NVIDIA's Nemotron 3 Omni and Poolside's Laguna XS.2.
v0.21.3-rc0 (1 fix, 1 feature): This release introduces flexibility in the API by allowing "max" for the think parameter and improves OpenAI response mapping for reasoning effort.
v0.21.2 (2 features): This release introduces structured outputs and ollama cloud support, alongside updating the web search mechanism to use bundled OpenClaw.
v0.21.2-rc0 (2 features): This release introduces structured outputs and ollama cloud support, alongside updating the web search mechanism to use bundled OpenClaw.
v0.21.1 (3 fixes, 1 feature): This release introduces kimi CLI integration and includes several performance and correctness fixes for MLX models and server formatting logic.
v0.21.1-rc1 (3 fixes, 1 feature): This release introduces kimi CLI integration and includes several performance and correctness fixes across MLX models and server formatting logic.
v0.21.0 (2 fixes, 2 features): This release introduces Copilot CLI integration and support for the hermes model within the launch command, alongside several fixes to configuration handling during launch.
v0.21.0-rc1 (2 fixes, 2 features): This release introduces Copilot CLI integration and support for the hermes model within the launch command, alongside several fixes related to launch configuration handling.
v0.20.8-rc0 (3 fixes, 3 features): This release introduces Gemma4 support on the MLX backend and updates the ROCm version to 7.2.1 on Linux. It also includes various fixes and improvements for MLX operations and Gemma4 rendering.
v0.20.7 (1 fix): This release primarily updates the ROCm dependency to version 7.2.1 on Linux and fixes a quality regression in specific Gemma model configurations.
v0.20.6 (1 fix, 2 features): This release focuses on improving Gemma 4 and parallel tool calling capabilities, alongside general application bug fixes and documentation updates for the Hermes Agent.
v0.20.6-rc1 (4 fixes, 1 feature): This release introduces documentation for Hermes agent integration and includes several bug fixes related to model parsing, Gemma4 handling, and UI validation upon model switching.
v0.20.6-rc0 (2 fixes): This release includes documentation updates, fixes for parallel tool call indexing, and UI adjustments for image attachment validation upon model change. The Gemma4 renderer was also updated.
v0.20.5-rc1 (4 fixes, 2 features): This release focuses on improving the command line interface, refining launch configurations for specific models (glm-5.1, gemma4), and enhancing setup for openclaw and opencode integration. It also includes several minor bug fixes across the application.
v0.20.5 (1 fix, 3 features): This release introduces OpenClaw channel setup for integrating messaging platforms like WhatsApp and Telegram via `ollama launch openclaw`, enables flash attention for Gemma 4, and fixes a bug in the /save command.
v0.20.5-rc0 (2 fixes, 2 features): This release introduces setup for openclaw channels and improves command-line interaction, alongside refining safetensors handling and error reporting.
v0.20.5-rc2 (1 fix, 3 features): This release introduces OpenClaw channel setup via `ollama launch openclaw` and enables flash attention and tool call repair for Gemma 4 models. A bug fix was also implemented for the `/save` command.
v0.20.4-rc2 (1 fix, 2 features): This release focuses on performance improvements for MLX (M5 with NAX) and Gemma4 (flash attention), alongside minor fixes for model creation.
v0.20.4-rc1 (2 fixes, 2 features): This release focuses on performance improvements for MLX (M5 with NAX) and Gemma4 (flash attention), alongside fixes for model creation paths and safetensor loading.
v0.20.4 (2 features): This release focuses on performance improvements for M5 models via NAX integration and enables flash attention support for gemma4.
v0.20.3 (1 fix, 2 features): This release includes improvements to Gemma 4 Tool Calling, adds the latest models to the Ollama App, and fixes issues with launching the OpenClaw TUI.
v0.20.2 (1 feature): This release makes the new chat interface the default home view of the application, with minor related changes to the user interface.
v0.20.1 (4 fixes, 1 feature): This patch release introduces new benchmarking capabilities and resolves several parsing and build issues related to gemma4 and ROCm builds.
v0.20.1-rc2 (4 fixes, 2 features): This release introduces performance improvements via flash attention for gemma4 and fixes several parsing and build issues related to argument handling and ROCm compilation.
v0.20.0 (1 fix, 2 features): This release introduces the new Gemma 4 model family variants (E2B, E4B, 26B, 31B) and enhances tokenizer capabilities with SentencePiece-style BPE support.
v0.20.0-rc1 (1 fix, 2 features): This release introduces the new Gemma 4 model family variants (E2B, E4B, 26B, 31B) and enhances tokenizer capabilities with SentencePiece-style BPE support.
v0.19.0-rc0 (3 fixes, 2 features): This release introduces changes to the launch command behavior, improves VS Code path detection, and includes several CI/build hardening updates.
v0.19.0-rc1 (3 fixes, 2 features): This release introduces changes to the launch command behavior, updates the TUI title handling, and improves CI build processes for MLX and CUDA.
v0.19.0 (4 fixes, 4 features): This release introduces improvements to KV cache handling, adds a web search plugin to `ollama launch pi`, and resolves several model loading and parsing bugs across different architectures.
v0.19.0-rc2 (3 fixes, 1 feature): This release introduces a warning for small context lengths and improves launch logic for VS Code integration, alongside various CI and TUI updates.
v0.18.4-rc0 (2 fixes): This release focuses on stability improvements, including fixing a memory leak in mlx and adjusting settings for the Grok model on ggml. It also updates VS Code documentation and hides the VS Code launch option.
v0.18.3-rc1 (3 fixes, 3 features): This release introduces debug request logging and improves MLX performance with better cache sharing and new format imports. Several stability fixes were also implemented across the desktop app, MLX runner, and CI.
v0.18.3 (5 fixes, 4 features): This release introduces debug request logging, improves KV cache sharing in mlxrunner, and fixes several stability issues including desktop app loading hangs and mlxrunner deadlocks.
v0.18.2 (2 fixes, 2 features): This release introduces checks for npm and git installation prerequisites for OpenClaw and significantly speeds up local Claude Code execution. Several minor bugs related to model launching and package registration were also fixed.
v0.18.2-rc0 (1 fix, 3 features): This release introduces significant performance and feature enhancements for the MLX backend, including model eviction, quantized embeddings, and fast SwiGLU. It also includes a fix for the web_search legacy path in the cloud proxy.
v0.18.2-rc1 (1 fix, 3 features): This release introduces significant performance and feature enhancements for the MLX backend, including model eviction, quantized embeddings, and fast SwiGLU. It also includes a fix for the web_search legacy path in the cloud proxy.
v0.18.1 (2 fixes, 1 feature): This release focuses on improving the benchmarking tool and adding stability fixes for launch commands, particularly concerning headless mode and systemd availability.
v0.18.1-rc1 (2 fixes, 1 feature): This release focuses on improving the benchmarking tool and adding stability fixes for launch commands, particularly concerning headless mode and systemd availability.
v0.18.0 (3 fixes, 2 features): This release introduces documentation for `reasoning_effort` support in the OpenAI-compatible API and includes several fixes related to cloud model handling and launch command integration.
v0.18.0-rc2 (3 fixes, 1 feature): This release introduces documentation for `reasoning_effort` support in the OpenAI-compatible API and fixes several issues related to cloud model handling and launch command integration.
v0.17.8-rc4 (8 fixes, 2 features): This release focuses on stability and performance improvements, including fixes for GLM tool calls, localhost handling, and updates to MLX and ROCm support. It also addresses an issue where resetting defaults disabled auto-updates.
v0.17.8-rc1 (5 fixes, 1 feature): This release focuses on stability and fixes, including repairs to GLM tool parsing, localhost handling, and cloud proxy stream disconnects, alongside build improvements for Windows and MLX.
v0.17.8-rc2 (7 fixes, 2 features): This release focuses on stability and performance improvements, including fixes for GLM tool parsing, localhost handling, and updates to MLX and ROCm support. It also refactors the MLX runner sampler interface.
v0.17.8-rc3 (7 fixes, 2 features): This release focuses on stability and performance improvements across parsers, cloud proxy handling, and MLX backend optimizations. It also includes fixes for Docker builds and application defaults.
v0.17.7-rc2 (2 features): This release introduces improvements to thinking level mapping and adds context length support for compaction via `ollama launch`.
v0.17.7-rc0 (2 features): This release loosens the server's thinking level constraint and adds support for Qwen 3.5 context length during launch.
v0.17.7 (2 features): This release introduces improvements to thinking level mapping and adds context length support for compaction via `ollama launch`.
v0.17.6 (2 fixes): This release focuses on bug fixes, specifically addressing prompt rendering issues for GLM-OCR and improving tool calling for Qwen 3.5 models.
v0.17.5 (4 fixes, 1 feature): This release focuses on stability and performance improvements for Qwen 3.5 models, particularly when running across multiple devices or using the MLX engine, and introduces peak memory reporting.
v0.17.4 (1 feature): This release adds tool call indices to parallel tool calls, so each call in a parallel batch can be tracked individually.
v0.17.3 (1 fix): This patch release fixes a bug related to the correct parsing of tool calls for Qwen 3 and Qwen 3.5 models when they are emitted during the thinking process.
v0.17.2 (1 fix): This release addresses a critical bug where the Windows application would crash on startup if an update was pending.
v0.17.1-rc0 (2 fixes, 3 features): This release introduces support for the nemotron architecture and includes several performance and logging improvements, particularly for MLX-based operations.
v0.17.1 (3 fixes, 3 features): This release introduces support for the Nemotron architecture and includes several performance and stability improvements, particularly around MLX memory usage and logging. It also updates the mlx-c bindings.
v0.17.1-rc2 (2 fixes, 3 features): This release introduces support for the nemotron architecture and includes several performance and logging improvements, particularly for MLX-based operations. It also updates underlying MLX-C bindings.
v0.17.1-rc1 (2 fixes, 3 features): This release introduces support for the nemotron architecture and includes several performance and logging improvements, particularly for MLX-based operations. It also updates underlying MLX-C bindings.
v0.17.0-rc1 (2 features): This release introduces UI exposure of the server context length and implements OpenClaw onboarding, alongside internal consolidation of the tokenizer.
v0.17.0 (2 features): This release introduces automatic installation and configuration of OpenClaw via Ollama, enabling easier use with open models, and enables websearch functionality when using cloud models.
v0.16.3 (3 fixes, 5 features): This release introduces support for several new model architectures (Gemma 3, Llama 3, Qwen 3) in mlxrunner and adds the new `ollama launch` CLI command. Several minor bug fixes related to mlx model display and scheduling were also implemented.
v0.16.2 (2 fixes, 3 features): This release introduces the ability to disable cloud models via a new setting or environment variable, and fixes rendering issues in PowerShell along with bugs affecting experimental image models.
v0.16.2-rc0 (2 fixes, 2 features): This release introduces web search capabilities for Claude cloud models and adds an environment variable to easily disable cloud models for privacy. It also fixes rendering issues in PowerShell and restores functionality for experimental image generation models.
v0.16.1 (2 fixes, 1 feature): This release improves the installation experience on macOS and Windows and adds support for respecting the OLLAMA_LOAD_TIMEOUT variable for image generation models.
v0.16.0 (1 fix, 3 features): Ollama 0.16.0 introduces the GLM-5 model and a new `ollama` command for simplified application launching. It also includes MLX runner improvements and a new keybinding for prompt editing.
v0.16.0-rc2 (1 fix, 5 features): This release introduces significant improvements to the command-line interface (CLI) and Text User Interface (TUI) experience, adds MLX runner support with safetensors quantization, and includes new login/logout aliases.
v0.16.0-rc1 (1 fix, 11 features): This release introduces significant UX improvements across the CLI and TUI, adds new features like external prompt editing and hidden login/logout aliases, and enhances model support with MLX integration and safetensors quantization.
Common Errors
Related AI & LLMs Packages
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
🦜🔗 The platform for reliable agents.
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
LLM inference in C/C++
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
A high-throughput and memory-efficient inference and serving engine for LLMs