Ollama
AI & LLMs
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3, and other models.
Release History
v0.17.4 (1 feature): This release adds tool call indices to parallel tool calls for easier tracking.
v0.17.3 (1 fix): This patch release fixes a bug in parsing tool calls emitted by Qwen 3 and Qwen 3.5 models during the thinking process.
v0.17.2 (1 fix): This release fixes a critical bug where the Windows application would crash on startup if an update was pending.
v0.17.1 (3 fixes, 3 features): This release introduces support for the Nemotron architecture and includes several performance and stability improvements, particularly around MLX memory usage and logging. It also updates the mlx-c bindings.
v0.17.1-rc0 (2 fixes, 3 features): This release introduces support for the Nemotron architecture and includes several performance and logging improvements, particularly for MLX-based operations.
v0.17.1-rc1 (2 fixes, 3 features): This release introduces support for the Nemotron architecture and includes several performance and logging improvements, particularly for MLX-based operations. It also updates the underlying MLX-C bindings.
v0.17.1-rc2 (2 fixes, 3 features): This release introduces support for the Nemotron architecture and includes several performance and logging improvements, particularly for MLX-based operations. It also updates the underlying MLX-C bindings.
v0.17.0 (2 features): This release adds automatic installation and configuration of OpenClaw via Ollama, making it easier to use with open models, and enables web search when using cloud models.
v0.17.0-rc1 (2 features): This release exposes the server context length in the UI and implements OpenClaw onboarding, alongside internal consolidation of the tokenizer.
v0.16.3 (3 fixes, 5 features): This release introduces support for several new model architectures (Gemma 3, Llama 3, Qwen 3) in mlxrunner and adds the new `ollama launch` CLI command. Several minor bug fixes related to MLX model display and scheduling were also implemented.
v0.16.2 (2 fixes, 3 features): This release introduces the ability to disable cloud models via a new setting or environment variable, and fixes rendering issues in PowerShell along with bugs affecting experimental image models.
v0.16.2-rc0 (2 fixes, 2 features): This release introduces web search capabilities for Claude cloud models and adds an environment variable to easily disable cloud models for privacy. It also fixes rendering issues in PowerShell and restores functionality for experimental image generation models.
v0.16.1 (2 fixes, 1 feature): This release improves the installation experience on macOS and Windows and makes image generation models respect the OLLAMA_LOAD_TIMEOUT variable.
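The load-timeout behavior mentioned for 0.16.1 is controlled by an environment variable. A minimal sketch, assuming the Go-style duration format that OLLAMA_LOAD_TIMEOUT accepts for the server; check the current server documentation before relying on it:

```shell
# Give slow-loading image generation models up to ten minutes to load.
# The duration syntax ("10m") is an assumption; plain seconds may also work.
export OLLAMA_LOAD_TIMEOUT=10m
ollama serve
```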
v0.16.0-rc1 (1 fix, 11 features): This release introduces significant UX improvements across the CLI and TUI, adds new features like external prompt editing and hidden login/logout aliases, and enhances model support with MLX integration and safetensors quantization.
v0.16.0-rc2 (1 fix, 5 features): This release introduces significant improvements to the CLI and TUI experience, adds MLX runner support with safetensors quantization, and includes new login/logout aliases.
v0.16.0 (1 fix, 3 features): This release introduces the GLM-5 model and a new `ollama` command for simplified application launching. It also includes MLX runner improvements and a new keybinding for prompt editing.
v0.15.6 (2 fixes, 1 feature): This release improves the launch experience by automatically downloading missing models and fixes context handling bugs for the droid and claude commands.
v0.15.5-rc5 (3 features): This release introduces two new models, GLM-OCR and Qwen3-Coder-Next, and enhances core functionality with sub-agent support and VRAM-aware context length defaulting.
v0.15.5 (2 fixes, 8 features): This release introduces two new models, GLM-OCR and Qwen3-Coder-Next, and significantly enhances `ollama launch` with argument passing and sub-agent support. It also implements VRAM-based dynamic context length setting.
v0.15.5-rc0 (3 features): This release introduces the GLM-OCR model and updates default context sizes based on VRAM availability. It also adds support for GLM-4.7-Flash on the MLX engine.
v0.15.5-rc1 (2 features): This release introduces the GLM-OCR model and adds support for GLM-4.7-Flash on the MLX engine, while also updating default context sizes based on VRAM.
v0.15.5-rc2 (5 features): This release introduces two new models, GLM-OCR and Qwen3-Coder-Next, and enhances core functionality with sub-agent support and VRAM-aware context length defaulting.
v0.15.5-rc3 (5 features): This release introduces two new models, GLM-OCR and Qwen3-Coder-Next, and enhances functionality with sub-agent support for `ollama launch` and VRAM-aware default context length settings.
v0.15.5-rc4 (5 features): This release introduces two new models, GLM-OCR and Qwen3-Coder-Next, and enhances functionality with sub-agent support for launch commands and VRAM-aware context length defaulting.
v0.15.4 (1 feature): This release updates the behavior of the `ollama launch openclaw` command to ensure the OpenClaw onboarding flow is executed if necessary.
v0.15.3 (1 feature): This release renames the 'clawdbot' launch command to 'openclaw' and updates how `ollama launch` uses the OLLAMA_HOST environment variable. Tool calling for Ministral models has also been improved.
v0.15.2 (1 feature): This release introduces a new command for easily launching Clawdbot integrated with Ollama models.
v0.15.1 (1 fix, 1 feature): This release includes documentation updates, notably regarding `ollama launch`, and a performance fix that adds -O3 optimization to CGO flags.
v0.15.1-rc0 (1 fix, 1 feature): This release includes documentation updates, notably regarding `ollama launch`, and a performance fix that adds -O3 optimization to CGO flags.
v0.15.0 (1 fix, 2 features): This release introduces the `ollama config` command and adds image editing capabilities to x/imagegen. It also includes improvements to the CLI handling of model loading and output rendering.
v0.15.0-rc3 (1 fix, 2 features): This release introduces the `ollama config` command and adds image editing capabilities to x/imagegen. It also includes improvements to CLI handling during model loading.
v0.15.0-rc1: This release focuses on internal cleanup of the manifest and model paths, and temporarily removes the qwen_image and qwen_image_edit models from x/imagegen.
v0.15.0-rc0: This release focuses on internal cleanup, specifically refining the manifest and modelpath handling, and removing the qwen_image and qwen_image_edit models from x/imagegen.
v0.14.3-rc1 (1 feature): This release allows the macOS application to terminate gracefully during system shutdown.
v0.14.3 (5 fixes, 5 features): This release introduces several new image generation and LLM models, including Z-Image Turbo and GLM-4.7-Flash. It also includes bug fixes related to macOS shutdown, model management, and API usage.
v0.14.3-rc3 (5 fixes, 2 features): This release introduces the GLM-4.7-Flash model and enables image generation via the /api/generate endpoint, alongside several stability and command fixes.
v0.14.3-rc0 (1 feature): This release improves application termination behavior on macOS so that the app handles system shutdown more gracefully.
v0.14.3-rc2 (5 fixes, 2 features): This release introduces the GLM-4.7-Flash model and enables image generation via the /api/generate endpoint, alongside several stability and command fixes.
v0.14.2-rc1 (1 feature): This release focuses on documentation updates, including integrations for Onyx and Marimo, and introduces multi-line input support in the CLI.
v0.14.2 (1 feature): This release focuses on documentation updates, including new integrations for Onyx and Marimo, and introduces multi-line input support in the CLI.
v0.14.1 (1 fix): This patch release fixes signature verification failures that broke macOS auto-updates. It also welcomes two new contributors.
v0.14.0 (4 fixes, 5 features): This release introduces experimental support for image generation models and enhances API compatibility with Anthropic's message format. It also includes stability improvements for VRAM estimation and introduces the `REQUIRES` command for Modelfiles.
v0.14.0-rc2 / v0.14.0-rc3 (3 fixes, 5 features): This release introduces experimental support for image generation models via MLX and enhances Anthropic API compatibility. It also adds model version requirements via the Modelfile and improves VRAM measurement accuracy.
v0.14.0-rc4 (4 fixes, 5 features): This release introduces experimental support for image generation models, adds Anthropic API compatibility, and includes several stability improvements related to VRAM estimation and error handling.
v0.14.0-rc7 (4 fixes, 5 features): This release introduces experimental support for image generation models via MLX, adds Anthropic API compatibility, and includes several stability improvements related to VRAM handling and error reporting.
v0.14.0-rc8 (4 fixes, 5 features): This release introduces experimental support for image generation models, adds Anthropic API compatibility, and improves VRAM estimation accuracy. It also includes a new `REQUIRES` command for Modelfiles to specify Ollama version requirements.
v0.14.0-rc9 (4 fixes, 5 features): This release introduces experimental support for image generation models, adds Anthropic API compatibility, and includes several stability improvements related to VRAM estimation and error handling.
v0.14.0-rc10 (3 fixes, 5 features): This release introduces experimental support for image generation models via MLX and enhances Anthropic API compatibility. It also adds the `REQUIRES` command to Modelfiles for version declaration and improves VRAM estimation accuracy.
v0.14.0-rc11 (4 fixes, 5 features): This release introduces experimental support for image generation models, adds Anthropic API compatibility, and includes several stability improvements related to VRAM estimation and error handling.
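The `REQUIRES` command from the 0.14.0 notes declares a minimum Ollama version inside a Modelfile. A minimal sketch; the exact argument syntax after `REQUIRES` is an assumption drawn from the release summary, so consult the Modelfile reference before use:

```dockerfile
# Hypothetical Modelfile that refuses to build or run on older clients.
FROM llama3.2
REQUIRES ollama >= 0.14.0
PARAMETER temperature 0.7
```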
v0.13.5 (1 fix, 3 features): This release introduces support for Google's FunctionGemma model and migrates BERT architecture models to the Ollama engine. It also improves tool parsing for DeepSeek-V3.1 and fixes bugs related to nested tool properties.
v0.13.4 (2 fixes, 3 features): This release introduces support for Nemotron 3 Nano and Olmo 3 models, enables Flash Attention by default, and provides critical fixes for Gemma 3 model architectures.
v0.13.3 (1 fix, 5 features): This release introduces support for Devstral-Small-2, rnj-1, and nomic-embed-text-v2 models, while improving embedding truncation logic and fixing image input issues for qwen2.5vl.
v0.13.2 (2 fixes, 2 features): This release introduces support for the Qwen3-Next model series and enables Flash Attention by default for vision models. It also includes critical fixes for multi-GPU CUDA detection and DeepSeek-v3.1 thinking behavior.
v0.13.1 (4 fixes, 6 features): This release introduces support for Ministral-3 and Mistral-Large-3 models, adds tool calling for cogito-v2.1, and includes several fixes for CUDA detection and error reporting.
v0.13.0 (2 fixes, 7 features): This release introduces support for DeepSeek-OCR, Cogito-V2.1, and the DeepSeek-V3.1 architecture, alongside a new performance benchmarking tool and significant engine optimizations for KV caching and GPU detection.
v0.12.11 (3 fixes, 6 features): Ollama 0.12.11 introduces logprobs support for API responses and adds opt-in Vulkan acceleration for expanded GPU compatibility.
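The logprobs addition in 0.12.11 can be sketched as a chat request payload. The field names `logprobs` and `top_logprobs` follow the OpenAI-style convention and are assumptions here, not a definitive reference to Ollama's API:

```python
import json

# Hypothetical /api/chat request asking for token log-probabilities.
# Field names are assumed from the OpenAI-compatible convention.
payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Say hi"}],
    "logprobs": True,    # include log-probabilities in the response
    "top_logprobs": 3,   # also return the 3 most likely alternative tokens
    "stream": False,
}
print(json.dumps(payload, indent=2))
```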
v0.12.10 (3 fixes, 5 features): This release enables embedding model support via the CLI, adds tool call IDs to the chat API, and improves Vulkan performance and hardware detection.
v0.12.9 (1 fix): This release addresses a performance regression affecting users running Ollama on CPU-only hardware.
v0.12.8 (4 fixes, 3 features): This release focuses on performance optimizations for qwen3-vl, including default Flash Attention support, and fixes several issues related to model thinking modes and image processing.
v0.12.7 (7 fixes, 8 features): This release introduces support for Qwen3-VL and MiniMax-M2 models, adds file attachments and thinking level adjustments to the app, and provides updated API documentation alongside several embedding and backend bug fixes.
v0.12.6 (5 fixes, 3 features): This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility alongside several model-specific bug fixes.
v0.12.5 (breaking; 2 fixes, 2 features): This release introduces structured output support for thinking models and improves app startup behavior, while removing support for older macOS versions and specific AMD GPU architectures.
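Structured output for a thinking model, as introduced in 0.12.5, can be sketched as a chat request whose `format` field carries a JSON schema constraining the final answer. Treat the exact request shape as an assumption from the release note:

```python
import json

# Hypothetical request: the "format" field holds a JSON schema that the
# final answer (not the thinking trace) must conform to.
schema = {
    "type": "object",
    "properties": {"capital": {"type": "string"}},
    "required": ["capital"],
}
payload = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "format": schema,
    "stream": False,
}
print(json.dumps(payload, indent=2))
```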
v0.12.4 (breaking; 5 fixes, 3 features): This release enables Flash Attention by default for Qwen 3 models and improves VRAM detection, while dropping support for older macOS versions and specific AMD GPU architectures.
v0.12.3 (3 fixes, 3 features): This release adds support for DeepSeek-V3.1-Terminus and Kimi-K2-Instruct-0905 models while fixing critical bugs related to tool call parsing, Unicode rendering, and model loading crashes.
v0.12.2 (1 fix, 4 features): This release introduces a new Web Search API for real-time information retrieval and expands the new engine's capabilities to support Qwen3 architectures and multi-regex pretokenizers.
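A call to the Web Search API from 0.12.2 might look like the sketch below. The endpoint path, payload shape, and auth header are all assumptions from the release summary; the request is built but deliberately not sent, since it would need a real API key and network access:

```python
import json
import urllib.request

# Hypothetical web-search request; endpoint and fields are assumptions.
req = urllib.request.Request(
    "https://ollama.com/api/web_search",
    data=json.dumps({"query": "ollama release notes"}).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <OLLAMA_API_KEY>",  # placeholder key
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here.
print(req.get_method(), req.full_url)
```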
v0.12.1 (4 fixes, 2 features): This release adds support for Qwen3 Embedding and tool calling for Qwen3-Coder, alongside several bug fixes for Gemma3 models, Linux sign-in, and function calling parsing.
v0.12.0 (3 fixes, 3 features): This release introduces cloud models in preview, expanding hardware support for larger models, and adds native support for Bert and Qwen 3 architectures within Ollama's engine.
v0.11.11 (breaking; 5 fixes, 6 features): This release adds CUDA 13 support, introduces a dimensions field for embeddings, and improves memory estimation and app UI. It also removes support for loading split vision models in the Ollama engine.
v0.11.10 (1 feature): This release introduces support for the EmbeddingGemma model, a high-performance open embedding model.
v0.11.9 (2 fixes, 1 feature): This release focuses on performance optimizations through CPU/GPU overlapping and stability fixes for AMD GPUs and Unix-based installations.
v0.11.8 (2 features): This release enables Flash Attention by default for gpt-oss models and improves their loading performance.
v0.11.7 (4 fixes, 4 features): This release introduces the DeepSeek-V3.1 model and a preview of Turbo mode for running large models. Several bugs related to model loading, thinking tags, and tool call parsing have also been resolved.
v0.11.6 (2 fixes, 3 features): This release focuses on UI improvements for the Ollama app, including faster chat switching and better layouts, alongside performance optimizations for Flash Attention and BPE encoding.
v0.11.5 (2 fixes, 6 features): This release introduces significant memory management improvements for GPU scheduling and multi-GPU setups, alongside performance optimizations for gpt-oss models and reduced installation sizes.
v0.11.4 (1 fix, 2 features): This release improves OpenAI API compatibility by supporting simultaneous content and tool calls, ensuring tool name propagation, and consistently providing reasoning in responses.
v0.11.3 (1 fix, 1 feature): This release fixes a VRAM leak in gpt-oss during multi-device execution and improves Windows stability by statically linking C++ libraries.
v0.11.2 (2 fixes): This patch release focuses on stability improvements for gpt-oss, specifically fixing crashes related to KV cache quantization and a missing variable definition.
v0.11.0 (1 fix, 8 features): Ollama v0.11 introduces support for OpenAI's gpt-oss models (20B and 120B) featuring native MXFP4 quantization, agentic capabilities, and configurable reasoning effort.
v0.10.1 (2 fixes): This patch release focuses on bug fixes for international character input and log output accuracy.
v0.10.0 (breaking; 3 fixes, 5 features): Ollama v0.10.0 introduces a new desktop app, significant performance optimizations for gemma3n and multi-GPU setups, and critical fixes for tool calling and API image support.
v0.9.6 (1 fix, 1 feature): This release introduces the ability to specify tool names in chat messages and includes a UI fix for the launch screen.
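Specifying tool names in chat messages, as in 0.9.6, might look like the message list below. The `tool_name` field on tool-role messages is an assumption drawn from the release summary, not confirmed documentation:

```python
import json

# Hypothetical conversation where a tool result is attributed to the
# tool that produced it via an assumed "tool_name" field.
messages = [
    {"role": "user", "content": "What's the weather in Oslo?"},
    {"role": "assistant", "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Oslo"}}},
    ]},
    {"role": "tool", "tool_name": "get_weather", "content": "4 degrees, overcast"},
]
print(json.dumps(messages, indent=2))
```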
v0.9.5 (breaking; 2 fixes, 4 features): Ollama 0.9.5 introduces a native macOS app with faster startup, network exposure capabilities, and customizable model storage directories. It also raises the minimum macOS requirement to version 12.
v0.9.4 (breaking; 3 fixes, 3 features): This release introduces network exposure and custom model directories, while significantly optimizing the macOS application as a native app with a smaller footprint. It also includes fixes for tool calling and Gemma 3n model quantization.
v0.9.3 (1 fix, 2 features): This release adds support for the multilingual Gemma 3n model family and introduces automatic context length limiting to improve model stability.
v0.9.2 (3 fixes): This patch release focuses on bug fixes for tool calling, generation errors, and tokenization issues across specific model architectures.
v0.9.1 (3 fixes, 7 features): This release introduces tool calling for DeepSeek-R1 and Magistral, alongside a major preview of native macOS and Windows applications featuring network exposure and custom model directories.
v0.9.0 (5 features): Ollama v0.9.0 introduces 'thinking' support, allowing models like DeepSeek R1 and Qwen 3 to separate reasoning from output via a new API field and CLI toggles.
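The thinking support from 0.9.0 can be sketched as a chat request that toggles separated reasoning. The `think` request field and the separate `thinking` field in the response are assumptions based on the release summary; verify against the current API documentation:

```python
import json

# Hypothetical request enabling separated reasoning for a thinking model.
payload = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "think": True,   # assumed field: return reasoning apart from the answer
    "stream": False,
}
# A response would then carry reasoning separately from content, e.g.:
# {"message": {"content": "...", "thinking": "..."}}
print(json.dumps(payload, indent=2))
```

The release note also mentions CLI toggles for the same behavior; the exact flag names are not given here, so they are left unspecified.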
v0.8.0 (2 features): Ollama v0.8.0 introduces streaming support for tool calls and improves engine logging with more detailed memory estimation data.
v0.7.1 (2 fixes, 4 features): This release introduces support for Qwen 3 and Qwen 2 architectures while providing critical stability fixes for multimodal models and memory management.
v0.7.0 (breaking; 6 fixes, 5 features): Ollama v0.7.0 introduces a new multimodal engine supporting vision models like Llama 4 and Gemma 3, along with WebP support and various performance improvements and bug fixes.
v0.6.8 (3 fixes, 3 features): This release focuses on performance optimizations for Qwen 3 MoE models and critical stability fixes, including memory leak resolutions and improved OOM handling.
v0.6.7 (4 fixes, 4 features): Ollama v0.6.7 introduces support for Llama 4, Qwen 3, and Phi 4 reasoning models while increasing the default context window and fixing several inference and path-handling bugs.
v0.6.6 (4 fixes, 7 features): This release adds support for IBM Granite 3.3 and DeepCoder models, introduces an experimental high-performance downloader, and fixes critical memory leaks for Gemma 3 and Mistral Small 3.1.
v0.6.5 (2 features): This release introduces support for the Mistral Small 3.1 vision model and optimizes loading performance for Gemma 3 on network-backed filesystems.
v0.6.4 (4 fixes, 2 features): Ollama v0.6.4 focuses on stability improvements for Gemma 3 and DeepSeek models, adds vision capability metadata to the API, and introduces AMD RDNA4 support for Linux users.
v0.6.3 (2 fixes, 5 features): This release introduces performance optimizations and improved loading for Gemma 3, alongside critical bug fixes for model execution errors and enhancements to the CLI tools.
v0.6.2 (2 fixes, 3 features): This release introduces multi-image support and memory optimizations for Gemma 3, adds support for AMD Strix Halo GPUs, and fixes issues with model quantization and saving.
v0.6.1 (2 fixes, 4 features): This release introduces support for the Command A 111B model, improves memory management for gemma3, and adds new CLI features including verbose model information and navigation hotkeys.
v0.6.0 (1 fix, 1 feature): This release introduces support for Google's Gemma 3 model family across various sizes and resolves execution errors for Snowflake Arctic embedding models.