
Ollama

AI & LLMs

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Latest: v0.14.0-rc2 · 59 releases · 8 breaking changes · View on GitHub →

Release History

v0.14.0-rc2
Jan 10, 2026

v0.13.5 (1 fix, 3 features)
Dec 18, 2025

This release introduces support for Google's FunctionGemma model and migrates BERT architecture models to the Ollama engine. It also improves tool parsing for DeepSeek-V3.1 and fixes bugs related to nested tool properties.

v0.13.4 (2 fixes, 3 features)
Dec 13, 2025

This release introduces support for Nemotron 3 Nano and Olmo 3 models, enables Flash Attention by default, and provides critical fixes for Gemma 3 model architectures.

v0.13.3 (1 fix, 5 features)
Dec 9, 2025

This release introduces support for Devstral-Small-2, rnj-1, and nomic-embed-text-v2 models, while improving embedding truncation logic and fixing image input issues for qwen2.5vl.

v0.13.2 (2 fixes, 2 features)
Dec 4, 2025

This release introduces support for the Qwen3-Next model series and enables Flash Attention by default for vision models. It also includes critical fixes for multi-GPU CUDA detection and DeepSeek-v3.1 thinking behavior.

v0.13.1 (4 fixes, 6 features)
Nov 27, 2025

This release introduces support for Ministral-3 and Mistral-Large-3 models, adds tool calling for cogito-v2.1, and includes several fixes for CUDA detection and error reporting.

v0.13.0 (2 fixes, 7 features)
Nov 19, 2025

This release introduces support for DeepSeek-OCR, Cogito-V2.1, and DeepSeek-V3.1 architecture, alongside a new performance benchmarking tool and significant engine optimizations for KV caching and GPU detection.

v0.12.11 (3 fixes, 6 features)
Nov 12, 2025

Ollama 0.12.11 introduces logprobs support for API responses and adds opt-in Vulkan acceleration for expanded GPU compatibility.
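
A minimal sketch of how the new logprobs output might be requested through Ollama's OpenAI-compatible endpoint; the model name and the availability of top_logprobs on this release are assumptions, and the native API's field shapes may differ.

```python
# Hedged sketch: request token log probabilities from a local Ollama server
# (default port 11434) via its OpenAI-compatible endpoint. The model name
# "llama3.2" is illustrative and must already be pulled.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Reply with a single word."}],
    logprobs=True,      # per-token log probabilities for the completion
    top_logprobs=3,     # also return the 3 most likely alternatives per token
)

logprobs = resp.choices[0].logprobs
if logprobs and logprobs.content:
    for entry in logprobs.content:
        print(entry.token, entry.logprob)
```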

v0.12.10 (3 fixes, 5 features)
Nov 5, 2025

This release enables embedding model support via the CLI, adds tool call IDs to the chat API, and improves Vulkan performance and hardware detection.

v0.12.9 (1 fix)
Oct 31, 2025

This release addresses a performance regression specifically impacting users running Ollama on CPU-only hardware.

v0.12.8 (4 fixes, 3 features)
Oct 30, 2025

This release focuses on performance optimizations for qwen3-vl, including default Flash Attention support, and fixes several issues related to model thinking modes and image processing.

v0.12.7 (7 fixes, 8 features)
Oct 29, 2025

This release introduces support for Qwen3-VL and MiniMax-M2 models, adds file attachments and thinking level adjustments to the app, and provides updated API documentation alongside several embedding and backend bug fixes.

v0.12.6 (5 fixes, 3 features)
Oct 15, 2025

This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility alongside several model-specific bug fixes.

v0.12.5 (Breaking, 2 fixes, 2 features)
Oct 10, 2025

This release introduces structured output support for thinking models and improves app startup behavior, while removing support for older macOS versions and specific AMD GPU architectures.
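
As an illustration of the structured-output change, the sketch below combines a JSON-schema "format" with the "think" flag on the native chat endpoint; the model name is illustrative and the exact per-release behaviour is an assumption.

```python
# Hedged sketch: structured output from a thinking model via /api/chat.
# Assumes a local server and a pulled reasoning model ("deepseek-r1" here).
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Give a large city and its population."}],
        "think": True,     # keep reasoning separate from the final answer
        "format": schema,  # constrain the final answer to this JSON schema
        "stream": False,
    },
    timeout=300,
)
message = resp.json()["message"]
print(json.loads(message["content"]))      # schema-shaped answer
print(message.get("thinking", "")[:200])   # reasoning text, if returned
```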

v0.12.4 (Breaking, 5 fixes, 3 features)
Oct 3, 2025

This release enables Flash Attention by default for Qwen 3 models and improves VRAM detection, while dropping support for older macOS versions and specific AMD GPU architectures.

v0.12.3 (3 fixes, 3 features)
Sep 26, 2025

This release adds support for DeepSeek-V3.1-Terminus and Kimi-K2-Instruct-0905 models while fixing critical bugs related to tool call parsing, Unicode rendering, and model loading crashes.

v0.12.2 (1 fix, 4 features)
Sep 24, 2025

This release introduces a new Web Search API for real-time information retrieval and expands the new engine's capabilities to support Qwen3 architectures and multi-regex pretokenizers.
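
The summary only names a Web Search API; the sketch below shows one plausible way to call Ollama's hosted search endpoint, with the URL, the OLLAMA_API_KEY environment variable, and the request/response field names all treated as assumptions rather than confirmed details of this release.

```python
# Hedged sketch: query the hosted web search endpoint. Endpoint URL, auth
# header, and field names are assumptions based on Ollama's web search
# announcement, not verified against this release.
import os
import requests

resp = requests.post(
    "https://ollama.com/api/web_search",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={"query": "latest Ollama release"},
    timeout=30,
)
for result in resp.json().get("results", []):
    print(result.get("title"), result.get("url"))
```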

v0.12.1 (4 fixes, 2 features)
Sep 21, 2025

This release adds support for Qwen3 Embedding and tool calling for Qwen3-Coder, alongside several bug fixes for Gemma3 models, Linux sign-in, and function calling parsing.

v0.12.0 (3 fixes, 3 features)
Sep 18, 2025

This release introduces cloud models in preview, making it possible to run models larger than local hardware allows, and adds native support for BERT and Qwen 3 architectures within Ollama's engine.

v0.11.11 (Breaking, 5 fixes, 6 features)
Sep 11, 2025

This release adds CUDA 13 support, introduces a dimensions field for embeddings, and improves memory estimation and app UI. It also removes support for loading split vision models in the Ollama engine.
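
A sketch of how the new dimensions field might be used with the /api/embed endpoint; the field name comes from this summary, while the model ("embeddinggemma") and its support for truncated vectors are assumptions.

```python
# Hedged sketch: request truncated embeddings via /api/embed. The "dimensions"
# field and which models honour it are assumptions here.
import requests

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "embeddinggemma",
        "input": ["Ollama runs models locally."],
        "dimensions": 256,   # ask for 256-dimensional vectors instead of full size
    },
    timeout=60,
)
vectors = resp.json()["embeddings"]
print(len(vectors[0]))   # expected: 256
```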

v0.11.10 (1 feature)
Sep 4, 2025

This release introduces support for the EmbeddingGemma model, providing a high-performance open embedding model for Ollama users.

v0.11.9 (2 fixes, 1 feature)
Sep 2, 2025

This release focuses on performance optimizations through CPU/GPU overlapping and stability fixes for AMD GPUs and Unix-based installations.

v0.11.8 (2 features)
Aug 27, 2025

This release enables flash attention by default for gpt-oss models and improves their overall loading performance.

v0.11.7 (4 fixes, 4 features)
Aug 25, 2025

This release introduces the DeepSeek-V3.1 model and the preview of Turbo mode for running large models. Several bugs related to model loading, thinking tags, and tool call parsing have also been resolved.

v0.11.6 (2 fixes, 3 features)
Aug 20, 2025

This release focuses on UI improvements for the Ollama app, including faster chat switching and better layouts, alongside performance optimizations for flash attention and BPE encoding.

v0.11.5 (2 fixes, 6 features)
Aug 15, 2025

This release introduces significant memory management improvements for GPU scheduling and multi-GPU setups, alongside performance optimizations for gpt-oss models and reduced installation sizes.

v0.11.4 (1 fix, 2 features)
Aug 7, 2025

This release improves OpenAI API compatibility by supporting simultaneous content and tool calls, ensuring tool name propagation, and consistently providing reasoning in responses.
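
A sketch of reading text content and tool calls from the same OpenAI-compatible response, which this release allows to arrive together; the model name and tool definition are illustrative.

```python
# Hedged sketch: a tool-calling request through the OpenAI-compatible endpoint,
# reading both the text content and any tool calls from one response.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

msg = resp.choices[0].message
print("content:", msg.content)            # may be populated alongside tool calls
for call in msg.tool_calls or []:
    print("tool:", call.function.name, call.function.arguments)
```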

v0.11.3 (1 fix, 1 feature)
Aug 6, 2025

This release fixes a VRAM leak in gpt-oss during multi-device execution and improves Windows stability by statically linking C++ libraries.

v0.11.2 (2 fixes)
Aug 5, 2025

This patch release focuses on stability improvements for gpt-oss, specifically fixing crashes related to KV cache quantization and a missing variable definition.

v0.11.0 (1 fix, 8 features)
Aug 5, 2025

Ollama v0.11 introduces support for OpenAI's gpt-oss models (20B and 120B) featuring native MXFP4 quantization, agentic capabilities, and configurable reasoning effort.

v0.10.1 (2 fixes)
Jul 31, 2025

This patch release focuses on bug fixes for international character input and log output accuracy.

v0.10.0 (Breaking, 3 fixes, 5 features)
Jul 18, 2025

Ollama v0.10.0 introduces a new desktop app, significant performance optimizations for gemma3n and multi-GPU setups, and critical fixes for tool calling and API image support.

v0.9.6 (1 fix, 1 feature)
Jul 8, 2025

This release introduces the ability to specify tool names in chat messages and includes a UI fix for the launch screen.

v0.9.5 (Breaking, 2 fixes, 4 features)
Jul 2, 2025

Ollama 0.9.5 introduces a native macOS app with faster startup, network exposure capabilities, and customizable model storage directories. It also raises the minimum macOS requirement to version 12.

v0.9.4 (Breaking, 3 fixes, 3 features)
Jun 27, 2025

This release introduces network exposure and custom model directories, while significantly optimizing the macOS application as a native app with a smaller footprint. It also includes fixes for tool calling and Gemma 3n model quantization.

v0.9.3 (1 fix, 2 features)
Jun 25, 2025

This release adds support for the multilingual Gemma 3n model family and introduces automatic context length limiting to improve model stability.

v0.9.2 (3 fixes)
Jun 18, 2025

This patch release focuses on bug fixes for tool calling, generation errors, and tokenization issues across specific model architectures.

v0.9.1 (3 fixes, 7 features)
Jun 9, 2025

This release introduces tool calling for DeepSeek-R1 and Magistral, alongside a major preview of native macOS and Windows applications featuring network exposure and custom model directories.

v0.9.0 (5 features)
May 29, 2025

Ollama v0.9.0 introduces 'thinking' support, allowing models like DeepSeek R1 and Qwen 3 to separate reasoning from output via a new API field and CLI toggles.
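
A sketch of the separated reasoning output using the native generate endpoint; the model name is illustrative, and the CLI toggle mentioned in the comment is an assumption about the flag's exact spelling.

```python
# Hedged sketch: the "think" field on /api/generate keeps reasoning apart from
# the answer. CLI counterpart is assumed to be something like
# `ollama run deepseek-r1 --think=false` to toggle reasoning off.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1",
        "prompt": "How many primes are there below 20?",
        "think": True,
        "stream": False,
    },
    timeout=300,
)
body = resp.json()
print("thinking:", body.get("thinking", "")[:200])  # reasoning trace
print("response:", body["response"])                # final answer only
```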

v0.8.0 (2 features)
May 27, 2025

Ollama v0.8.0 introduces streaming support for tool calls and improves engine logging with more detailed memory estimation data.
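
A sketch of consuming streamed tool calls from the native chat endpoint; the model and tool definition are illustrative, and the exact chunk shape may vary by release.

```python
# Hedged sketch: stream a chat response and pick tool calls out of the chunks
# as they arrive, instead of waiting for the full response.
import json
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Return the current time for a timezone.",
        "parameters": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
}]

with requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3",
        "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
        "tools": tools,
        "stream": True,
    },
    stream=True,
    timeout=300,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        message = json.loads(line).get("message", {})
        for call in message.get("tool_calls", []):   # emitted mid-stream
            print(call["function"]["name"], call["function"]["arguments"])
        if content := message.get("content"):
            print(content, end="", flush=True)
```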

v0.7.1 (2 fixes, 4 features)
May 21, 2025

This release introduces support for Qwen 3 and Qwen 2 architectures while providing critical stability fixes for multimodal models and memory management.

v0.7.0 (Breaking, 6 fixes, 5 features)
May 13, 2025

Ollama v0.7.0 introduces a new multimodal engine supporting vision models like Llama 4 and Gemma 3, along with WebP support and various performance improvements and bug fixes.

v0.6.8 (3 fixes, 3 features)
May 3, 2025

This release focuses on performance optimizations for Qwen 3 MoE models and critical stability fixes, including memory leak resolutions and improved OOM handling.

v0.6.7 (4 fixes, 4 features)
Apr 26, 2025

Ollama v0.6.7 introduces support for Llama 4, Qwen 3, and Phi 4 reasoning models while increasing the default context window and fixing several inference and path-handling bugs.

v0.6.6 (4 fixes, 7 features)
Apr 17, 2025

This release adds support for IBM Granite 3.3 and DeepCoder models, introduces an experimental high-performance downloader, and fixes critical memory leaks for Gemma 3 and Mistral Small 3.1.

v0.6.5 (2 features)
Apr 6, 2025

This release introduces support for the Mistral Small 3.1 vision model and optimizes loading performance for Gemma 3 on network-backed filesystems.

v0.6.4 (4 fixes, 2 features)
Apr 2, 2025

Ollama v0.6.4 focuses on stability improvements for Gemma 3 and DeepSeek models, adds vision capability metadata to the API, and introduces AMD RDNA4 support for Linux users.

v0.6.3 (2 fixes, 5 features)
Mar 22, 2025

This release introduces performance optimizations and improved loading for Gemma 3, alongside critical bug fixes for model execution errors and enhancements to the CLI tools.

v0.6.2 (2 fixes, 3 features)
Mar 18, 2025

This release introduces multi-image support and memory optimizations for Gemma 3, adds support for AMD Strix Halo GPUs, and fixes issues with model quantization and saving.

v0.6.1 (2 fixes, 4 features)
Mar 14, 2025

This release introduces support for the Command A 111B model, improves memory management for gemma3, and adds new CLI features including verbose model information and navigation hotkeys.

v0.6.0 (1 fix, 1 feature)
Mar 11, 2025

This release introduces support for Google's Gemma 3 model family across various sizes and resolves execution errors for Snowflake Arctic embedding models.

v0.5.13 (2 fixes, 6 features)
Feb 27, 2025

This release adds support for Phi-4-Mini, Granite-3.2-Vision, and Command R7B Arabic models, introduces a global context length environment variable, and adds NVIDIA Blackwell compatibility.
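
A sketch of setting the global context length before starting the server; OLLAMA_CONTEXT_LENGTH is assumed as the variable name and 8192 is only an example value.

```python
# Hedged sketch: launch a server with a global default context length applied
# to every loaded model. The variable name is an assumption from this summary.
import os
import subprocess

env = dict(os.environ, OLLAMA_CONTEXT_LENGTH="8192")
subprocess.run(["ollama", "serve"], env=env)
```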

v0.5.12 (5 fixes, 3 features)
Feb 20, 2025

This release introduces the Perplexity R1 1776 model and improves the OpenAI-compatible API with tool calling support. It also includes several Linux-specific bug fixes and performance restorations for Intel Xeon processors.

v0.5.11 (2 fixes)
Feb 14, 2025

Ollama v0.5.11 is a patch release focusing on bug fixes for Windows path errors and Intel Mac CPU acceleration.

v0.5.10 (1 fix)
Feb 14, 2025

This release focuses on a bug fix for multi-GPU memory estimation on Windows and Linux systems.

v0.5.9 (1 fix, 2 features)
Feb 12, 2025

This release introduces support for DeepScaleR and OpenThinker reasoning models and resolves a critical llama runner termination bug on Windows.

v0.5.8 (Breaking, 2 fixes, 4 features)
Feb 5, 2025

This release introduces significant CPU and GPU acceleration optimizations, including AVX-512 support and improved compatibility for non-AVX systems. It also updates the macOS distribution format and fixes critical model download bugs.

v0.5.7 (1 fix, 1 feature)
Jan 16, 2025

This release adds support for importing Command R/R+ architectures from safetensors and fixes a bug involving multiple FROM commands in Modelfiles.

v0.5.6 (2 fixes)
Jan 15, 2025

This patch release addresses issues with the 'ollama create' command, specifically fixing errors related to Windows environments and absolute path handling.