Ollama
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Release History
v0.14.0-rc2 / v0.13.5 (1 fix, 3 features): This release introduces support for Google's FunctionGemma model and migrates BERT-architecture models to the Ollama engine. It also improves tool parsing for DeepSeek-V3.1 and fixes bugs related to nested tool properties.
v0.13.4 (2 fixes, 3 features): This release introduces support for the Nemotron 3 Nano and Olmo 3 models, enables Flash Attention by default, and provides critical fixes for Gemma 3 model architectures.
v0.13.3 (1 fix, 5 features): This release introduces support for the Devstral-Small-2, rnj-1, and nomic-embed-text-v2 models, while improving embedding truncation logic and fixing image-input issues for qwen2.5vl.
v0.13.2 (2 fixes, 2 features): This release introduces support for the Qwen3-Next model series and enables Flash Attention by default for vision models. It also includes critical fixes for multi-GPU CUDA detection and DeepSeek-V3.1 thinking behavior.
v0.13.1 (4 fixes, 6 features): This release introduces support for the Ministral-3 and Mistral-Large-3 models, adds tool calling for cogito-v2.1, and includes several fixes for CUDA detection and error reporting.
v0.13.0 (2 fixes, 7 features): This release introduces support for DeepSeek-OCR, Cogito-V2.1, and the DeepSeek-V3.1 architecture, alongside a new performance benchmarking tool and significant engine optimizations for KV caching and GPU detection.
v0.12.11 (3 fixes, 6 features): Ollama 0.12.11 introduces logprobs support in API responses and adds opt-in Vulkan acceleration for expanded GPU compatibility.
v0.12.10 (3 fixes, 5 features): This release enables embedding model support via the CLI, adds tool call IDs to the chat API, and improves Vulkan performance and hardware detection.
v0.12.9 (1 fix): This release addresses a performance regression specifically impacting users running Ollama on CPU-only hardware.
v0.12.8 (4 fixes, 3 features): This release focuses on performance optimizations for qwen3-vl, including default Flash Attention support, and fixes several issues related to model thinking modes and image processing.
v0.12.7 (7 fixes, 8 features): This release introduces support for the Qwen3-VL and MiniMax-M2 models, adds file attachments and thinking-level adjustment to the app, and provides updated API documentation alongside several embedding and backend bug fixes.
v0.12.6 (5 fixes, 3 features): This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility, alongside several model-specific bug fixes.
v0.12.5 (Breaking; 2 fixes, 2 features): This release introduces structured output support for thinking models and improves app startup behavior, while removing support for older macOS versions and specific AMD GPU architectures.
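Structured outputs are requested by passing a JSON schema in the chat API's "format" field, and this release allows that to be combined with thinking. A minimal sketch of such a request body; the model name "qwen3" and the schema are illustrative assumptions, not output from the release itself:

```python
import json

# Sketch of an /api/chat request body combining structured output
# ("format" carries a JSON schema) with thinking ("think": True).
# Model name "qwen3" and the schema fields are illustrative assumptions.
schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer"],
}
payload = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "think": True,      # reasoning is returned separately from the answer
    "format": schema,   # constrains the final answer, not the thinking
    "stream": False,
}
body = json.dumps(payload)
```

With this shape, the server would be expected to return the model's reasoning in a separate field while the final message content conforms to the schema.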
v0.12.4 (Breaking; 5 fixes, 3 features): This release enables Flash Attention by default for Qwen 3 models and improves VRAM detection, while dropping support for older macOS versions and specific AMD GPU architectures.
v0.12.3 (3 fixes, 3 features): This release adds support for the DeepSeek-V3.1-Terminus and Kimi-K2-Instruct-0905 models while fixing critical bugs related to tool call parsing, Unicode rendering, and model-loading crashes.
v0.12.2 (1 fix, 4 features): This release introduces a new Web Search API for real-time information retrieval and expands the new engine's capabilities to support Qwen3 architectures and multi-regex pretokenizers.
v0.12.1 (4 fixes, 2 features): This release adds support for Qwen3 Embedding and tool calling for Qwen3-Coder, alongside several bug fixes for Gemma 3 models, Linux sign-in, and function-call parsing.
v0.12.0 (3 fixes, 3 features): This release introduces cloud models in preview, expanding hardware support for larger models, and adds native support for the BERT and Qwen 3 architectures in Ollama's engine.
v0.11.11 (Breaking; 5 fixes, 6 features): This release adds CUDA 13 support, introduces a dimensions field for embeddings, and improves memory estimation and the app UI. It also removes support for loading split vision models in the Ollama engine.
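The dimensions field applies to embedding requests, letting models trained with truncatable (Matryoshka-style) representations return shorter vectors. A minimal sketch of an embeddings request body; the model name "embeddinggemma" and the size 256 are illustrative assumptions:

```python
import json

# Sketch of an /api/embed request body using the "dimensions" field.
# Model name "embeddinggemma" and the size 256 are assumptions; any
# embedding model that supports truncation could be substituted.
payload = {
    "model": "embeddinggemma",
    "input": ["Why is the sky blue?", "What causes tides?"],
    "dimensions": 256,  # ask for 256-dimensional vectors
}
body = json.dumps(payload)
```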
v0.11.10 (1 feature): This release introduces support for the EmbeddingGemma model, providing a high-performance open embedding model for Ollama users.
v0.11.9 (2 fixes, 1 feature): This release focuses on performance optimizations through CPU/GPU overlapping and stability fixes for AMD GPUs and Unix-based installations.
v0.11.8 (2 features): This release enables Flash Attention by default for gpt-oss models and improves their overall loading performance.
v0.11.7 (4 fixes, 4 features): This release introduces the DeepSeek-V3.1 model and a preview of Turbo mode for running large models. Several bugs related to model loading, thinking tags, and tool call parsing have also been resolved.
v0.11.6 (2 fixes, 3 features): This release focuses on UI improvements for the Ollama app, including faster chat switching and better layouts, alongside performance optimizations for Flash Attention and BPE encoding.
v0.11.5 (2 fixes, 6 features): This release introduces significant memory-management improvements for GPU scheduling and multi-GPU setups, alongside performance optimizations for gpt-oss models and reduced installation sizes.
v0.11.4 (1 fix, 2 features): This release improves OpenAI API compatibility by supporting simultaneous content and tool calls, ensuring tool-name propagation, and consistently providing reasoning in responses.
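Supporting simultaneous content and tool calls means an OpenAI-compatible assistant message can carry both text and tool invocations at once, so clients should read both fields rather than treating them as mutually exclusive. A sketch of such handling; the response body here is hand-written for illustration, not real server output:

```python
import json

# Sketch: parsing an OpenAI-compatible chat response whose assistant
# message carries BOTH text content and tool calls. The response dict
# is a hand-written illustration, not actual Ollama output.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Let me check the weather for you.",
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }],
        }
    }]
}

msg = response["choices"][0]["message"]
text = msg.get("content") or ""        # may be non-empty even with tool calls
calls = msg.get("tool_calls", [])      # may be non-empty even with content
args = json.loads(calls[0]["function"]["arguments"])
```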
v0.11.3 (1 fix, 1 feature): This release fixes a VRAM leak in gpt-oss during multi-device execution and improves Windows stability by statically linking C++ libraries.
v0.11.2 (2 fixes): This patch release focuses on stability improvements for gpt-oss, specifically fixing crashes related to KV cache quantization and a missing variable definition.
v0.11.0 (1 fix, 8 features): Ollama v0.11.0 introduces support for OpenAI's gpt-oss models (20B and 120B), featuring native MXFP4 quantization, agentic capabilities, and configurable reasoning effort.
v0.10.1 (2 fixes): This patch release focuses on bug fixes for international character input and log-output accuracy.
v0.10.0 (Breaking; 3 fixes, 5 features): Ollama v0.10.0 introduces a new desktop app, significant performance optimizations for gemma3n and multi-GPU setups, and critical fixes for tool calling and API image support.
v0.9.6 (1 fix, 1 feature): This release introduces the ability to specify tool names in chat messages and includes a UI fix for the launch screen.
v0.9.5 (Breaking; 2 fixes, 4 features): Ollama 0.9.5 introduces a native macOS app with faster startup, network exposure capabilities, and customizable model storage directories. It also raises the minimum macOS requirement to version 12.
v0.9.4 (Breaking; 3 fixes, 3 features): This release introduces network exposure and custom model directories, while significantly optimizing the macOS application as a native app with a smaller footprint. It also includes fixes for tool calling and Gemma 3n model quantization.
v0.9.3 (1 fix, 2 features): This release adds support for the multilingual Gemma 3n model family and introduces automatic context-length limiting to improve model stability.
v0.9.2 (3 fixes): This patch release focuses on bug fixes for tool calling, generation errors, and tokenization issues across specific model architectures.
v0.9.1 (3 fixes, 7 features): This release introduces tool calling for DeepSeek-R1 and Magistral, alongside a major preview of native macOS and Windows applications featuring network exposure and custom model directories.
v0.9.0 (5 features): Ollama v0.9.0 introduces 'thinking' support, allowing models like DeepSeek-R1 and Qwen 3 to separate reasoning from output via a new API field and CLI toggles.
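The thinking support is driven by a think field on chat requests, with the model's reasoning returned separately from the final answer (the CLI exposes the same toggle per session). A minimal sketch of the request body; the model name "deepseek-r1" assumes that model has been pulled locally:

```python
import json

# Sketch of an /api/chat request body with thinking enabled.
# Model name "deepseek-r1" is an assumption; any thinking-capable
# model available locally could be used instead.
payload = {
    "model": "deepseek-r1",
    "messages": [
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
    "think": True,   # ask the model to emit reasoning separately
    "stream": False,
}
body = json.dumps(payload)
```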
v0.8.0 (2 features): Ollama v0.8.0 introduces streaming support for tool calls and improves engine logging with more detailed memory-estimation data.
v0.7.1 (2 fixes, 4 features): This release introduces support for the Qwen 3 and Qwen 2 architectures while providing critical stability fixes for multimodal models and memory management.
v0.7.0 (Breaking; 6 fixes, 5 features): Ollama v0.7.0 introduces a new multimodal engine supporting vision models like Llama 4 and Gemma 3, along with WebP support and various performance improvements and bug fixes.
v0.6.8 (3 fixes, 3 features): This release focuses on performance optimizations for Qwen 3 MoE models and critical stability fixes, including memory-leak resolutions and improved OOM handling.
v0.6.7 (4 fixes, 4 features): Ollama v0.6.7 introduces support for the Llama 4, Qwen 3, and Phi 4 reasoning models while increasing the default context window and fixing several inference and path-handling bugs.
v0.6.6 (4 fixes, 7 features): This release adds support for the IBM Granite 3.3 and DeepCoder models, introduces an experimental high-performance downloader, and fixes critical memory leaks for Gemma 3 and Mistral Small 3.1.
v0.6.5 (2 features): This release introduces support for the Mistral Small 3.1 vision model and optimizes loading performance for Gemma 3 on network-backed filesystems.
v0.6.4 (4 fixes, 2 features): Ollama v0.6.4 focuses on stability improvements for Gemma 3 and DeepSeek models, adds vision-capability metadata to the API, and introduces AMD RDNA4 support for Linux users.
v0.6.3 (2 fixes, 5 features): This release introduces performance optimizations and improved loading for Gemma 3, alongside critical bug fixes for model execution errors and enhancements to the CLI tools.
v0.6.2 (2 fixes, 3 features): This release introduces multi-image support and memory optimizations for Gemma 3, adds support for AMD Strix Halo GPUs, and fixes issues with model quantization and saving.
v0.6.1 (2 fixes, 4 features): This release introduces support for the Command A 111B model, improves memory management for Gemma 3, and adds new CLI features including verbose model information and navigation hotkeys.
v0.6.0 (1 fix, 1 feature): This release introduces support for Google's Gemma 3 model family across various sizes and resolves execution errors for Snowflake Arctic embedding models.
v0.5.13 (2 fixes, 6 features): This release adds support for the Phi-4-Mini, Granite-3.2-Vision, and Command R7B Arabic models, introduces a global context-length environment variable, and adds NVIDIA Blackwell compatibility.
v0.5.12 (5 fixes, 3 features): This release introduces the Perplexity R1 1776 model and improves the OpenAI-compatible API with tool calling support. It also includes several Linux-specific bug fixes and performance restorations for Intel Xeon processors.
v0.5.11 (2 fixes): Ollama v0.5.11 is a patch release focusing on bug fixes for Windows path errors and Intel Mac CPU acceleration.
v0.5.10 (1 fix): This release fixes a bug in multi-GPU memory estimation on Windows and Linux systems.
v0.5.9 (1 fix, 2 features): This release introduces support for the DeepScaleR and OpenThinker reasoning models and resolves a critical llama runner termination bug on Windows.
v0.5.8 (Breaking; 2 fixes, 4 features): This release introduces significant CPU and GPU acceleration optimizations, including AVX-512 support and improved compatibility for non-AVX systems. It also updates the macOS distribution format and fixes critical model-download bugs.
v0.5.7 (1 fix, 1 feature): This release adds support for importing Command R/R+ architectures from safetensors and fixes a bug involving multiple FROM commands in Modelfiles.
v0.5.6 (2 fixes): This patch release addresses issues with the 'ollama create' command, specifically fixing errors related to Windows environments and absolute-path handling.