
v0.12.7

📦 ollama (View on GitHub)

✨ 8 features · 🐛 7 fixes · 🔧 8 affected symbols

Summary

This release adds support for the Qwen3-VL and MiniMax-M2 models, introduces file attachments and adjustable thinking levels in the Ollama app, and extends the OpenAI-compatible embeddings endpoint, alongside several embedding and backend bug fixes.

✨ New Features

  • Support for Qwen3-VL models (2B to 235B parameters).
  • Support for MiniMax-M2 230B parameter model.
  • Added the ability to attach one or more files when prompting in the Ollama app.
  • Added adjustable thinking levels for gpt-oss models in the app.
  • The OpenAI-compatible /v1/embeddings endpoint now supports the encoding_format parameter.
  • Improved tool call parsing to handle non-conforming JSON structures.
  • Model load failures on Windows now include more detailed error information.
  • Increased model scheduling speed.
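For the encoding_format feature above, a minimal sketch of decoding a base64 embedding on the client side, assuming the endpoint follows the OpenAI convention where "base64" means the raw little-endian IEEE-754 float32 bytes of the vector (the round-trip below uses a made-up vector, so no server is needed):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the OpenAI convention: the payload is the vector's raw
    little-endian float32 bytes, base64-encoded.
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with an illustrative vector:
vec = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # → [0.25, -1.5, 3.0]
```

Requesting base64 instead of JSON float arrays shrinks the response payload and avoids float-to-decimal conversion on both ends.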

🐛 Bug Fixes

  • Fixed incorrect embedding results when running embeddinggemma.
  • Fixed gemma3n support on the Vulkan backend.
  • Fixed truncation errors during embedding generation.
  • Fixed request status codes when running cloud models.
  • Fixed prompt processing reporting in the llama runner.
  • Fixed an issue where FROM <model> would not inherit the base model's RENDERER or PARSER commands.
  • Increased timeout for ROCm device discovery.
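The FROM inheritance fix above affects derived Modelfiles. A minimal sketch (the model name and system prompt are illustrative):

```
# Derive a custom model from a base model. With this fix, the base
# model's RENDERER and PARSER commands are inherited automatically
# instead of being dropped.
FROM qwen3-vl
SYSTEM "You are a concise assistant."
```

Previously, a derived Modelfile like this could lose the base model's renderer and parser configuration unless both commands were restated explicitly.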

🔧 Affected Symbols

/v1/embeddings, embeddinggemma, gemma3n, RENDERER, PARSER, FROM, ROCm, Vulkan