v0.12.7
📦 ollama
✨ 8 features · 🐛 7 fixes · 🔧 8 symbols
Summary
This release adds support for the Qwen3-VL and MiniMax-M2 models, introduces file attachments and adjustable thinking levels in the Ollama app, and ships updated API documentation alongside several embedding and backend bug fixes.
✨ New Features
- Support for Qwen3-VL models (2B to 235B parameters).
- Support for the MiniMax-M2 model (230B parameters).
- Added the ability to attach one or more files when prompting in the Ollama app.
- Added adjustable thinking levels for gpt-oss models in the app.
- The OpenAI-compatible /v1/embeddings endpoint now supports the encoding_format parameter (see the sketch after this list).
- Improved tool call parsing to handle non-conforming JSON structures.
- Model load failures on Windows now include more detailed error information.
- Increased model scheduling speed.
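A minimal sketch of the new embeddings parameter, assuming a local server on the default port 11434 and an embedding model already pulled (embeddinggemma here is an illustrative choice); the "float" and "base64" values mirror OpenAI's encoding_format parameter and are an assumption, as these notes don't enumerate them:

```python
# Sketch: POST to Ollama's OpenAI-compatible embeddings endpoint.
# Assumes a local server on the default port; model choice is illustrative.
import requests

resp = requests.post(
    "http://localhost:11434/v1/embeddings",
    json={
        "model": "embeddinggemma",
        "input": "The quick brown fox jumps over the lazy dog",
        # Mirrors OpenAI's parameter: "float" returns plain JSON arrays,
        # "base64" returns a base64-encoded packed float array.
        "encoding_format": "float",
    },
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding), embedding[:4])
```

With encoding_format set to "base64", the embedding field instead arrives as a single base64 string the client must decode, which shrinks the JSON payload for large batches.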
🐛 Bug Fixes
- Fixed incorrect embedding results when running embeddinggemma.
- Fixed gemma3n support on the Vulkan backend.
- Fixed truncation errors during embedding generation.
- Fixed request status codes when running cloud models.
- Fixed prompt processing reporting in the llama runner.
- Fixed an issue where FROM <model> in a Modelfile would not inherit the parent model's RENDERER or PARSER commands (see the sketch after this list).
- Increased timeout for ROCm device discovery.
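To make the FROM inheritance fix concrete, a hedged Modelfile sketch; the child name my-coder is hypothetical, and the parent can be any model that ships RENDERER and PARSER configuration. Previously, a model built from a file like this could lose the parent's RENDERER and PARSER settings; with this fix they are carried through:

```
# Child Modelfile; the parent model below is illustrative.
FROM qwen3-coder
SYSTEM """You are a concise code-review assistant."""
```

Building it with `ollama create my-coder -f Modelfile` now yields a model that renders prompts and parses tool calls the same way its parent does.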
🔧 Affected Symbols
/v1/embeddings, embeddinggemma, gemma3n, RENDERER, PARSER, FROM, ROCm, Vulkan