
v0.12.7

📦 ollama (View on GitHub)

✨ 8 features · 🐛 7 fixes · 🔧 8 affected symbols

Summary

This release adds support for the Qwen3-VL and MiniMax-M2 models, introduces file attachments and adjustable thinking levels in the Ollama app, and extends the OpenAI-compatible embeddings endpoint, alongside several embedding and backend bug fixes.

✨ New Features

  • Support for Qwen3-VL models (2B to 235B parameters).
  • Support for MiniMax-M2 230B parameter model.
  • Added the ability to attach one or more files when prompting in the Ollama app.
  • Added adjustable thinking levels for gpt-oss models in the app.
  • The OpenAI-compatible /v1/embeddings endpoint now supports the encoding_format parameter.
  • Improved tool call parsing to handle non-conforming JSON structures.
  • Model load failures on Windows now include more detailed error information.
  • Increased model scheduling speed.
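For the encoding_format feature above, a minimal sketch of decoding a base64 embedding on the client side, assuming the endpoint follows the OpenAI convention where "base64" means the raw little-endian IEEE-754 float32 bytes of the vector (the round-trip below uses a made-up vector, so no server is needed):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the OpenAI convention: the payload is the vector's raw
    little-endian float32 bytes, base64-encoded.
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with an illustrative vector:
vec = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # → [0.25, -1.5, 3.0]
```

Requesting base64 instead of JSON float arrays shrinks the response payload and avoids float-to-decimal conversion on both ends.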

🐛 Bug Fixes

  • Fixed incorrect embedding results when running embeddinggemma.
  • Fixed gemma3n support on the Vulkan backend.
  • Fixed truncation errors during embedding generation.
  • Fixed request status codes when running cloud models.
  • Fixed prompt processing reporting in the llama runner.
  • Fixed an issue where FROM <model> would not inherit the base model's RENDERER or PARSER commands.
  • Increased timeout for ROCm device discovery.
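The FROM inheritance fix above affects derived Modelfiles. A minimal sketch (the model name and system prompt are illustrative):

```
# Derive a custom model from a base model. With this fix, the base
# model's RENDERER and PARSER commands are inherited automatically
# instead of being dropped.
FROM qwen3-vl
SYSTEM "You are a concise assistant."
```

Previously, a derived Modelfile like this could lose the base model's renderer and parser configuration unless both commands were restated explicitly.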

🔧 Affected Symbols

/v1/embeddings, embeddinggemma, gemma3n, RENDERER, PARSER, FROM, ROCm, Vulkan