Changelog

v0.12.8

📦 ollama
✨ 3 features · 🐛 4 fixes · 🔧 4 symbols

Summary

This release focuses on performance optimizations for qwen3-vl, including enabling Flash Attention by default, and fixes several issues with model thinking modes and image processing.

✨ New Features

  • Enabled Flash Attention by default for qwen3-vl for improved performance.
  • Ollama now automatically stops a running model before removing it via the `ollama rm` command (see the sketch after this list).
  • Added logic to ignore unsupported iGPUs during device discovery on Windows.
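
The sketch below walks through the workflow this change simplifies, using the official Python client (`pip install ollama`). Whether the client's `delete` gets the same auto-stop as the CLI is an assumption here, and `qwen3-vl` stands in for any locally pulled model:

```python
import ollama

# Load the model into memory by sending a trivial request
# (this starts a model runner on the server).
ollama.generate(model="qwen3-vl", prompt="hello")

# The running model now appears in the process list.
print(ollama.ps())

# Previously, removing a running model required stopping it first;
# as of this release, `ollama rm` stops the running instance itself.
ollama.delete("qwen3-vl")
print(ollama.ps())  # qwen3-vl should no longer be listed
```

Note that Flash Attention can still be toggled globally for the server via the `OLLAMA_FLASH_ATTENTION` environment variable; this release only changes the default for qwen3-vl.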

🐛 Bug Fixes

  • Reduced leading whitespace in qwen3-vl responses when using thinking mode.
  • Fixed an issue where deepseek-v3.1 thinking could not be disabled in the new Ollama app (see the sketch after this list).
  • Fixed an image interpretation failure in qwen3-vl when processing images with transparent backgrounds.
  • Resolved a performance regression where prompt processing was slower on the Ollama engine.
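
A rough sketch exercising both model fixes through the official Python client. The `think` parameter assumes a client and server version with thinking support, and the image path is illustrative:

```python
import ollama

# deepseek-v3.1: thinking can now be turned off; think=False requests
# a direct answer with no reasoning trace.
resp = ollama.chat(
    model="deepseek-v3.1",
    messages=[{"role": "user", "content": "Summarize Flash Attention in one sentence."}],
    think=False,
)
print(resp["message"]["content"])

# qwen3-vl: an RGBA PNG with a transparent background is now
# interpreted correctly when attached to a message.
resp = ollama.chat(
    model="qwen3-vl",
    messages=[{
        "role": "user",
        "content": "Describe this image.",
        "images": ["./logo.png"],  # hypothetical path
    }],
)
print(resp["message"]["content"])
```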

🔧 Affected Symbols

`qwen3-vl` · `deepseek-v3.1` · `ollama rm` · Flash Attention