v0.12.8
📦 ollama
✨ 3 features · 🐛 4 fixes · 🔧 4 symbols
Summary
This release focuses on performance improvements for qwen3-vl, including Flash Attention enabled by default, and fixes several issues related to model thinking modes and image processing.
✨ New Features
- Enabled Flash Attention by default for qwen3-vl for improved performance.
- Ollama now automatically stops a running model before removing it via the 'ollama rm' command (see the sketch after this list).
- Added logic to ignore unsupported iGPUs during device discovery on Windows.
🐛 Bug Fixes
- Reduced leading whitespace in qwen3-vl responses when using thinking mode.
- Fixed an issue where deepseek-v3.1 thinking could not be disabled in the new Ollama app (see the API sketch after this list).
- Fixed an issue where qwen3-vl failed to interpret images with transparent backgrounds (an example request follows this list).
- Resolved a performance regression where prompt processing was slower on the Ollama engine.
🔧 Affected Symbols
qwen3-vl · deepseek-v3.1 · ollama rm · Flash Attention