v0.12.8
📦 ollama
✨ 3 features · 🐛 4 fixes · 🔧 4 symbols
Summary
This release focuses on performance improvements for qwen3-vl, including Flash Attention enabled by default, and fixes several issues related to model thinking modes and image processing.
✨ New Features
- Enabled Flash Attention by default for qwen3-vl for improved performance.
- Ollama now automatically stops a running model before removing it via the 'ollama rm' command (see the sketch after this list).
- Added logic to ignore unsupported iGPUs during device discovery on Windows.
🐛 Bug Fixes
- Reduced leading whitespace in qwen3-vl responses when using thinking mode.
- Fixed an issue where deepseek-v3.1 thinking could not be disabled in the new Ollama app (see the API sketch after this list).
- Fixed an issue where qwen3-vl failed to interpret images with transparent backgrounds (an example request follows this list).
- Resolved a performance regression where prompt processing was slower on the Ollama engine.
🔧 Affected Symbols
qwen3-vl · deepseek-v3.1 · ollama rm · Flash Attention