v0.13.2
📦 ollama
✨ 2 features · 🐛 2 fixes · 🔧 5 symbols
Summary
This release introduces support for the Qwen3-Next model series and enables Flash Attention by default for vision models. It also includes critical fixes for multi-GPU CUDA detection and DeepSeek-v3.1 thinking behavior.
✨ New Features
- Added support for Qwen3-Next model series.
- Flash attention is now enabled by default for vision models (mistral-3, gemma3, qwen3-vl, etc.) to improve memory utilization and performance; see the usage sketch after this list.
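Flash attention is a server-side default, so existing clients benefit without any request changes. As a minimal sketch (assuming a local Ollama server on the default port 11434, the standard `/api/chat` endpoint, and a pulled `qwen3-vl` model; the image path is a placeholder), a vision request to one of the affected models might look like this:

```python
import base64
import requests

# Read and base64-encode an image for the vision model.
# "photo.png" is a placeholder path.
with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

# Ollama's chat endpoint. Flash attention now applies by default
# on the server side for vision models like qwen3-vl, so no extra
# request parameter is needed here.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3-vl",
        "messages": [
            {
                "role": "user",
                "content": "Describe this image.",
                "images": [image_b64],
            }
        ],
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Ollama has historically exposed an `OLLAMA_FLASH_ATTENTION` environment variable on the server; whether it can override the new vision-model default is not stated in these notes.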
🐛 Bug Fixes
- Fixed GPU detection on multi-GPU CUDA machines.
- Fixed an issue where deepseek-v3.1 would continue emitting thinking output even when thinking was disabled in the app; see the API sketch after this list.
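For the deepseek-v3.1 fix, the relevant control in the API is the `think` field on `/api/chat`, available in recent Ollama versions. A minimal sketch, assuming a local server with deepseek-v3.1 pulled; with this release, setting the field to false should actually suppress thinking output:

```python
import requests

# With this release's fix, "think": False should stop deepseek-v3.1
# from emitting thinking content.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-v3.1",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "think": False,  # disable thinking for this request
        "stream": False,
    },
)
resp.raise_for_status()
msg = resp.json()["message"]
print(msg["content"])
# Before the fix, thinking content could still appear even when
# disabled; after the fix it should be absent or empty.
print(msg.get("thinking"))
```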
🔧 Affected Symbols
- mistral-3
- gemma3
- qwen3-vl
- deepseek-v3.1
- CUDA GPU detection