v0.12.6
📦 ollama
✨ 3 features · 🐛 5 fixes · 🔧 9 symbols
Summary
This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility alongside several model-specific bug fixes.
Migration Steps
- To use experimental Vulkan support, install the Vulkan SDK, set the VULKAN_SDK environment variable, and build from source.
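The migration step above can be sketched as a shell session. The SDK path is a placeholder and the build commands are assumptions, not the authoritative procedure — consult the repository's build documentation for the exact, current steps:

```shell
# Hedged sketch: the path below is a placeholder, not a canonical location.
export VULKAN_SDK=/path/to/vulkan-sdk   # point at your Vulkan SDK install

# Then build Ollama from source with the SDK visible to the build
# (commands assumed; see the repo's development docs for specifics):
# git clone https://github.com/ollama/ollama.git && cd ollama
# go generate ./... && go build .

# Confirm the variable is set before building.
echo "VULKAN_SDK=$VULKAN_SDK"
```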
✨ New Features
- Search support for DeepSeek-V3.1, Qwen3, and other tool-calling models.
- Flash Attention is now enabled by default for Gemma 3, improving performance and memory utilization.
- Experimental Vulkan support for AMD and Intel GPUs (requires building from source).
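Tool-calling models receive search requests through the same tools mechanism used for any function call. A minimal sketch of a chat request payload, assuming a hypothetical web_search tool — the tool's name and parameter schema are illustrative, not part of Ollama's API; only the overall tools structure follows Ollama's documented chat format:

```python
import json

# Build a chat request for a tool-calling model from this release.
payload = {
    "model": "qwen3",  # or another tool-calling model, e.g. DeepSeek-V3.1
    "messages": [
        {"role": "user", "content": "What changed in Ollama v0.12.6?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",  # hypothetical tool for illustration
                "description": "Search the web and return result snippets.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Search terms",
                        }
                    },
                    "required": ["query"],
                },
            },
        }
    ],
}

# POST this to a running server (e.g. http://localhost:11434/api/chat);
# the model may respond with a tool_calls entry naming web_search.
print(json.dumps(payload, indent=2))
```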
🐛 Bug Fixes
- Fixed issue where Ollama would hang during response generation.
- Fixed issue where qwen3-coder would act in raw mode when used via /api/generate or ollama run.
- Fixed invalid results returned by qwen3-embedding.
- Fixed model eviction logic when num_gpu is set.
- Fixed issue where tool_index with a value of 0 was not sent to the model.
🔧 Affected Symbols
Gemma 3 · qwen3-coder · qwen3-embedding · /api/generate · ollama run · num_gpu · tool_index · DeepSeek-V3.1 · Qwen3