Change8

v0.12.6

📦 ollama
✨ 3 features · 🐛 5 fixes · 🔧 9 symbols

Summary

This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility alongside several model-specific bug fixes.

Migration Steps

  1. To use experimental Vulkan support, install the Vulkan SDK, set the VULKAN_SDK environment variable, and build from source.
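The step above can be sketched as a shell session. This is illustrative only: the SDK path is hypothetical, and the exact build commands may differ by platform and release, so consult the repository's build documentation.

```shell
# Illustrative sketch; exact steps vary by platform and release.
# 1. Install the Vulkan SDK (e.g. from LunarG) and point VULKAN_SDK at it.
export VULKAN_SDK=/path/to/vulkan-sdk    # hypothetical install path

# 2. Build Ollama from source with the SDK visible to the build.
git clone https://github.com/ollama/ollama.git
cd ollama
cmake -B build && cmake --build build    # assumed native-backend build step
go build .                               # assumed Go binary build
```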

✨ New Features

  • Search support for DeepSeek-V3.1, Qwen3, and other tool-calling models.
  • Flash attention enabled by default for Gemma 3 for better performance and memory utilization.
  • Experimental Vulkan support for AMD and Intel GPUs (requires building from source).
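To make the search feature concrete, the sketch below builds a request body for Ollama's /api/chat endpoint with a tool definition, which is how tool-calling models such as DeepSeek-V3.1 and Qwen3 are given tools to invoke. The tool name `web_search` and its parameters are assumptions for illustration, not an Ollama-provided tool.

```python
import json

# Hypothetical tool definition showing the JSON shape passed in the
# "tools" field of /api/chat. The name and schema here are assumptions.
search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool name
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search terms"},
            },
            "required": ["query"],
        },
    },
}

# Request body for POST /api/chat (shown for shape only, not sent here).
payload = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Find recent Ollama releases"}],
    "tools": [search_tool],
}
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries a tool call the client executes, feeding the result back as a follow-up message.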

🐛 Bug Fixes

  • Fixed issue where Ollama would hang during response generation.
  • Fixed qwen3-coder incorrectly running in raw mode when invoked via /api/generate or ollama run.
  • Fixed qwen3-embedding returning invalid embedding results.
  • Fixed model eviction logic when num_gpu is set.
  • Fixed issue where tool_index with a value of 0 was not sent to the model.
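The tool_index fix is a classic falsy-zero pitfall: treating 0 as "absent" because it is falsy. A minimal sketch of the pattern (field names are illustrative, not Ollama's actual code):

```python
def include_tool_index(message: dict) -> dict:
    """Collect fields to forward to the model.

    Field names are illustrative; this is not Ollama's implementation.
    """
    payload = {}
    tool_index = message.get("tool_index")

    # Buggy pattern: `if tool_index:` silently drops a legitimate
    # index of 0, because 0 is falsy in Python.
    # Correct pattern: test for presence, not truthiness.
    if tool_index is not None:
        payload["tool_index"] = tool_index
    return payload

print(include_tool_index({"tool_index": 0}))  # {'tool_index': 0}
print(include_tool_index({}))                 # {}
```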

🔧 Affected Symbols

Gemma 3 · qwen3-coder · qwen3-embedding · /api/generate · ollama run · num_gpu · tool_index · DeepSeek-V3.1 · Qwen3