Change8

v0.12.6

📦 ollama
✨ 3 features · 🐛 5 fixes · 🔧 9 symbols

Summary

This release introduces search support for tool-calling models, enables Flash Attention for Gemma 3, and adds experimental Vulkan support for broader GPU compatibility alongside several model-specific bug fixes.

Migration Steps

  1. To use experimental Vulkan support, install the Vulkan SDK, set the VULKAN_SDK environment variable, and build from source.
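The step above can be sketched as a shell session. This is illustrative only: the SDK path is hypothetical, and the exact build commands may differ by platform and release, so consult the repository's build documentation.

```shell
# Illustrative sketch; exact steps vary by platform and release.
# 1. Install the Vulkan SDK (e.g. from LunarG) and point VULKAN_SDK at it.
export VULKAN_SDK=/path/to/vulkan-sdk    # hypothetical install path

# 2. Build Ollama from source with the SDK visible to the build.
git clone https://github.com/ollama/ollama.git
cd ollama
cmake -B build && cmake --build build    # assumed native-backend build step
go build .                               # assumed Go binary build
```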

✨ New Features

  • Search support for DeepSeek-V3.1, Qwen3, and other tool-calling models.
  • Flash attention enabled by default for Gemma 3 for better performance and memory utilization.
  • Experimental Vulkan support for AMD and Intel GPUs (requires building from source).
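To make the search feature concrete, the sketch below builds a request body for Ollama's /api/chat endpoint with a tool definition, which is how tool-calling models such as DeepSeek-V3.1 and Qwen3 are given tools to invoke. The tool name `web_search` and its parameters are assumptions for illustration, not an Ollama-provided tool.

```python
import json

# Hypothetical tool definition showing the JSON shape passed in the
# "tools" field of /api/chat. The name and schema here are assumptions.
search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool name
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search terms"},
            },
            "required": ["query"],
        },
    },
}

# Request body for POST /api/chat (shown for shape only, not sent here).
payload = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Find recent Ollama releases"}],
    "tools": [search_tool],
}
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries a tool call the client executes, feeding the result back as a follow-up message.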

🐛 Bug Fixes

  • Fixed issue where Ollama would hang during response generation.
  • Fixed qwen3-coder incorrectly running in raw mode when invoked via /api/generate or ollama run.
  • Fixed qwen3-embedding returning invalid embedding results.
  • Fixed model eviction logic when num_gpu is set.
  • Fixed issue where tool_index with a value of 0 was not sent to the model.
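The tool_index fix is a classic falsy-zero pitfall: treating 0 as "absent" because it is falsy. A minimal sketch of the pattern (field names are illustrative, not Ollama's actual code):

```python
def include_tool_index(message: dict) -> dict:
    """Collect fields to forward to the model.

    Field names are illustrative; this is not Ollama's implementation.
    """
    payload = {}
    tool_index = message.get("tool_index")

    # Buggy pattern: `if tool_index:` silently drops a legitimate
    # index of 0, because 0 is falsy in Python.
    # Correct pattern: test for presence, not truthiness.
    if tool_index is not None:
        payload["tool_index"] = tool_index
    return payload

print(include_tool_index({"tool_index": 0}))  # {'tool_index': 0}
print(include_tool_index({}))                 # {}
```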

🔧 Affected Symbols

Gemma 3 · qwen3-coder · qwen3-embedding · /api/generate · ollama run · num_gpu · tool_index · DeepSeek-V3.1 · Qwen3