v0.17.5

📅 Mar 2, 2026 · 📦 ollama

✨ 1 feature · 🐛 4 fixes · 🔧 3 symbols

Summary

This release fixes crashes and repeated output affecting Qwen 3.5 models, particularly when a model is split across GPU and CPU or run with the MLX engine, and adds peak memory reporting to `ollama run --verbose`.

Migration Steps

If Qwen 3.5 models repeat their output, re-download them, for example with `ollama pull qwen3.5:35b`.
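
The re-download step can be wrapped in a small guarded script. This is a minimal sketch: the `repull` function name and the `MODEL` variable are illustrative and not part of the ollama CLI, and the fallback branch only prints the command when `ollama` is not on the `PATH`.

```shell
# Model tag from the migration step; override by exporting MODEL.
MODEL="${MODEL:-qwen3.5:35b}"

# repull TAG  (hypothetical helper name, not an ollama subcommand)
repull() {
    if command -v ollama >/dev/null 2>&1; then
        # Re-download the tag so the updated model files are used.
        ollama pull "$1"
    else
        # Dry-run fallback when the CLI is not installed.
        echo "ollama not found; would run: ollama pull $1"
    fi
}
```

Calling `repull "$MODEL"` then performs the pull for the default tag, or for whichever tag was exported in `MODEL`.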

✨ New Features

Added peak memory usage display to `ollama run --verbose` when using the MLX engine.
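
A one-shot verbose run might look like the sketch below. The model tag and prompt are placeholders, the `verbose_run` name is hypothetical, and the exact wording of the peak memory line is not given in these notes, so the script does not try to parse it.

```shell
# Placeholder model tag and prompt; override by exporting MODEL or PROMPT.
MODEL="${MODEL:-qwen3.5:35b}"
PROMPT="${PROMPT:-Why is the sky blue?}"

# verbose_run TAG PROMPT  (hypothetical helper name)
verbose_run() {
    if command -v ollama >/dev/null 2>&1; then
        # --verbose prints run statistics after the response.
        ollama run --verbose "$1" "$2"
    else
        # Dry-run fallback when the CLI is not installed.
        echo "ollama not found; would run: ollama run --verbose $1 \"$2\""
    fi
}
```

Per the note above, the peak memory figure should only be expected in the statistics when the model is served by the MLX engine.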

🐛 Bug Fixes

Fixed crash when Qwen 3.5 models were split across GPU and CPU.
Resolved issue where Qwen 3.5 models repeated output due to a missing presence penalty.
Fixed memory issues and crashes within the MLX runner.
Resolved inability to run models imported from Qwen3.5 GGUF files.
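
To check the GGUF import fix locally, a model can be imported through a Modelfile whose `FROM` line points at the GGUF file and then built with `ollama create`. The sketch below assumes a hypothetical file path and model name; both helper function names are illustrative.

```shell
# Hypothetical local GGUF path and model name; replace with your own.
GGUF_PATH="${GGUF_PATH:-./qwen3.5.gguf}"
MODEL_NAME="${MODEL_NAME:-my-qwen3.5}"

# write_modelfile GGUF_FILE OUTPUT_FILE
# Writes a one-line Modelfile that points at the GGUF file.
write_modelfile() {
    printf 'FROM %s\n' "$1" > "$2"
}

# import_gguf NAME MODELFILE  (hypothetical helper name)
import_gguf() {
    if command -v ollama >/dev/null 2>&1; then
        # Build a local model from the Modelfile.
        ollama create "$1" -f "$2"
    else
        # Dry-run fallback when the CLI is not installed.
        echo "ollama not found; would run: ollama create $1 -f $2"
    fi
}
```

A typical sequence would be `write_modelfile "$GGUF_PATH" Modelfile`, then `import_gguf "$MODEL_NAME" Modelfile`, then `ollama run "$MODEL_NAME"` to confirm the imported model starts.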

Affected Symbols

Qwen 3.5 models
MLX engine
Qwen3.5 GGUF files