
v0.17.5

📦 ollama
1 feature · 🐛 4 fixes · 🔧 3 symbols

Summary

This release focuses on stability and performance improvements for Qwen 3.5 models, particularly when running across multiple devices or using the MLX engine, and introduces peak memory reporting.

Migration Steps

  1. If a Qwen 3.5 model repeats its output, redownload it so the fix for the missing presence penalty takes effect, for example: `ollama pull qwen3.5:35b`.
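
If several Qwen 3.5 tags are installed locally, the redownload can be scripted. The following is a minimal sketch, not part of the release: it assumes the `ollama` CLI is on `PATH`, that `ollama list` prints a header row followed by one model per line with the name in the first column, and that the affected models carry the `qwen3.5` name prefix (as in `qwen3.5:35b`). The helper names are hypothetical. Models imported under a custom name would not match the prefix and need to be handled by hand.

```python
import subprocess

# Assumption: affected local tags share this prefix, as in `qwen3.5:35b`.
QWEN35_PREFIX = "qwen3.5"


def parse_model_names(list_output: str) -> list[str]:
    """Return the first column of `ollama list` output, skipping the header row."""
    lines = list_output.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def select_qwen35(names: list[str]) -> list[str]:
    """Keep only the model names that start with the assumed Qwen 3.5 prefix."""
    return [name for name in names if name.startswith(QWEN35_PREFIX)]


def pull_command(name: str) -> list[str]:
    """Build the argument list for `ollama pull <name>`."""
    return ["ollama", "pull", name]


def repull_qwen35() -> None:
    """Redownload every locally installed model matching the prefix."""
    listed = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    for name in select_qwen35(parse_model_names(listed)):
        subprocess.run(pull_command(name), check=True)
```

Calling `repull_qwen35()` runs one `ollama pull` per matching model and stops at the first failure because of `check=True`.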

✨ New Features

  • Added peak memory usage display to `ollama run --verbose` when using the MLX engine.

🐛 Bug Fixes

  • Fixed a crash when Qwen 3.5 models were split across GPU and CPU.
  • Fixed an issue where Qwen 3.5 models repeated their output because no presence penalty was applied.
  • Fixed memory issues and crashes in the MLX runner.
  • Fixed an issue where models imported from Qwen 3.5 GGUF files could not be run.

Affected Symbols