v0.18.2-rc0

📅 Mar 18, 2026 · 📦 ollama

✨ 3 features · 🐛 1 fix

Summary

This release adds model eviction, quantized embeddings, and a fast SwiGLU activation to the MLX backend, along with prequantized tensor packing for qwen35 support. It also fixes the cloud proxy so that the web_search legacy path flushes on newlines.

✨ New Features

Model eviction implemented for MLX backend.
Added prequantized tensor packing and related changes for qwen35 support in MLX.
Implemented quantized embeddings and fast SwiGLU activation function for MLX, along with runtime fixes.
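
For readers unfamiliar with the term, SwiGLU is the gated activation used in many transformer MLP blocks: the gate projection is passed through SiLU (x · sigmoid(x)) and multiplied elementwise by the up projection. The following is a minimal pure-Python sketch of that formula only. The function names are illustrative assumptions and this is not the fused MLX kernel this release adds.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so e is in (0, 1)
    return x * e / (1.0 + e)


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """Elementwise SwiGLU: silu(gate) * up.

    `gate` and `up` stand in for the outputs of the two input
    projections of an MLP block (hypothetical names).
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same length")
    return [silu(g) * u for g, u in zip(gate, up)]
```

A "fast" implementation typically fuses the SiLU and the multiply into a single kernel so the intermediate result is never written back to memory; the arithmetic stays the same as above.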

🐛 Bug Fixes

Cloud proxy now flushes on newlines for the web_search legacy path.
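
As a rough illustration of what flushing on newlines means for a streaming proxy, the sketch below buffers incoming chunks and emits data only at line boundaries, so a downstream client always receives complete newline-terminated records. It is an assumption-laden sketch: `flush_on_newlines` is a hypothetical name, and this is not the code used in ollama's cloud proxy.

```python
from typing import Iterable, Iterator


def flush_on_newlines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-frame an arbitrary chunk stream into newline-terminated pieces.

    Each yielded value represents one "flush" to the client.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        # Emit every complete line currently sitting in the buffer.
        while (i := buf.find(b"\n")) != -1:
            yield buf[: i + 1]
            buf = buf[i + 1 :]
    # Emit any trailing partial line once the upstream closes.
    if buf:
        yield buf
```

For example, upstream chunks that split a JSON line in the middle are delivered to the client as whole lines:

```python
parts = [b'{"a":', b'1}\n{"b"', b':2}\n']
for line in flush_on_newlines(parts):
    print(line)
```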