v0.18.2-rc1
📦 ollama
✨ 3 features · 🐛 1 fix · 🔧 4 symbols
Summary
This release adds performance and feature work for the MLX backend: model eviction, quantized embeddings, and fast SwiGLU. It also fixes the web_search legacy path in the cloud proxy.
✨ New Features
- Model eviction implemented for MLX.
- Added prequantized tensor packing and related changes for qwen35 support in MLX.
- Implemented quantized embeddings and fast SwiGLU, along with runtime fixes for MLX.
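For readers unfamiliar with the term, SwiGLU gating multiplies one projection of the input by the SiLU activation of another. The sketch below shows only that elementwise math in plain Python. It is an illustration, not the MLX implementation from this release; the function names `silu` and `swiglu` and the list-based inputs are assumptions made for the example.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for negative x: e is in (0, 1)
    return x * e / (1.0 + e)


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """Elementwise SwiGLU gating: silu(gate) * up.

    `gate` and `up` stand in for the two projections of the same input
    (hypothetical names; a real backend operates on tensors).
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same length")
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    print(swiglu([0.0, 1.0, -2.0], [5.0, 2.0, 3.0]))
```

A "fast" variant typically computes the activation and the multiply in a single pass rather than as separate operations; whether this release does so is not stated in these notes.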
🐛 Bug Fixes
- Cloud proxy now flushes on newlines for the web_search legacy path.
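As a rough illustration of what "flushing on newlines" means for a streaming proxy, the sketch below relays chunks to a destination and flushes whenever a chunk contains a newline, so each complete line reaches the client promptly instead of sitting in a buffer. This is a generic pattern in Python, not the proxy code from this release; `relay_with_newline_flush` and its parameters are hypothetical names.

```python
import io
from typing import BinaryIO, Iterable


def relay_with_newline_flush(chunks: Iterable[bytes], dst: BinaryIO) -> int:
    """Copy chunks to dst, flushing after any chunk that contains a newline.

    Returns the number of newline-triggered flushes (the final flush at the
    end of the stream is not counted).
    """
    flushes = 0
    for chunk in chunks:
        dst.write(chunk)
        if b"\n" in chunk:
            dst.flush()  # push the completed line(s) downstream now
            flushes += 1
    dst.flush()  # make sure any trailing partial line is delivered
    return flushes


if __name__ == "__main__":
    out = io.BytesIO()
    n = relay_with_newline_flush([b'{"a":', b"1}\n", b'{"b":2}\n', b"tail"], out)
    print(n, out.getvalue())
```

Line-delimited streaming formats benefit from this because a consumer can only parse a record once the terminating newline arrives.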