v0.18.4-rc0
📦 ollama
🐛 2 fixes · 🔧 3 symbols
Summary
This release focuses on stability: it fixes a KV cache snapshot memory leak in mlx, schedules periodic snapshots during prefill in mlxrunner, and forces flash attention off for Grok models on the ggml backend. It also updates the VS Code documentation and hides the VS Code launch option.
Migration Steps
- If you use Grok models, note that flash attention is now forced off on the ggml backend.
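
To illustrate what "forced off" means for a user-level setting, here is a minimal sketch of a gate that overrides the user's preference. Every name in it (`effective_flash_attention`, the `"ggml"` backend label, the `"grok"` family prefix) is hypothetical and not taken from the ollama source; it only models the behavior described in the note above, and it assumes the override is limited to Grok models on ggml.

```python
def effective_flash_attention(user_enabled: bool, backend: str, model_family: str) -> bool:
    """Return whether flash attention ends up enabled (hypothetical model of the rule)."""
    # Assumption: the override applies only to Grok models on the ggml backend.
    if backend == "ggml" and model_family.lower().startswith("grok"):
        return False  # forced off regardless of what the user asked for
    # All other combinations keep the user's preference.
    return user_enabled
```

Under these assumptions, a user who had enabled flash attention would see it stay on for other models, and stay off for Grok models on ggml without any configuration change.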
🐛 Bug Fixes
- Fixed a KV cache snapshot memory leak in mlx.
- Scheduled periodic snapshots during prefill in mlxrunner.
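
As a rough picture of what periodic snapshots during prefill could look like, the sketch below computes the token offsets at which a snapshot would be taken while a prompt is processed. The function name `snapshot_offsets` and the fixed `interval` parameter are assumptions made for illustration; the notes do not say how mlxrunner chooses its schedule.

```python
from typing import List


def snapshot_offsets(prompt_len: int, interval: int) -> List[int]:
    """Token offsets at which a snapshot would be taken during prefill (hypothetical)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # One snapshot after every `interval` tokens; none at offset 0 and none past the prompt end.
    return list(range(interval, prompt_len + 1, interval))
```

For example, `snapshot_offsets(10, 4)` returns `[4, 8]`, and a prompt shorter than the interval yields no snapshots at all.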