v0.18.4-rc0

📦 ollama
🐛 2 fixes · 🔧 3 symbols

Summary

This release focuses on stability: it fixes a KV cache snapshot memory leak in mlx and forces flash attention off for Grok models on the ggml backend. It also updates the VS Code documentation and hides the VS Code launch option.

Migration Steps

  1. If you use Grok models, note that flash attention is now forced off on the ggml backend, so any setting that requests it no longer takes effect for these models.
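
To make "forced off" in the migration step above concrete, here is a minimal sketch of how such an override behaves from a user's point of view. It is not ollama's actual code: `RunConfig`, `FORCED_OFF_ARCHS`, and `effective_flash_attention` are hypothetical names invented for illustration, and the assumption that other backends keep honoring the request is ours, not something the release notes state.

```python
from dataclasses import dataclass

# Hypothetical: architectures for which flash attention is forced off on ggml.
FORCED_OFF_ARCHS = {"grok"}


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run settings, used only for this illustration."""
    architecture: str                 # e.g. "grok", "llama"
    backend: str                      # e.g. "ggml", "mlx"
    flash_attention_requested: bool   # what the user asked for


def effective_flash_attention(cfg: RunConfig) -> bool:
    """Return the setting that would actually apply.

    A forced-off rule wins over the user's request; in every other
    case the request is passed through unchanged (an assumption of
    this sketch).
    """
    if cfg.backend == "ggml" and cfg.architecture.lower() in FORCED_OFF_ARCHS:
        return False
    return cfg.flash_attention_requested


if __name__ == "__main__":
    cfg = RunConfig(architecture="grok", backend="ggml",
                    flash_attention_requested=True)
    print(effective_flash_attention(cfg))
```

The practical consequence is that a Grok user who previously enabled flash attention should expect it to be ignored on ggml after upgrading, rather than expecting an error.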

🐛 Bug Fixes

  • Fixed a KV cache snapshot memory leak in mlx.
  • Scheduled periodic snapshots during prefill in mlxrunner.

Affected Symbols