v0.31.1

📅 Jun 30, 2026 · 📦 ollama

✨ 1 feature · 🔧 2 symbols

Summary

This release speeds up Gemma 4 token generation on Apple Silicon by using multi-token prediction (MTP). It also updates the underlying MLX and llama.cpp engines.

✨ New Features

Gemma 4 token generation on Apple Silicon is up to 90% faster with multi-token prediction (MTP). MTP is enabled by default and requires no configuration.
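To check the speedup on your own machine, one option is to compare generation throughput before and after upgrading. The sketch below is a minimal illustration, not part of the release. It assumes you have saved the JSON body of a non-streaming `/api/generate` response from each version, and it relies on the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that response carries. The helper names `tokens_per_second` and `speedup_percent` are hypothetical, the sample numbers are made up, and reading "up to 90%" as an increase in tokens per second is an assumption.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response body."""
    # eval_count is the number of generated tokens;
    # eval_duration is the generation time in nanoseconds.
    return response["eval_count"] * 1e9 / response["eval_duration"]


def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain of `after_tps` over `before_tps`, in percent."""
    return (after_tps - before_tps) * 100.0 / before_tps


# Made-up response bodies for illustration only; not measurements of this release.
before = {"eval_count": 200, "eval_duration": 4_000_000_000}  # 50 tokens/s
after = {"eval_count": 200, "eval_duration": 2_500_000_000}   # 80 tokens/s

gain = speedup_percent(tokens_per_second(before), tokens_per_second(after))
print(f"{gain:.0f}% faster")
```

Use the same prompt and model tag for both runs, and average over several runs, since throughput varies with prompt length and system load.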

Affected Symbols

MLX engine
llama.cpp engine