Change8

v0.31.1

📦 ollama
1 feature · 🔧 2 symbols

Summary

This release speeds up Gemma 4 token generation on Apple Silicon by up to 90% using multi-token prediction (MTP). It also updates the underlying MLX and llama.cpp engines.

✨ New Features

  • Gemma 4 token generation is up to 90% faster on Apple Silicon thanks to multi-token prediction (MTP). MTP is enabled by default and requires no configuration.
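
To check the speedup on your own machine, one option is to compare decode throughput before and after upgrading. The sketch below is illustrative only: it assumes the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that Ollama returns in its generate and chat responses, and it assumes the "up to 90%" figure refers to an increase in tokens per second. The helper names and the sample numbers are hypothetical, not measurements.

```python
def tokens_per_second(response: dict) -> float:
    """Decode throughput from an Ollama generate/chat response.

    eval_count is the number of generated tokens;
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def percent_faster(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    return (new_tps / baseline_tps - 1.0) * 100.0


# Made-up example responses (assumption: not real benchmark data).
before = {"eval_count": 200, "eval_duration": 4_000_000_000}  # 50 tokens/s
after = {"eval_count": 200, "eval_duration": 2_500_000_000}   # 80 tokens/s

gain = percent_faster(tokens_per_second(before), tokens_per_second(after))
print(round(gain, 1))  # prints 60.0
```

For a fair comparison, use the same prompt, model, and generation settings in both runs, and average over several requests, since throughput varies from run to run.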

Affected Symbols