v0.31.1
📦 ollama
Summary
This release speeds up Gemma 4 token generation on Apple Silicon by using multi-token prediction (MTP). It also updates the underlying MLX and llama.cpp engines.
✨ New Features
- Gemma 4 token generation is significantly faster on Apple Silicon (up to a 90% improvement) thanks to multi-token prediction (MTP). MTP is enabled by default and requires no configuration.
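
To check the speedup on your own machine, one option is to compare decode throughput before and after upgrading. The sketch below assumes the "up to 90%" figure refers to tokens generated per second, which the release notes do not state explicitly. It uses the `eval_count` (tokens generated) and `eval_duration` (nanoseconds) fields that Ollama returns in the final `/api/generate` response. The helper names and the sample numbers are hypothetical and are not measurements.

```python
def tokens_per_second(response: dict) -> float:
    """Decode throughput from the final Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


def improvement_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain of after_tps over before_tps, in percent."""
    return (after_tps - before_tps) * 100.0 / before_tps


# Made-up sample responses for illustration only (assumption, not real data).
before = {"eval_count": 200, "eval_duration": 10_000_000_000}
after = {"eval_count": 380, "eval_duration": 10_000_000_000}

before_tps = tokens_per_second(before)
after_tps = tokens_per_second(after)
gain = improvement_percent(before_tps, after_tps)
print(f"{before_tps:.1f} -> {after_tps:.1f} tokens/s ({gain:.0f}% improvement)")
```

Under this reading, a 90% improvement means roughly 1.9x the tokens per second, not a 90% reduction in generation time. Actual gains will depend on the prompt, the model variant, and the hardware.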