v0.11.11
Breaking Changes · 📦 ollama
⚠ 1 breaking · ✨ 6 features · 🐛 5 fixes · 🔧 6 symbols
Summary
This release adds CUDA 13 support, introduces a dimensions field for embeddings, and improves memory estimation and app UI. It also removes support for loading split vision models in the Ollama engine.
⚠️ Breaking Changes
- Split vision models are no longer supported in the Ollama engine.
Migration Steps
- If you use split vision models, switch to a supported model format. Split vision models no longer load in the Ollama engine.
✨ New Features
- Added support for CUDA 13.
- Added 'dimensions' field to embedding requests.
- Added zoom in and zoom out (Cmd +/-) in the Ollama app.
- Added ability to copy assistant messages in the Ollama app.
- Enabled new memory estimates in the Ollama engine by default.
- Improved scrolling performance in the Ollama app for long prompts.
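As a rough illustration of the new 'dimensions' field, the sketch below builds an embedding request and posts it to a local Ollama server. It assumes the server listens on the default address http://localhost:11434 and that the request goes to the /api/embed endpoint with `model` and `input` keys. The helper names `build_embed_request` and `embed` are hypothetical, as is the model name in the usage comment. Whether a given model honors the requested size depends on the model.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(model, text, dimensions=None):
    """Build the JSON payload for an embedding request (hypothetical helper)."""
    payload = {"model": model, "input": text}
    if dimensions is not None:
        # Reject values that cannot be a valid embedding size.
        if not isinstance(dimensions, int) or dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        payload["dimensions"] = dimensions
    return payload


def embed(model, text, dimensions=None, url=OLLAMA_URL):
    """Send the request to a running server and return the parsed JSON reply."""
    body = json.dumps(build_embed_request(model, text, dimensions)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example call (needs a running server and a pulled embedding model;
# "my-embedding-model" is a placeholder name):
# reply = embed("my-embedding-model", "hello world", dimensions=256)
```

Keeping the payload construction separate from the network call makes the request shape easy to inspect without a server running.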
🐛 Bug Fixes
- Fixed error when importing safetensor files.
- Fixed an error that occurred when the batch size exceeded the context length.
- Fixed validation issues for Flash Attention and KV cache quantization.
- Improved memory usage for gpt-oss in the Ollama app.
- Improved memory estimates for hybrid and recurrent models.
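For context on the Flash Attention and KV cache quantization fix, the sketch below prepares the environment for a server launch. It relies on the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables described in the Ollama FAQ, which also states that a quantized KV cache requires Flash Attention. The helper `ollama_env` is hypothetical and only mirrors that documented constraint; it is not the validation code changed in this release.

```python
import os

# KV cache types listed in the Ollama FAQ; the q* types are quantized.
KV_CACHE_TYPES = {"f16", "q8_0", "q4_0"}
QUANTIZED_TYPES = {"q8_0", "q4_0"}


def ollama_env(flash_attention, kv_cache_type="f16"):
    """Return a copy of the environment with the two settings applied."""
    if kv_cache_type not in KV_CACHE_TYPES:
        raise ValueError(f"unknown KV cache type: {kv_cache_type}")
    if kv_cache_type in QUANTIZED_TYPES and not flash_attention:
        # Documented constraint: quantized KV cache needs Flash Attention.
        raise ValueError("quantized KV cache requires Flash Attention")
    env = dict(os.environ)
    env["OLLAMA_FLASH_ATTENTION"] = "1" if flash_attention else "0"
    env["OLLAMA_KV_CACHE_TYPE"] = kv_cache_type
    return env


# Example launch (requires the ollama binary on PATH):
# import subprocess
# subprocess.Popen(["ollama", "serve"], env=ollama_env(True, "q8_0"))
```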
🔧 Affected Symbols
embed, dimensions, safetensor, FlashAttention, KVCache, gpt-oss