
v0.11.11

Breaking Changes
📦 ollama
1 breaking · 6 features · 🐛 5 fixes · 🔧 6 symbols

Summary

This release adds CUDA 13 support, introduces a dimensions field for embeddings, and improves memory estimation and app UI. It also removes support for loading split vision models in the Ollama engine.

⚠️ Breaking Changes

  • Split vision models are no longer supported in the Ollama engine.

Migration Steps

  1. If you use split vision models, switch to a supported model format. Split vision models no longer load in the Ollama engine.

✨ New Features

  • Added support for CUDA 13.
  • Added 'dimensions' field to embedding requests.
  • Added zoom and shrink functionality (Cmd +/-) in the Ollama app.
  • Added ability to copy assistant messages in the Ollama app.
  • Enabled new memory estimates in the Ollama engine by default.
  • Improved scrolling performance in the Ollama app for long prompts.
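
The new 'dimensions' field on embedding requests can be sketched as follows. This is a minimal example, assuming a local Ollama server at the default address and the /api/embed endpoint. The model name in the usage note is an assumption, and whether a given model honors a reduced size depends on the model. The helper names build_embed_payload and embed are hypothetical and are not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"


def build_embed_payload(model, text, dimensions=None):
    """Build the JSON body for an embedding request (hypothetical helper).

    'dimensions' is only included when the caller asks for it, so requests
    without it keep the model's native embedding size.
    """
    payload = {"model": model, "input": text}
    if dimensions is not None:
        if dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        payload["dimensions"] = dimensions
    return payload


def embed(model, text, dimensions=None, url=OLLAMA_EMBED_URL):
    """Send the request and return the list of embedding vectors."""
    body = json.dumps(build_embed_payload(model, text, dimensions)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embeddings"]
```

With a server running, a call such as embed("embeddinggemma", "Why is the sky blue?", dimensions=256) would request 256-dimensional vectors. The model name here is illustrative only.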

🐛 Bug Fixes

  • Fixed error when importing safetensor files.
  • Fixed error occurring when batch size exceeded context length.
  • Fixed validation issues for Flash Attention and KV cache quantization.
  • Improved memory usage for gpt-oss in the Ollama app.
  • Improved memory estimates for hybrid and recurrent models.
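
Flash Attention and KV cache quantization are configured through the OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE environment variables, and a quantized KV cache requires Flash Attention to be enabled. The sketch below is a client-side pre-check before starting the server, not the validation logic fixed in this release. The helper name build_server_env is hypothetical.

```python
import os

# Cache types documented for OLLAMA_KV_CACHE_TYPE; f16 is the unquantized default.
QUANTIZED_KV_TYPES = {"q8_0", "q4_0"}
KV_TYPES = {"f16"} | QUANTIZED_KV_TYPES


def build_server_env(flash_attention, kv_cache_type="f16", base_env=None):
    """Return an environment dict for launching the server (hypothetical helper).

    Rejects unknown cache types and quantized caches without Flash Attention.
    """
    if kv_cache_type not in KV_TYPES:
        raise ValueError(f"unknown KV cache type: {kv_cache_type!r}")
    if kv_cache_type in QUANTIZED_KV_TYPES and not flash_attention:
        raise ValueError("a quantized KV cache requires Flash Attention")

    # Copy the base environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_FLASH_ATTENTION"] = "1" if flash_attention else "0"
    env["OLLAMA_KV_CACHE_TYPE"] = kv_cache_type
    return env
```

The returned mapping can be passed to subprocess.Popen(["ollama", "serve"], env=env) to start the server with those settings.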

🔧 Affected Symbols

embed, dimensions, safetensor, FlashAttention, KVCache, gpt-oss