v0.20.1-rc2
📦 ollama
✨ 2 features · 🐛 4 fixes · 🔧 2 symbols
Summary
This release enables flash attention for gemma4 models, extends the benchmarking tools with prompt calibration and context size reporting, and fixes gemma4 argument parsing, ggml graph reservation, and a ROCm build issue.
✨ New Features
- Added prompt calibration, a context size flag, and NumCtx reporting to the benchmarking tools.
- Enabled flash attention for gemma4 models.
🐛 Bug Fixes
- Fixed argument parsing for gemma4 when a quoted string contains a double-quote character (").
- Skipped cublasGemmBatchedEx during graph reservation in ggml.
- Fixed ROCm build issue related to the cublasGemmBatchedEx reserve wrapper.
- Reworked tool call handling for gemma4 models.
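
To illustrate the class of bug behind the first fix above, here is a minimal sketch of quote-aware argument splitting. It is not ollama's parser and does not reproduce the gemma4 tool call format: the `key="value"` syntax, the backslash escape rule, and the `split_args` name are all assumptions chosen for illustration.

```python
def split_args(text: str) -> dict[str, str]:
    """Split 'key="value",key2=value2' into a dict.

    Hypothetical format: values may be double-quoted, and a quoted value
    may contain a double quote if it is escaped with a backslash.
    """
    args: dict[str, str] = {}
    i, n = 0, len(text)
    while i < n:
        # Read the key up to '=' (raises ValueError if '=' is missing).
        eq = text.index("=", i)
        key = text[i:eq].strip()
        i = eq + 1
        if i < n and text[i] == '"':
            # Quoted value: scan character by character so that an escaped
            # quote does not end the value early.
            i += 1
            buf = []
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # drop the backslash, keep the next char literally
                buf.append(text[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted value")
            i += 1  # consume the closing quote
            value = "".join(buf)
        else:
            # Unquoted value: runs to the next comma or the end of input.
            end = text.find(",", i)
            if end == -1:
                end = n
            value = text[i:end].strip()
            i = end
        args[key] = value
        # Skip the separator before the next key.
        while i < n and text[i] in ", ":
            i += 1
    return args


parsed = split_args(r'cmd="echo \"hi, there\"",dir=/tmp')
```

A naive `text.split(",")` would cut the `cmd` value at the comma inside the quotes, and a scanner that stops at the first `"` would truncate it at the escaped quote. Tracking quote and escape state avoids both.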