v0.20.1-rc2

📦 ollama
2 features · 🐛 4 fixes · 🔧 2 symbols

Summary

This release improves gemma4 performance by enabling flash attention, extends the benchmarking tools, and fixes gemma4 argument parsing and tool call handling along with a ROCm build issue.

✨ New Features

  • Added prompt calibration, context size flag, and NumCtx reporting to benchmarking tools.
  • Enabled flash attention for gemma4 models.

🐛 Bug Fixes

  • Fixed argument parsing for gemma4 when a quoted string contains a double-quote character (").
  • Skipped cublasGemmBatchedEx during graph reservation in ggml.
  • Fixed ROCm build issue related to the cublasGemmBatchedEx reserve wrapper.
  • Reworked tool call handling for gemma4 models.
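
To illustrate the class of bug behind the quoted-string fix, here is a minimal Go sketch of a scanner that reads one quoted argument without stopping at an embedded ". The release notes do not describe how the gemma4 parser delimits or escapes strings, so the backslash-escape convention and the scanQuoted helper below are assumptions made for illustration only, not ollama's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scanQuoted is a hypothetical helper (not part of ollama). It reads one
// double-quoted string from the start of s and returns the unescaped value
// plus the unconsumed remainder of the input.
// Assumption: an embedded quote is written as \" (backslash escape).
func scanQuoted(s string) (value, rest string, err error) {
	if !strings.HasPrefix(s, `"`) {
		return "", s, errors.New("expected opening quote")
	}
	var b strings.Builder
	escaped := false
	for i := 1; i < len(s); i++ {
		c := s[i]
		switch {
		case escaped:
			// Simplification: the byte after a backslash is kept literally,
			// so \" yields " and \\ yields \.
			b.WriteByte(c)
			escaped = false
		case c == '\\':
			escaped = true
		case c == '"':
			// First unescaped quote closes the string.
			return b.String(), s[i+1:], nil
		default:
			b.WriteByte(c)
		}
	}
	return "", s, errors.New("unterminated quoted string")
}

func main() {
	value, rest, err := scanQuoted(`"say \"hi\"", 1`)
	fmt.Printf("value=%q rest=%q err=%v\n", value, rest, err)
}
```

A naive scanner that ends the string at the first " it meets would cut this argument off after `say \`, which is the kind of truncation a fix like this one guards against.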

Affected Symbols