v3.7.0

📅 Dec 5, 2025 · 📦 mlflow

⚠ 3 breaking · ✨ 15 features · 🐛 36 fixes · 🔧 11 symbols

Summary

MLflow 3.7.0 adds major GenAI observability features such as an Experiment Prompts UI, multi-turn evaluation, trace comparison, and new auto-tracing SDKs, while introducing breaking changes like SQLite becoming the default tracking backend and removal of deprecated model flavors. It also includes numerous bug fixes and enhancements across tracking, evaluation, tracing, and UI components.

⚠️ Breaking Changes

Tracking: SQLite is now the default backend for the MLflow Tracking server. Setups that relied on the previous default must either set their tracking backend explicitly or migrate existing data to SQLite.
Models: The deprecated `diviner` flavor has been removed. Code referencing `mlflow.models.diviner` must be updated to use a supported flavor.
Models: The deprecated `promptflow` flavor has been removed. Remove any usage of `mlflow.models.promptflow` or replace with an alternative.
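
Because the default backend changed, one way to avoid surprises is to resolve the tracking URI explicitly rather than relying on whatever the default happens to be. The following is a minimal sketch, not MLflow's own logic. The helper name `resolve_tracking_uri`, the `legacy_dir` parameter, the idea that earlier data sits in a local `mlruns` directory, and the `sqlite:///mlflow.db` file name are all assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_tracking_uri(env=None, legacy_dir="mlruns"):
    """Return an explicit tracking URI instead of relying on the default.

    Hypothetical helper for this sketch. Assumption: any pre-existing
    local tracking data lives in the directory named by `legacy_dir`.
    """
    env = os.environ if env is None else env

    # An explicitly configured backend always wins.
    uri = env.get("MLFLOW_TRACKING_URI")
    if uri:
        return uri

    # Keep pointing at an existing local store until its data is migrated.
    legacy = Path(legacy_dir)
    if legacy.is_dir():
        return legacy.resolve().as_uri()

    # Nothing to preserve: use SQLite (the file name is an assumption).
    return "sqlite:///mlflow.db"
```

The returned value can then be exported as `MLFLOW_TRACKING_URI` or passed to `mlflow.set_tracking_uri(...)` at the start of a script, so the behavior no longer depends on the installed MLflow version.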

Migration Steps

If you rely on a non‑SQLite tracking backend, set `MLFLOW_TRACKING_URI` explicitly so you do not fall back to the new SQLite default, or migrate your existing tracking data to SQLite.
Remove any usage of the deprecated `diviner` and `promptflow` model flavors and replace them with supported alternatives.
Ensure your environment does not use Click 8.3.0. The MCP extra now pins `click!=8.3.0`, so install any other Click version.
Verify that parent directories for SQLite database files exist or rely on the new automatic creation feature.
Review and configure authentication for scorers now that auth support has been added.
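
The Click step above can be checked before upgrading. This is a rough sketch using only the standard library. The function names `installed_click_version` and `click_is_excluded` are hypothetical, and the only fact taken from these notes is the `click!=8.3.0` pin.

```python
from importlib import metadata

EXCLUDED_CLICK = "8.3.0"  # the version ruled out by the `click!=8.3.0` pin


def installed_click_version():
    """Return the installed Click version string, or None if Click is absent."""
    try:
        return metadata.version("click")
    except metadata.PackageNotFoundError:
        return None


def click_is_excluded(version):
    """Return True when `version` is exactly the excluded Click release."""
    return version is not None and version.strip() == EXCLUDED_CLICK


if __name__ == "__main__":
    found = installed_click_version()
    if click_is_excluded(found):
        print(f"Click {found} is excluded by the pin; install another version.")
    else:
        print(f"Click version OK: {found}")
```

Running this in the target environment before installing the release shows whether the resolver will need to change the Click version.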

✨ New Features

Experiment Prompts UI: Manage and search prompts directly within the experiment UI, with filter strings and prompt version search in traces.
Multi-turn Evaluation Support: `mlflow.genai.evaluate` now supports multi-turn conversations with DataFrame and list inputs.
Trace Comparison: Side-by-side comparison view in the Traces UI for analyzing LLM behavior across runs.
Gemini TypeScript SDK: Auto-tracing support for Google's Gemini in TypeScript.
Structured Outputs in Judges: `make_judge` API now supports structured outputs.
VoltAgent Tracing: Auto-tracing support for VoltAgent framework.
Tracking: Create parent directories for SQLite database files.
Prompts: Link Prompts and Experiments when prompts are loaded/registered.
Tracking: Include environment variable fallback for SGC run resumption.
Tracking: Add support for SGC run resumption from Databricks Jobs.
Evaluation: Add `--builtin/-b` flag to `mlflow scorers list` command.
Tracing: Pydantic AI Chat UI support.
Tracking: Add auth support for scorers.
Evaluation: Remove experimental flags from scorers.
Evaluation: Add description field to all built-in scorers.
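
To make the "Create parent directories for SQLite database files" item concrete, here is a sketch of what such behavior could look like. It is an illustration only and not MLflow's implementation. The helpers `sqlite_db_path` and `ensure_parent_dir` are hypothetical, and the URI handling assumes the common `sqlite:///relative.db` and `sqlite:////absolute/path.db` forms.

```python
from pathlib import Path

SQLITE_PREFIX = "sqlite:///"


def sqlite_db_path(uri):
    """Return the database file path from a SQLite URI, or None.

    None is returned for non-SQLite URIs and for in-memory databases.
    """
    if not uri.startswith(SQLITE_PREFIX):
        return None
    # Drop any query string such as "?timeout=30".
    raw = uri[len(SQLITE_PREFIX):].split("?", 1)[0]
    if raw in ("", ":memory:"):
        return None
    return Path(raw)


def ensure_parent_dir(uri):
    """Create the parent directory of a SQLite database file if needed.

    Returns the parent directory, or None when there is no file to prepare.
    """
    path = sqlite_db_path(uri)
    if path is None:
        return None
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.parent
```

For example, `ensure_parent_dir("sqlite:///data/tracking/mlflow.db")` would create `data/tracking` relative to the working directory, while a PostgreSQL or in-memory URI is left alone.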

🐛 Bug Fixes

Tracing: Handle traces with third-party generic root span.
Tracing: Fix OTLP endpoint path handling per OpenTelemetry spec.
Tracing: Add gzip/deflate Content-Encoding support to OTLP traces endpoint.
Tracing: Add missing `_delete_trace_tag_v3` API.
Tracing: Fix bug in chat sessions view where new sessions created after UI launch are not visible due to incorrect timestamp filtering.
Tracing: Fix OTLP proto conversion for empty list/dict.
Tracing: Agno V2 fixes.
Tracing: Fix `/v1/traces` endpoint to return protobuf instead of
Tracing: Pin `click!=8.3.0` in MCP extra to fix MCP server failure.
Tracing: Fix MCP server `uv` installation command for external users.
Evaluation: Fix trace-based scorer evaluation by using agentic judge adapter.
Evaluation: Fix managed scorer registration failure.
Evaluation: Fix `InstructionsJudge` using scorer description as assessment value.
Evaluation: Add validation to correctness judge expectation fields.
Evaluation: Fix model URI underscore handling.
Evaluation: Fix `evaluate_traces` MCP tool error: use `result_df` instead of `tables`.
Evaluation: Fix Bedrock Anthropic adapter by adding required `anthropic_version` field.
Evaluation: Fix migration for pre-existing auth tables.
Tracking: Fix tracking URI propagation.
Tracking: Fix `SqlLoggedModelMetric` association with `experiment_id`.
Tracking: Add Flask routes to auth validators.
Tracking: Add missing proto handler for Experiment association handling for datasets.
UI: Show full dataset record content and add search bar in evaluation datasets UI.
UI: Request TraceInfo and Trace Assessments from a relative API path.
UI: Define `LoggedModelOutput.to_dictionary()` so `LoggedModelOutput` and runs containing them can be JSON serialized.
UI: Fix router issue in TracesUI page.
Build: Fix `mlflow gc` to remove model artifacts.
Build: Fix Click 8.3.0 `Sentinel.UNSET` handling in MCP server.
Build: Add bucket-ownership checks for Amazon S3.
Docs: Fix Python indentation in custom trace quickstart example.
Docs: Fix property blocks rendering horizontally in API documentation.
Docs: Fix CLI link missing api_reference prefix in documentation sidebars.
Docs: Fix notebook download URLs to use versioned paths.
Docs: Fix documentation redirects for removed getting-started pages.
Models: Fix shared cluster Py4j statefulness issue.
Models: Prevent symlink path traversal in local artifact store.
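
The last fix above concerns symlink path traversal. As general background, a common way to guard against it is to resolve symlinks first and only then check containment. The sketch below shows that technique with the standard library. It is not the code used in MLflow's local artifact store, and `is_within_root` is a hypothetical name.

```python
import os


def is_within_root(root, candidate):
    """Return True if `candidate` stays inside `root` after resolving symlinks.

    `candidate` is interpreted relative to `root`. Absolute paths and
    `..` segments are resolved and then checked like any other path.
    """
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(os.path.join(real_root, candidate))
    try:
        common = os.path.commonpath([real_root, real_path])
    except ValueError:
        # Raised, for example, when the paths are on different drives.
        return False
    return common == real_root
```

Checking the resolved path matters because a purely textual check would accept `link/secret.txt` even when `link` is a symlink pointing outside the artifact root.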

Affected Symbols

`mlflow.genai.evaluate`, `make_judge`, `mlflow.scorers.list`, `SqlLoggedModelMetric`, `LoggedModelOutput.to_dictionary`, `mlflow.models.diviner`, `mlflow.models.promptflow`, `mlflow.gc`, `_delete_trace_tag_v3`, `mlflow.tracking`, `mlflow.tracing`