v3.6.0 — mlflow
⚠ 2 breaking · ✨ 40 features · 🐛 25 fixes · ⚡ 2 deprecations · 🔧 10 symbols
Summary
MLflow 3.6.0 introduces full OpenTelemetry support, a new Agent Server, extensive tracing and evaluation enhancements, and numerous UI and framework integrations, while deprecating several flavors and changing span naming conventions.
⚠️ Breaking Changes
- The pmdarima, promptflow, and diviner flavors have been deprecated; replace any usage of these flavors with supported alternatives or remove them from your pipelines.
- Span names no longer include automatic numbering suffixes such as "_1", "_2", etc.; update any code that relied on these suffixes to use the new naming convention.
Migration Steps
- Replace any usage of the deprecated pmdarima, promptflow, or diviner flavors with supported alternatives or remove them from your pipelines.
- Update code that relied on automatically suffixed span names (e.g., "my_span_1") to use the new naming convention without the numeric suffix.
- Review and migrate away from filesystem backends that now emit deprecation warnings; switch to cloud or database‑backed storage as appropriate.
- If you previously registered custom scorers, remove those registrations and use the built‑in scorer registration CLI commands instead.
- Ensure your tracing instrumentation is updated to the new OpenTelemetry integration APIs.
- Verify that any scripts or CI pipelines that invoke `mlflow.spark.load_model` handle Unity Catalog Volume paths correctly.
- Adjust any calls to `log_metric` that passed non‑Dataset objects to now pass `mlflow.entities.Dataset` where required.
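For the span-name step above, stored traces or test assertions may still reference the old auto-suffixed names. A minimal normalization helper for comparing legacy names against the new convention; the function name and regex are illustrative, not an MLflow API:

```python
import re

def strip_span_suffix(name: str) -> str:
    """Drop a trailing auto-numbering suffix ('_1', '_2', ...) that
    pre-3.6.0 MLflow appended to duplicate span names."""
    return re.sub(r"_\d+$", "", name)

# Normalize legacy names before comparing against new, unsuffixed ones.
assert strip_span_suffix("my_span_1") == "my_span"
assert strip_span_suffix("my_span") == "my_span"
```

Note the caveat: a span that was deliberately named with a trailing number (e.g. `"step_2"`) would also be stripped, so apply this only to names you know were auto-suffixed.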
✨ New Features
- Full OpenTelemetry support in the OSS MLflow server for ingesting traces and seamless SDK integration.
- Session-level view added to the Trace UI with a dedicated chat sessions tab.
- Experiment navigation bar moved to the left side of the UI for better scalability.
- TypeScript Tracing SDK now auto‑traces Vercel AI SDK, Gemini, Anthropic, and Mastra frameworks.
- Automatic tracking and rendering of LLM judge evaluation costs and traces.
- New Agent Server infrastructure for managing and deploying scoring agents.
- Support for structured outputs in the make_judge evaluation API.
- Agent‑as‑a‑judge support for the default Databricks endpoint.
- Frontend adjustments to handle and display judge traces.
- Record judge traces and render associated cost information.
- Added `search_traces` tool for agentic judge workflows.
- Profile usage support in Databricks Agents dataset API operations.
- Added `description` property to the Scorer interface.
- CLI command `mlflow scorers register-llm-judge` for registering LLM judges.
- CLI command to list registered scorers by experiment.
- Allow passing an empty scorer list for manual result comparison.
- CLI command `mlflow traces eval` for evaluating traces.
- Documentation added for new OpenTelemetry tracing integrations.
- Trace UI now displays trace metadata.
- Automatic session ID tracking for LangGraph traces.
- RLIKE operator support added for trace search queries.
- Attributes translation support for OpenTelemetry clients.
- Auto‑tracing implementation for Vercel AI SDK.
- Minor cleanup of the trace summary view.
- Search by span details enabled in the OSS MLflow server.
- UI filtering by span content, type, and name.
- Parent–child link visualization in the UI.
- PyTorch Lightning autologging now logs model signatures.
- Option to use the same database for tracking and authentication.
- Job backend can create a virtual Python environment for job execution.
- Option to skip pip installation when packaging environments for model serving.
- Support for LangChain 1.x.
- Default UBJSON format for XGBoost model serialization.
- Configuration option for long‑running deployment client requests.
- OpenAI provider now supports streamed function‑calling responses.
- Gemini provider now supports function calling.
- Anthropic provider now supports function calling.
- AI‑gateway revamp adds traffic routing to multiple endpoints.
- fastmcp moved to an optional `mcp` extra.
- Sticky header added to code blocks in MLflow documentation examples.
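Of the items above, the new RLIKE operator is the most direct to adopt in code: it adds regular-expression matching to trace-search filter strings. A minimal sketch of building such a filter; the `trace.name` field key and the helper itself are illustrative assumptions, so check your server's searchable fields:

```python
def trace_name_regex_filter(pattern: str) -> str:
    """Build a trace-search filter string using the RLIKE (regex match)
    operator introduced in MLflow 3.6.0. The 'trace.name' key is an
    assumed field name, not confirmed by these notes."""
    return f"trace.name RLIKE '{pattern}'"

# Pass the result to a trace search, e.g.:
#   mlflow.search_traces(filter_string=trace_name_regex_filter("agent.*"))
print(trace_name_regex_filter("agent.*"))
# → trace.name RLIKE 'agent.*'
```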
🐛 Bug Fixes
- Model Registry: Skip `_raise_if_prompt` for Unity Catalog tag operations.
- Model Registry / Models / Scoring: `mlflow.spark.load_model` now correctly handles Unity Catalog Volumes paths.
- Models: Fixed streaming issues.
- Tracing: Fixed async generator handling in the LlamaIndex tracer.
- Tracing: Paginated `delete_traces` calls to Databricks MLflow server.
- Tracing: Reused traces in `genai.evaluate` when endpoint uses dual‑write mode.
- Tracking: `log_metric` now accepts `mlflow.entities.Dataset` objects.
- Tracking: Enhanced `SqlAlchemyStore` to include model outputs in run search results.
- Tracking: Added validation checks for search runs.
- Tracking: Updated run name correctly when resuming an existing run.
- Tracking: Disabled autologging for PyTorch forecasting model predict method.
- Evaluation: Fixed job store SQL engine race condition.
- Evaluation: Eagerly launch Huey consumer to prevent race condition.
- Evaluation: Fixed plugin incompatibility caused by circular import.
- Evaluation: Removed ability to register or load custom scorers.
- Evaluation: Added specificity to system prompt for metrics.
- Evaluation: Added support for evaluating traces and linking to runs in OSS.
- Evaluation: Adjusted utilities for remote tracking server declaration.
- Evaluation: Added atomicity to `job_start` API.
- UI: Fixed search filter for metrics/params with spaces in names.
- UI: Fixed assessment editing UI resetting field values when selecting a name.
- UI: Removed X‑Frame‑Options header for notebook trace renderer.
- Evaluation / UI: Fixed evaluation runs table link to point to traces tab.
- Prompts: Fixed typo in gepa version.
- Artifacts: Fixed handling of `pathlib.Path` in `validation.py`.
⚡ Deprecations
- pmdarima, promptflow, and diviner flavors are deprecated.
- Filesystem backend usage now emits deprecation warnings; migrate to supported storage backends.
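To silence the filesystem-backend deprecation warning, point the tracking server at a database-backed store; a local SQLite file is the smallest migration. The paths, host, and port below are illustrative:

```shell
# File-based backend (now emits a deprecation warning):
#   mlflow server --backend-store-uri ./mlruns

# Database-backed store; artifacts can remain on the filesystem.
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlartifacts \
  --host 127.0.0.1 \
  --port 5000
```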