v3.6.0 — mlflow
⚠ 2 breaking · ✨ 40 features · 🐛 25 fixes · ⚡ 2 deprecations · 🔧 10 symbols
Summary
MLflow 3.6.0 introduces full OpenTelemetry support, a new Agent Server, extensive tracing and evaluation enhancements, and numerous UI and framework integrations, while deprecating several flavors and changing span naming conventions.
⚠️ Breaking Changes
- The pmdarima, promptflow, and diviner flavors have been deprecated; replace any usage of these flavors with supported alternatives or remove them from your pipelines.
- Span names no longer include automatic numbering suffixes such as "_1", "_2", etc.; update any code that relied on these suffixes to use the new naming convention.
Migration Steps
- Replace any usage of the deprecated pmdarima, promptflow, or diviner flavors with supported alternatives or remove them from your pipelines.
- Update code that relied on automatically suffixed span names (e.g., "my_span_1") to use the new naming convention without the numeric suffix.
- Review and migrate away from filesystem backends that now emit deprecation warnings; switch to cloud or database‑backed storage as appropriate.
- If you previously registered custom scorers, remove those registrations and use the built‑in scorer registration CLI commands instead.
- Ensure your tracing instrumentation is updated to the new OpenTelemetry integration APIs.
- Verify that any scripts or CI pipelines that invoke `mlflow.spark.load_model` handle Unity Catalog Volume paths correctly.
- Adjust any calls to `log_metric` that passed non‑Dataset objects to now pass `mlflow.entities.Dataset` where required.
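For the span-name step above, stored traces or test assertions may still reference the old auto-suffixed names. A minimal normalization helper for comparing legacy names against the new convention; the function name and regex are illustrative, not an MLflow API:

```python
import re

def strip_span_suffix(name: str) -> str:
    """Drop a trailing auto-numbering suffix ('_1', '_2', ...) that
    pre-3.6.0 MLflow appended to duplicate span names."""
    return re.sub(r"_\d+$", "", name)

# Normalize legacy names before comparing against new, unsuffixed ones.
assert strip_span_suffix("my_span_1") == "my_span"
assert strip_span_suffix("my_span") == "my_span"
```

Note the caveat: a span that was deliberately named with a trailing number (e.g. `"step_2"`) would also be stripped, so apply this only to names you know were auto-suffixed.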
✨ New Features
- Full OpenTelemetry support in the OSS MLflow server for ingesting traces and seamless SDK integration.
- Session-level view added to the Trace UI with a dedicated chat sessions tab.
- Experiment navigation bar moved to the left side of the UI for better scalability.
- TypeScript Tracing SDK now auto‑traces Vercel AI SDK, Gemini, Anthropic, and Mastra frameworks.
- Automatic tracking and rendering of LLM judge evaluation costs and traces.
- New Agent Server infrastructure for managing and deploying scoring agents.
- Support for structured outputs in the make_judge evaluation API.
- Agent‑as‑a‑judge support for the default Databricks endpoint.
- Frontend adjustments to handle and display judge traces.
- Record judge traces and render associated cost information.
- Added `search_traces` tool for agentic judge workflows.
- Profile usage support in Databricks Agents dataset API operations.
- Added `description` property to the Scorer interface.
- CLI command `mlflow scorers register-llm-judge` for registering LLM judges.
- CLI command to list registered scorers by experiment.
- Allow passing an empty scorer list for manual result comparison.
- CLI command `mlflow traces eval` for evaluating traces.
- Documentation added for new OpenTelemetry tracing integrations.
- Trace UI now displays trace metadata.
- Automatic session ID tracking for LangGraph traces.
- RLIKE operator support added for trace search queries.
- Attributes translation support for OpenTelemetry clients.
- Auto‑tracing implementation for Vercel AI SDK.
- Minor cleanup of the trace summary view.
- Search by span details enabled in the OSS MLflow server.
- UI filtering by span content, type, and name.
- Parent–child link visualization in the UI.
- PyTorch Lightning autologging now logs model signatures.
- Option to use the same database for tracking and authentication.
- Job backend can create a virtual Python environment for job execution.
- Option to skip pip installation when packaging environments for model serving.
- Support for LangChain 1.x.
- Default UBJSON format for XGBoost model serialization.
- Configuration option for long‑running deployment client requests.
- OpenAI provider now supports streamed function‑calling responses.
- Gemini provider now supports function calling.
- Anthropic provider now supports function calling.
- AI‑gateway revamp adds traffic routing to multiple endpoints.
- fastmcp moved to an optional `mcp` extra.
- Sticky header added to code blocks in MLflow documentation examples.
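Of the items above, the new RLIKE operator is the most direct to adopt in code: it adds regular-expression matching to trace-search filter strings. A minimal sketch of building such a filter; the `trace.name` field key and the helper itself are illustrative assumptions, so check your server's searchable fields:

```python
def trace_name_regex_filter(pattern: str) -> str:
    """Build a trace-search filter string using the RLIKE (regex match)
    operator introduced in MLflow 3.6.0. The 'trace.name' key is an
    assumed field name, not confirmed by these notes."""
    return f"trace.name RLIKE '{pattern}'"

# Pass the result to a trace search, e.g.:
#   mlflow.search_traces(filter_string=trace_name_regex_filter("agent.*"))
print(trace_name_regex_filter("agent.*"))
# → trace.name RLIKE 'agent.*'
```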
🐛 Bug Fixes
- Model Registry: Skip `_raise_if_prompt` for Unity Catalog tag operations.
- Model Registry / Models / Scoring: `mlflow.spark.load_model` now correctly handles Unity Catalog Volumes paths.
- Models: Fixed streaming issues.
- Tracing: Fixed async generator handling in the LlamaIndex tracer.
- Tracing: Paginated `delete_traces` calls to Databricks MLflow server.
- Tracing: Reused traces in `genai.evaluate` when endpoint uses dual‑write mode.
- Tracking: `log_metric` now accepts `mlflow.entities.Dataset` objects.
- Tracking: Enhanced `SqlAlchemyStore` to include model outputs in run search results.
- Tracking: Added validation checks for search runs.
- Tracking: Updated run name correctly when resuming an existing run.
- Tracking: Disabled autologging for PyTorch forecasting model predict method.
- Evaluation: Fixed job store SQL engine race condition.
- Evaluation: Eagerly launch Huey consumer to prevent race condition.
- Evaluation: Fixed plugin incompatibility caused by circular import.
- Evaluation: Removed ability to register or load custom scorers.
- Evaluation: Added specificity to system prompt for metrics.
- Evaluation: Added support for evaluating traces and linking to runs in OSS.
- Evaluation: Adjusted utilities for remote tracking server declaration.
- Evaluation: Added atomicity to `job_start` API.
- UI: Fixed search filter for metrics/params with spaces in names.
- UI: Fixed assessment editing UI resetting field values when selecting a name.
- UI: Removed X‑Frame‑Options header for notebook trace renderer.
- Evaluation / UI: Fixed evaluation runs table link to point to traces tab.
- Prompts: Fixed typo in gepa version.
- Artifacts: Fixed handling of `pathlib.Path` in `validation.py`.
⚡ Deprecations
- pmdarima, promptflow, and diviner flavors are deprecated.
- Filesystem backend usage now emits deprecation warnings; migrate to supported storage backends.
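To silence the filesystem-backend deprecation warning, point the tracking server at a database-backed store; a local SQLite file is the smallest migration. The paths, host, and port below are illustrative:

```shell
# File-based backend (now emits a deprecation warning):
#   mlflow server --backend-store-uri ./mlruns

# Database-backed store; artifacts can remain on the filesystem.
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlartifacts \
  --host 127.0.0.1 \
  --port 5000
```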