Changelog

v3.10.0

📦 mlflow
✨ 41 features · 🐛 9 fixes · 🔧 5 symbols

Summary

MLflow 3.10.0 introduces major features like organization support for multi-workspace tracking, advanced multi-turn conversation evaluation, and automatic LLM trace cost tracking. The release also includes a significant navigation bar redesign and a new one-click demo experiment.

Migration Steps

  1. If you were relying on the old `virtualenv`-based environment creation, note that the virtualenv env_manager path now uses `python -m venv` instead of the third-party `virtualenv` package.
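The replacement behaves like invoking the standard library module directly. A minimal sketch of the equivalent call (the path below is illustrative, not one MLflow actually uses):

```python
# Illustrative sketch: the virtualenv env_manager now provisions environments
# with the standard library's venv module, roughly equivalent to running
# `python -m venv <env_dir>`.
import os
import subprocess
import sys
import tempfile

env_dir = os.path.join(tempfile.mkdtemp(), "mlflow-env")
subprocess.run([sys.executable, "-m", "venv", env_dir], check=True)

# A pyvenv.cfg file at the environment root marks a successfully created venv.
print(os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))  # True
```

Because `venv` ships with Python itself, this removes the dependency on the external `virtualenv` package.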

✨ New Features

  • MLflow now supports organization and multi-workspace environments within a single tracking server for logical isolation of experiments, models, and prompts.
  • Introduced multi-turn evaluation capabilities, allowing evaluation of existing conversations with session-level scorers and simulation of new conversations for agent testing.
  • Added Trace Cost Tracking, automatically extracting model information from LLM spans to calculate and render costs in trace views.
  • Redesigned the navigation bar with a workflow type selector (GenAI/Classical ML) and streamlined sidebars for a cleaner user experience.
  • Added an MLflow Demo Experiment accessible with one click to explore tracing, evaluation, and prompt management without configuration.
  • Implemented Gateway Usage Tracking, providing analytics on AI Gateway endpoints with linked trace ingestion for end-to-end observability.
  • Enabled in-UI Trace Evaluation, allowing users to run custom or pre-built LLM judges directly from the traces and sessions UI.
  • Added sliding animation to the workflow switch component in the UI.
  • Added display of cached token counts in the trace UI.
  • Moved the 'Select traces' button next to the 'Run judge' button in the Evaluation UI.
  • Implemented distributed tracing for gateway endpoints.
  • Added a user selector in the gateway usage page.
  • Added support for comma-separated rules in `# clint: disable=` comments during build.
  • Replaced usage of `virtualenv` with `python -m venv` in the virtualenv env_manager path across build, docs, models, projects, and scoring.
  • Added per-decorator `sampling_ratio_override` parameter to `@mlflow.trace`.
  • Added `mlflow datasets list` CLI command.
  • Added streaming support for typescript-anthropic.
  • Added API to delete dataset records.
  • Added tooltip link in UI to navigate to traces tab with time range filter applied.
  • Added an SDK for distilling conversations into goal/persona definitions.
  • Integrated Livekit Agents in MLflow Tracing.
  • Enabled running scorers/judges from the trace details drawer in the UI.
  • Linked gateway calls and experiments for better observability.
  • Added optimization backend APIs to auth control for prompts.
  • Added an SDK for searching sessions and retrieving complete sessions.
  • Added reasoning rendering for Mistral in the Chat UI.
  • Added TruLens third-party scorer integration for evaluation.
  • Added Guardrails AI scorer integration for evaluation.
  • Added support for getting dataset by name (`get_dataset(name=...)`) in OSS environments.
  • Added session comparison UI with goal/persona matching.
  • Added model and cost rendering for spans in the UI.
  • Made the conversation simulator public and easily subclassable.
  • Added progress tracking for prompt optimization jobs.
  • Added Get, Search, and Delete prompt optimization job APIs.
  • Tracked intermediate candidates and evaluation scores in the GEPA optimizer.
  • Added CreatePromptOptimizationJob and CancelPromptOptimizationJob APIs.
  • Added support for shift+select for Traces.
  • Added a selector for workflow type in the top-level navbar.
  • Added backend job wrapping for prompt optimization.
  • Added `--experiment-name` option to `mlflow experiments get` command.
  • Added workspace landing page and multi-workspace support to the UI.

🐛 Bug Fixes

  • Fixed an infinite fetch loop in the trace detail view when `num_spans` metadata mismatched.
  • Fixed dark mode implementation in the experiment UI.
  • Fixed the 'Select traces' button not showing new traces in the Judge UI.
  • Fixed RecursionError in strands, semantic_kernel, and haystack autologgers when using a shared tracer provider.
  • Fixed IntegrityError in `log_batch` when duplicate metrics spanned multiple key batches.
  • Added support for native tool calls in CrewAI 1.9.0+ autolog tests.
  • Fixed retrieval_relevance assessments being logged to the wrong span when chunk index was missing.
  • Fixed missing session metadata on failed session-level scorer assessments.
  • Enhanced path validation in `check_tarfil`.

Affected Symbols