Changelog

v3.10.0

📦 mlflow
✨ 41 features · 🐛 9 fixes · 🔧 5 symbols

Summary

MLflow 3.10.0 introduces major features like organization support for multi-workspace tracking, advanced multi-turn conversation evaluation, and automatic LLM trace cost tracking. The release also includes a significant navigation bar redesign and a new one-click demo experiment.

Migration Steps

  1. If you were relying on the old `virtualenv`-based environment creation, note that the virtualenv env_manager path now uses `python -m venv` instead of the third-party `virtualenv` package.
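The replacement behaves like invoking the standard library module directly. A minimal sketch of the equivalent call (the path below is illustrative, not one MLflow actually uses):

```python
# Illustrative sketch: the virtualenv env_manager now provisions environments
# with the standard library's venv module, roughly equivalent to running
# `python -m venv <env_dir>`.
import os
import subprocess
import sys
import tempfile

env_dir = os.path.join(tempfile.mkdtemp(), "mlflow-env")
subprocess.run([sys.executable, "-m", "venv", env_dir], check=True)

# A pyvenv.cfg file at the environment root marks a successfully created venv.
print(os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))  # True
```

Because `venv` ships with Python itself, this removes the dependency on the external `virtualenv` package.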

✨ New Features

  • MLflow now supports organization and multi-workspace environments within a single tracking server for logical isolation of experiments, models, and prompts.
  • Introduced multi-turn evaluation capabilities, allowing evaluation of existing conversations with session-level scorers and simulation of new conversations for agent testing.
  • Added Trace Cost Tracking, automatically extracting model information from LLM spans to calculate and render costs in trace views.
  • Redesigned the navigation bar with a workflow type selector (GenAI/Classical ML) and streamlined sidebars for a cleaner user experience.
  • Added an MLflow Demo Experiment accessible with one click to explore tracing, evaluation, and prompt management without configuration.
  • Implemented Gateway Usage Tracking, providing analytics on AI Gateway endpoints with linked trace ingestion for end-to-end observability.
  • Enabled in-UI Trace Evaluation, allowing users to run custom or pre-built LLM judges directly from the traces and sessions UI.
  • Added sliding animation to the workflow switch component in the UI.
  • Added display of cached token counts in the trace UI.
  • Moved the 'Select traces' button next to the 'Run judge' button in the Evaluation UI.
  • Implemented distributed tracing for gateway endpoints.
  • Added a user selector in the gateway usage page.
  • Added support for comma-separated rules in `# clint: disable=` comments during build.
  • Replaced usage of `virtualenv` with `python -m venv` in the virtualenv env_manager path across build, docs, models, projects, and scoring.
  • Added per-decorator `sampling_ratio_override` parameter to `@mlflow.trace`.
  • Added `mlflow datasets list` CLI command.
  • Added streaming support for typescript-anthropic.
  • Added API to delete dataset records.
  • Added tooltip link in UI to navigate to traces tab with time range filter applied.
  • Added an SDK for distilling conversations into goal/persona definitions.
  • Integrated Livekit Agents in MLflow Tracing.
  • Enabled running scorers/judges from the trace details drawer in the UI.
  • Linked gateway calls and experiments for better observability.
  • Added optimization backend APIs to auth control for prompts.
  • Added an SDK for searching sessions and retrieving complete sessions.
  • Added reasoning rendering for Mistral in the Chat UI.
  • Added TruLens third-party scorer integration for evaluation.
  • Added Guardrails AI scorer integration for evaluation.
  • Added support for getting dataset by name (`get_dataset(name=...)`) in OSS environments.
  • Added session comparison UI with goal/persona matching.
  • Added model and cost rendering for spans in the UI.
  • Made the conversation simulator public and easily subclassable.
  • Added progress tracking for prompt optimization jobs.
  • Added Get, Search, and Delete prompt optimization job APIs.
  • Tracked intermediate candidates and evaluation scores in the GEPA optimizer.
  • Added CreatePromptOptimizationJob and CancelPromptOptimizationJob APIs.
  • Added support for shift+select for Traces.
  • Added a selector for workflow type in the top-level navbar.
  • Added backend job wrapping for prompt optimization.
  • Added `--experiment-name` option to `mlflow experiments get` command.
  • Added workspace landing page and multi-workspace support to the UI.

🐛 Bug Fixes

  • Fixed an infinite fetch loop in the trace detail view when `num_spans` metadata mismatched.
  • Fixed dark mode implementation in the experiment UI.
  • Fixed the 'Select traces' button not showing new traces in the Judge UI.
  • Fixed RecursionError in strands, semantic_kernel, and haystack autologgers when using a shared tracer provider.
  • Fixed IntegrityError in `log_batch` when duplicate metrics spanned multiple key batches.
  • Added support for native tool calls in CrewAI 1.9.0+ autolog tests.
  • Fixed retrieval_relevance assessments being logged to the wrong span when chunk index was missing.
  • Fixed missing session metadata on failed session-level scorer assessments.
  • Enhanced path validation in `check_tarfil`.

Affected Symbols