v3.10.0
📦 mlflow
✨ 41 features · 🐛 9 fixes · 🔧 5 symbols
Summary
MLflow 3.10.0 introduces major features like organization support for multi-workspace tracking, advanced multi-turn conversation evaluation, and automatic LLM trace cost tracking. The release also includes a significant navigation bar redesign and a new one-click demo experiment.
Migration Steps
- If you were relying on the old virtualenv path management, note that it has been replaced with `python -m venv` in the env_manager path.
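A minimal sketch of the change, assuming a POSIX shell; the directory path below is illustrative, not an MLflow default:

```shell
# Before 3.10.0, MLflow's virtualenv env_manager path shelled out to the
# third-party `virtualenv` tool. From 3.10.0 it uses the stdlib venv module,
# so environments are created like this:
python -m venv /tmp/mlflow-example-env

# The resulting environment is a standard venv; pyvenv.cfg marks its root.
ls /tmp/mlflow-example-env/pyvenv.cfg
```

If your tooling pinned or pre-installed `virtualenv` solely for MLflow's sake, that dependency can likely be dropped.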
✨ New Features
- MLflow now supports organization and multi-workspace environments within a single tracking server for logical isolation of experiments, models, and prompts.
- Introduced multi-turn evaluation capabilities, allowing evaluation of existing conversations with session-level scorers and simulation of new conversations for agent testing.
- Added Trace Cost Tracking, automatically extracting model information from LLM spans to calculate and render costs in trace views.
- Redesigned the navigation bar with a workflow type selector (GenAI/Classical ML) and streamlined sidebars for a cleaner user experience.
- Added an MLflow Demo Experiment accessible with one click to explore tracing, evaluation, and prompt management without configuration.
- Implemented Gateway Usage Tracking, providing analytics on AI Gateway endpoints with linked trace ingestion for end-to-end observability.
- Enabled in-UI Trace Evaluation, allowing users to run custom or pre-built LLM judges directly from the traces and sessions UI.
- Added sliding animation to the workflow switch component in the UI.
- Display cached tokens in the trace UI.
- Moved the 'Select traces' button next to the 'Run judge' button in the Evaluation UI.
- Implemented distributed tracing for gateway endpoints.
- Added a user selector in the gateway usage page.
- Added support for comma-separated rules in `# clint: disable=` comments during build.
- Replaced usage of `virtualenv` with `python -m venv` in the virtualenv env_manager path across build, docs, models, projects, and scoring.
- Added per-decorator `sampling_ratio_override` parameter to `@mlflow.trace`.
- Added `mlflow datasets list` CLI command.
- Added streaming support for typescript-anthropic.
- Added API to delete dataset records.
- Added tooltip link in UI to navigate to traces tab with time range filter applied.
- Added an SDK for distilling conversations into goals/personas.
- Integrated LiveKit Agents with MLflow Tracing.
- Enabled running scorers/judges from the trace details drawer in the UI.
- Linked gateway calls and experiments for better observability.
- Added optimization backend APIs to auth control for prompts.
- Added an SDK method to search sessions and retrieve complete session contents.
- Added reasoning display for Mistral models in the Chat UI.
- Added TruLens third-party scorer integration for evaluation.
- Added Guardrails AI scorer integration for evaluation.
- Added support for getting dataset by name (`get_dataset(name=...)`) in OSS environments.
- Added session comparison UI with goal/persona matching.
- Added model and cost rendering for spans in the UI.
- Made the conversation simulator public and easily subclassable.
- Added progress tracking for prompt optimization jobs.
- Added Get, Search, and Delete prompt optimization job APIs.
- Tracked intermediate candidates and evaluation scores in the GEPA optimizer.
- Added CreatePromptOptimizationJob and CancelPromptOptimizationJob APIs.
- Added support for shift+select for Traces.
- Added a selector for workflow type in the top-level navbar.
- Added backend job wrapping for prompt optimization.
- Added `--experiment-name` option to `mlflow experiments get` command.
- Added workspace landing page and multi-workspace support to the UI.
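As a rough illustration of what Trace Cost Tracking computes from an LLM span's model name and token usage, here is a standalone sketch; the price table, numbers, and `span_cost` helper are hypothetical and are not MLflow APIs:

```python
# Illustrative only: MLflow 3.10.0 extracts model info from LLM spans and
# renders costs in trace views automatically. This sketch shows the kind of
# arithmetic involved, using a made-up price table.
PRICES_PER_1K = {
    # model name: (input USD per 1K tokens, output USD per 1K tokens)
    "example-model": (0.0005, 0.0015),  # hypothetical prices
}

def span_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one LLM span from its token usage."""
    price_in, price_out = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * price_in + (completion_tokens / 1000) * price_out

# 2000 input tokens and 1000 output tokens at the prices above:
cost = span_cost("example-model", prompt_tokens=2000, completion_tokens=1000)
# 2 * 0.0005 + 1 * 0.0015 = 0.0025
```

In MLflow itself none of this is user code: the model name and token counts come from span attributes captured by autologging, and the UI renders the resulting per-span cost.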
🐛 Bug Fixes
- Fixed an infinite fetch loop in the trace detail view when `num_spans` metadata mismatched.
- Fixed dark mode implementation in the experiment UI.
- Fixed the 'Select traces' button not showing new traces in the Judge UI.
- Fixed RecursionError in strands, semantic_kernel, and haystack autologgers when using a shared tracer provider.
- Fixed IntegrityError in `log_batch` when duplicate metrics spanned multiple key batches.
- Added support for native tool calls in CrewAI 1.9.0+ autolog tests.
- Fixed retrieval_relevance assessments being logged to the wrong span when chunk index was missing.
- Fixed missing session metadata on failed session-level scorer assessments.
- Enhanced path validation in `check_tarfile`.