v3.4.0

📅 Sep 17, 2025 · 📦 mlflow

✨ 28 features · 🐛 17 fixes · 🔧 22 symbols

Summary

MLflow 3.4.0rc0 adds extensive new capabilities (OpenTelemetry metrics export, MCP server integration, a custom judges API, and an experiment types UI) along with numerous feature enhancements and bug fixes across evaluation, tracing, the CLI, tracking, and the model registry.

✨ New Features

OpenTelemetry Metrics Export: MLflow now exports span-level statistics as OpenTelemetry metrics, so traced applications can be monitored with OpenTelemetry-compatible tooling.
MCP Server Integration: Introducing the Model Context Protocol (MCP) server for MLflow, enabling AI assistants and LLMs to interact with MLflow programmatically.
Custom Judges API: New `make_judge` API enables creation of custom evaluation judges for assessing LLM outputs with domain-specific criteria.
Correlations Backend: Implemented backend infrastructure for storing and computing correlations between experiment metrics using NPMI (Normalized Pointwise Mutual Information).
Evaluation Datasets: MLflow now supports storing and versioning evaluation datasets directly within experiments for reproducible model assessment.
Databricks Backend for MLflow Server: MLflow server can now use Databricks as a backend, enabling seamless integration with Databricks workspaces.
Claude Autologging: Automatic tracing support for Claude AI interactions, capturing conversations and model responses.
Strands Agent Tracing: Added comprehensive tracing support for Strands agents, including automatic instrumentation for agent workflows and interactions.
Experiment Types in UI: MLflow now introduces experiment types, which reduce clutter by separating classic ML/DL features from GenAI features. The UI auto-detects each experiment's type, which can be adjusted via a selector next to the experiment name.
Add the ability to pass tags via a DataFrame in mlflow.genai.evaluate.
Add custom judge model support for the Safety and RetrievalRelevance built-in scorers.
Add AI commands as MCP prompts for LLM interaction.
Add MLFLOW_ENABLE_OTLP_EXPORTER environment variable.
Support OTel and MLflow dual export.
Make set_destination use ContextVar for thread safety.
Add MLflow commands CLI for exposing prompt commands to LLMs.
Add 'mlflow runs link-traces' command.
Add 'mlflow runs create' command for programmatic run creation.
Add MLflow traces CLI command with comprehensive search and management capabilities.
Add --env-file flag to all MLflow CLI commands.
Add backend support for storing scorers in MLflow experiments.
Allow cross-workspace copying of model versions between the Workspace Model Registry (WMR) and Unity Catalog (UC).
Add automatic Git-based model versioning for GenAI applications.
Improve WheeledModel._download_wheels safety.
Support resume run for Optuna hyperparameter optimization.
Add MLFLOW_DEPLOYMENT_CLIENT_HTTP_REQUEST_TIMEOUT environment variable.
Add ability to hide/unhide all finished runs in Chart view.
Add MLflow OSS telemetry for invoke_custom_judge_model.
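For the Correlations Backend entry, NPMI scores how strongly two binary events co-occur by normalizing pointwise mutual information into the range [-1, 1]. The following is a minimal sketch of the standard formula computed from raw co-occurrence counts. It is not MLflow's implementation; the function name `npmi` and its count-based inputs are assumptions for illustration.

```python
import math


def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Normalized pointwise mutual information of two binary events x and y.

    count_xy: observations where x and y both occur
    count_x, count_y: observations where each event occurs
    total: number of observations

    NPMI(x, y) = log(p(x,y) / (p(x) * p(y))) / -log(p(x,y)), in [-1, 1]:
    1 = always co-occur, 0 = independent, -1 = never co-occur.
    """
    if total <= 0 or count_x <= 0 or count_y <= 0:
        raise ValueError("total and marginal counts must be positive")
    if count_xy == 0:
        return -1.0  # limiting value as p(x, y) -> 0
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # x and y occur in every observation; denominator would be 0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)
```

For instance, `npmi(10, 50, 20, 100)` returns 0.0: p(x, y) = 0.1 equals p(x) * p(y) = 0.5 * 0.2, so the events are independent.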
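For the OTel and MLflow dual export entry, the name suggests that trace data is sent both to an OpenTelemetry (OTLP) endpoint and to the MLflow backend. One common way to structure that is a fan-out exporter that forwards every batch to each destination and isolates failures, so an unreachable collector does not block MLflow logging. The classes below (`Span`, `InMemoryExporter`, `FailingExporter`, `FanOutExporter`) are hypothetical stand-ins, not OpenTelemetry SDK or MLflow APIs.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Span:
    name: str
    duration_ms: float


class SpanExporter(Protocol):
    def export(self, spans: List[Span]) -> None: ...


@dataclass
class InMemoryExporter:
    """Hypothetical exporter that just records what it receives."""
    label: str
    received: List[Span] = field(default_factory=list)

    def export(self, spans: List[Span]) -> None:
        self.received.extend(spans)


class FailingExporter:
    """Hypothetical exporter that simulates an unreachable collector."""

    def export(self, spans: List[Span]) -> None:
        raise RuntimeError("collector unreachable")


class FanOutExporter:
    """Forward each batch to every exporter; one failure does not block the others."""

    def __init__(self, *exporters: SpanExporter) -> None:
        self._exporters = list(exporters)

    def export(self, spans: List[Span]) -> List[Exception]:
        failures: List[Exception] = []
        for exporter in self._exporters:
            try:
                exporter.export(spans)
            except Exception as exc:  # isolate destinations from each other
                failures.append(exc)
        return failures
```

With the real OpenTelemetry SDK, a similar effect is usually achieved by registering two span processors on one `TracerProvider`.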
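The `set_destination` thread-safety entry relies on Python's `contextvars`: each thread has its own context, and asyncio tasks run in a copy of their creator's context, so a value set in one thread cannot leak into another the way a module-level global would. The sketch below illustrates that pattern with hypothetical `set_destination`/`get_destination` helpers and plain string destinations; it does not reproduce MLflow's actual signature or destination types.

```python
import contextvars
import threading

# Hypothetical stand-in for a tracing destination setting; MLflow's real
# set_destination has its own signature and destination types.
_destination = contextvars.ContextVar("trace_destination", default="default")


def set_destination(name: str) -> None:
    """Set the destination for the current context (thread or asyncio task) only."""
    _destination.set(name)


def get_destination() -> str:
    return _destination.get()


def _worker(name: str, seen: dict) -> None:
    set_destination(name)
    seen[name] = get_destination()  # each thread sees only its own value


def run_demo() -> tuple:
    seen: dict = {}
    threads = [threading.Thread(target=_worker, args=(n, seen)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread never called set_destination, so it keeps the default.
    return seen, get_destination()
```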
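The `--env-file` entry points MLflow CLI commands at a file of environment variables. A minimal sketch of that idea follows, assuming a simple dotenv-style `KEY=VALUE` format with `#` comments and optional quotes. The helpers `parse_env_file` and `apply_env` are hypothetical, and whether MLflow lets file values override variables already set in the shell is not stated here, so the `override` parameter is an assumption.

```python
import os
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse dotenv-style KEY=VALUE lines (assumed format, not MLflow's parser)."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()  # tolerate shell-style exports
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '='
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop one pair of matching quotes
        env[key.strip()] = value
    return env


def apply_env(env: Dict[str, str], override: bool = False) -> None:
    """Copy parsed values into os.environ; keep existing ones unless override=True."""
    for key, value in env.items():
        if override or key not in os.environ:
            os.environ[key] = value
```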

🐛 Bug Fixes

Implement DSPy LM interface for default Databricks model serving.
Fix aggregations incorrectly applied to legacy scorer interface.
Add Unity Catalog table source support for mlflow.evaluate.
Fix custom prompt judge encoding issues with custom judge models.
Fix OpenAI autolog to properly reconstruct Response objects from streaming events.
Add basic authentication support in TypeScript SDK.
Update scorer endpoints to v3.0 API specification.
Fix scorer status handling in MLflow tracking backend.
Fix missing source-run information in UI.
Fix spark_udf to always use stdin_serve for model serving.
Fix a bug in Spark UDF when uv is used as the environment manager.
Extract source workspace ID from run_link during model version migration.
Improve security by reducing write permissions in temporary directory creation.
Fix --env-file flag compatibility with --dev mode.
Fix basic authentication with Uvicorn server.
Fix experiment comparison functionality in UI.
Fix compareExperimentsSearch route definitions.
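The temporary-directory security fix is described only briefly. The general pattern is to create scratch directories that only the current user can write to, rather than ones other users on a shared `/tmp` can modify. In Python, `tempfile.mkdtemp` already does this: the directory is readable, writable, and searchable only by the creating user. The sketch below wraps it with a POSIX permission check; the helper names are hypothetical and this is not MLflow's actual code.

```python
import os
import stat
import tempfile


def make_private_tempdir(prefix: str = "scratch-") -> str:
    """Create a temp directory readable/writable only by the current user.

    tempfile.mkdtemp creates the directory with mode 0o700 (the umask can
    only remove bits, never add them), so other users cannot write into it.
    """
    return tempfile.mkdtemp(prefix=prefix)


def group_or_other_can_access(path: str) -> bool:
    """On POSIX, report whether any group/other permission bit is set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) != 0
```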
