
v3.4.0

📦 mlflow
28 features · 🐛 17 fixes · 🔧 22 symbols

Summary

MLflow 3.4.0rc0 adds OpenTelemetry metrics export, MCP server integration, a custom judges API, and an experiment types UI, along with feature enhancements and bug fixes across evaluation, tracing, CLI, tracking, and model registry.

✨ New Features

  • OpenTelemetry Metrics Export: MLflow now exports span-level statistics as OpenTelemetry metrics, providing enhanced observability and monitoring capabilities for traced applications.
  • MCP Server Integration: Introduces a Model Context Protocol (MCP) server for MLflow, enabling AI assistants and LLMs to interact with MLflow programmatically.
  • Custom Judges API: New `make_judge` API enables creation of custom evaluation judges for assessing LLM outputs with domain-specific criteria.
  • Correlations Backend: Implemented backend infrastructure for storing and computing correlations between experiment metrics using NPMI (Normalized Pointwise Mutual Information).
  • Evaluation Datasets: MLflow now supports storing and versioning evaluation datasets directly within experiments for reproducible model assessment.
  • Databricks Backend for MLflow Server: MLflow server can now use Databricks as a backend, enabling seamless integration with Databricks workspaces.
  • Claude Autologging: Automatic tracing support for Claude AI interactions, capturing conversations and model responses.
  • Strands Agent Tracing: Added comprehensive tracing support for Strands agents, including automatic instrumentation for agent workflows and interactions.
  • Experiment Types in UI: MLflow introduces experiment types to reduce clutter between classic ML/DL and GenAI features. The type is auto-detected and can be adjusted via a selector next to the experiment name.
  • Add ability to pass tags via dataframe in mlflow.genai.evaluate.
  • Add custom judge model support for Safety and RetrievalRelevance builtin scorers.
  • Add AI commands as MCP prompts for LLM interaction.
  • Add MLFLOW_ENABLE_OTLP_EXPORTER environment variable.
  • Support OTel and MLflow dual export.
  • Make set_destination use ContextVar for thread safety.
  • Add MLflow commands CLI for exposing prompt commands to LLMs.
  • Add 'mlflow runs link-traces' command.
  • Add 'mlflow runs create' command for programmatic run creation.
  • Add MLflow traces CLI command with comprehensive search and management capabilities.
  • Add --env-file flag to all MLflow CLI commands.
  • Backend for storing scorers in MLflow experiments.
  • Allow cross-workspace copying of model versions between WMR and UC.
  • Add automatic Git-based model versioning for GenAI applications.
  • Improve WheeledModel._download_wheels safety.
  • Support resume run for Optuna hyperparameter optimization.
  • Add MLFLOW_DEPLOYMENT_CLIENT_HTTP_REQUEST_TIMEOUT environment variable.
  • Add ability to hide/unhide all finished runs in Chart view.
  • Add MLflow OSS telemetry for invoke_custom_judge_model.
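
The Correlations Backend item above names NPMI (Normalized Pointwise Mutual Information) as its measure. For readers unfamiliar with it, here is a minimal sketch of the standard NPMI formula computed from co-occurrence counts. It is not MLflow's implementation: the function name `npmi` and its count-based arguments are assumptions made for illustration, and these notes do not say how MLflow maps experiment data onto the two events.

```python
import math


def npmi(joint_count: int, count_x: int, count_y: int, total: int) -> float:
    """Standard NPMI of two binary events, computed from counts.

    Hypothetical helper for illustration only (not an MLflow API).
    The result lies in [-1, 1]: 1 means the events always co-occur,
    0 means they are independent, -1 means they never co-occur.
    """
    if total <= 0 or count_x <= 0 or count_y <= 0:
        raise ValueError("total, count_x and count_y must be positive")
    if joint_count == 0:
        return -1.0  # limiting value as p(x, y) approaches 0

    p_xy = joint_count / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # both events always occur; the normalizer would be 0

    # PMI(x, y) = log(p(x, y) / (p(x) * p(y)))
    pmi = math.log(p_xy / (p_x * p_y))
    # Normalize by -log p(x, y) to bound the score in [-1, 1]
    return pmi / -math.log(p_xy)
```

For example, `npmi(25, 50, 50, 100)` describes two events that each occur in half of the observations and co-occur exactly as often as independence predicts, so the score is 0. With `npmi(50, 50, 50, 100)` the events always appear together and the score approaches 1.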

🐛 Bug Fixes

  • Implement DSPy LM interface for default Databricks model serving.
  • Fix aggregations incorrectly applied to legacy scorer interface.
  • Add Unity Catalog table source support for mlflow.evaluate.
  • Fix custom prompt judge encoding issues with custom judge models.
  • Fix OpenAI autolog to properly reconstruct Response objects from streaming events.
  • Add basic authentication support in TypeScript SDK.
  • Update scorer endpoints to v3.0 API specification.
  • Fix scorer status handling in MLflow tracking backend.
  • Fix missing source-run information in UI.
  • Fix spark_udf to always use stdin_serve for model serving.
  • Fix a bug with Spark UDF usage of uv as an environment manager.
  • Extract source workspace ID from run_link during model version migration.
  • Improve security by reducing write permissions in temporary directory creation.
  • Fix --env-file flag compatibility with --dev mode.
  • Fix basic authentication with Uvicorn server.
  • Fix experiment comparison functionality in UI.
  • Fix compareExperimentsSearch route definitions.

Affected Symbols