Changelog

v3.5.0

📦 mlflow
✨ 19 features · 🐛 28 fixes · ⚡ 1 deprecation · 🔧 18 symbols

Summary

MLflow 3.5.0 adds major tracing, prompt optimization, UI onboarding, and security middleware features, along with numerous enhancements and bug fixes across tracing, tracking, evaluation, and model registry.

Migration Steps

  1. Configure the new security middleware in the tracking server settings as described in the documentation.
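The release notes do not spell out the middleware's configuration, but the threat it targets (DNS rebinding) is conventionally blocked by validating the `Host` header against an allowlist. The sketch below illustrates that mechanism as a plain WSGI middleware; the class name, allowlist shape, and response body are illustrative assumptions, not MLflow's actual implementation or settings.

```python
# Illustrative sketch of Host-header validation, the standard defense
# against DNS rebinding that middleware like MLflow's provides.
# HostValidationMiddleware and its constructor arguments are
# hypothetical; consult the MLflow docs for the real configuration.


class HostValidationMiddleware:
    """WSGI middleware that rejects requests with an unexpected Host header."""

    def __init__(self, app, allowed_hosts):
        self.app = app
        self.allowed_hosts = set(allowed_hosts)

    def __call__(self, environ, start_response):
        # HTTP_HOST may include a port, e.g. "localhost:5000"; compare
        # only the hostname part against the allowlist.
        host = environ.get("HTTP_HOST", "").split(":")[0]
        if host not in self.allowed_hosts:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Invalid Host header"]
        return self.app(environ, start_response)
```

A rebinding attack resolves an attacker-controlled domain to the server's IP; the request then arrives with the attacker's hostname in `Host`, which an allowlist check like this refuses.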

✨ New Features

  • Tracing support for Claude Code SDK with autologging of prompts, responses, tool calls, and more.
  • Flexible Prompt Optimization API with model switching and the GEPA algorithm for efficient prompt tuning.
  • Enhanced UI onboarding with a trace quickstart drawer and updated homepage guidance.
  • Security middleware for the tracking server to protect against DNS rebinding, CORS attacks, and other threats.
  • Added batch operation unlink_traces_from_run for tracing/tracking.
  • Added batch trace link/unlink operations to DatabricksTracingRestStore.
  • Support for reading trace configuration from environment variables.
  • Mistral tracing improvements.
  • Gemini token count tracking and streaming support.
  • CrewAI token count tracking with documentation updates.
  • Evaluation enhancements: allow an empty scorer list, log assessments to DSPy evaluation traces, support trace inputs to built-in scorers, synonym handling, a span timing tool for Agent Judges, the ability to disable the evaluation sample check, reduced SIMBA optimizer log verbosity, and a __repr__ method for Judges.
  • Prompt registry support added to MLflow webhooks and Prompt Registry Chat UI.
  • UI improvements: delete parent and child runs together; move charts to top or bottom.
  • Tracking improvements: use sampling data for run comparison, optional 'outputs' column for evaluation dataset records, and job backend execution.
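One item above, reading trace configuration from environment variables, follows a common twelve-factor pattern: parse typed settings out of the process environment with sensible defaults. The sketch below shows that pattern in isolation; the variable names `TRACE_ENABLED` and `TRACE_SAMPLE_RATIO` are illustrative assumptions, not MLflow's actual variable names.

```python
# Generic sketch of environment-variable-driven trace configuration.
# The variable names used here are hypothetical; the real MLflow
# variable names are listed in the MLflow documentation.
import os
from dataclasses import dataclass


@dataclass
class TraceConfig:
    enabled: bool = True
    sample_ratio: float = 1.0

    @classmethod
    def from_env(cls, environ=None):
        # Accept an explicit mapping so the parser is easy to test;
        # fall back to the real process environment otherwise.
        env = os.environ if environ is None else environ
        return cls(
            enabled=env.get("TRACE_ENABLED", "true").lower() in ("1", "true", "yes"),
            sample_ratio=float(env.get("TRACE_SAMPLE_RATIO", "1.0")),
        )
```

Keeping the parsing in one place means every tracing entry point sees the same defaults and the same boolean semantics.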

🐛 Bug Fixes

  • Fixed parent run resolution mechanism for LangChain.
  • Added client-side retry for get_trace to improve reliability.
  • Fixed OpenTelemetry dual export and resource attribute propagation.
  • Suppressed false warnings from span logging.
  • Fixed DSPy prompt display.
  • Fixed usage aggregation to avoid ancestor duplication.
  • Fixed double counting in Strands tracing.
  • Fixed to_predict_fn to handle traces without tags field.
  • URL-encoded trace tag keys in delete_trace_tag to prevent 404 errors.
  • Fixed Claude Code autologging inputs not displaying.
  • Fixed runs with 0-valued metrics not appearing in experiment list contour plots.
  • Fixed DSPy run display.
  • Allowed list of types in tools JSON Schema for OpenAI autolog.
  • Set tracking URI environment variable for job runner.
  • Added atomicity to job_start API.
  • Fixed trace ingest for outputs in merge_records API.
  • Fixed judge regression and ensured judges use non-empty user messages for Anthropic model compatibility.
  • Fixed endpoints error in judge.
  • Fixed creating model versions from non-Databricks tracking to Databricks Unity Catalog registry.
  • Fixed registry URI instantiation for artifact download.
  • Included original error details in Unity Catalog model copy failure messages.
  • Fixed webhook delivery to exit early for FileStore instances.
  • Fixed error suppression during prompt alias resolution when allow_missing is set.
  • General UI improvements.
  • Fixed dataset issue in Models.
  • Forwarded dataset name and digest to PolarsDataset's to_evaluation_dataset method.
  • Fixed mlflow server exiting immediately when optional huey package is missing.
  • Fixed chat completion arguments in Scoring.
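The `delete_trace_tag` fix above is worth a closer look: a tag key containing a reserved character such as `/` silently splits into extra URL path segments, so the server routes the request to a nonexistent endpoint and returns 404. Percent-encoding the key keeps it a single segment. The sketch below demonstrates the principle with the standard library; the endpoint path is illustrative, not MLflow's exact REST route.

```python
# Why unencoded tag keys 404: a "/" inside the key changes the URL's
# path structure. Percent-encoding with safe="" (which also encodes
# "/") keeps the key as one path segment. The route shown here is a
# made-up example, not the actual MLflow REST path.
from urllib.parse import quote


def tag_delete_path(trace_id: str, key: str) -> str:
    return f"/api/traces/{trace_id}/tags/{quote(key, safe='')}"
```

Note that the default `quote()` treats `/` as safe, which is exactly the wrong behavior for a path segment; passing `safe=""` is the crucial detail.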

Affected Symbols

⚡ Deprecations

  • The custom prompt judge is deprecated; see the deprecation notice in the documentation.