v4.6.0

📅 Mar 16, 2026 · 📦 datadog-sdk

✨ 15 features · 🐛 16 fixes · ⚡ 1 deprecation · 🔧 13 symbols

Summary

This release introduces significant enhancements to LLM Observability, including experiment status reporting, DeepEval integration, and prompt management via `LLMObs.get_prompt()`. It also includes numerous bug fixes across Profiling, AI Guard, and tracing components, alongside a deprecation warning for a future type change in `Span.parent_id`.

Migration Steps

When using DeepEval evaluations in LLM Observability Experiments, ensure the evaluation inherits from `BaseMetric` or `BaseConversationalMetric`.

✨ New Features

Experiment spans now include the config from experiment initialization, so relevant spans can be searched by experiment config.
Experiment spans now include the tags from dataset records, so relevant spans can be searched by dataset record tag.
Introduces inferred proxy support for Azure API Management.
Enables stats computation by default for Python 3.14 and above.
Adds SDS (Sensitive Data Scanner) findings to AI Guard spans, enabling visibility into sensitive data detected in LLM inputs and outputs.
LLM Experiments now report their execution status to the backend (`running`, `completed`, `failed`, or `interrupted`).
Adds `LLMObs.publish_evaluator()` to sync a locally-defined `LLMJudge` evaluator to the Datadog UI as a custom LLM-as-Judge evaluation.
Adds support for DeepEval evaluations in LLM Observability Experiments: a DeepEval evaluation (one that inherits from either `BaseMetric` or `BaseConversationalMetric`) can now be passed to an LLM Obs Experiment.
Adds experiment summary logging after `run()` with row count, run count, per-evaluator stats, and error counts.
Adds `max_retries` and `retry_delay` parameters to `experiment.run()` for retrying failed tasks and evaluators.
Introduces `LLMObs.get_prompt()` to retrieve managed prompts from Datadog's Prompt Registry, returning a `ManagedPrompt` object with a `format()` method.
Experiments propagate `canonical_ids` from dataset records to the corresponding experiment span when present (after calling `pull_dataset`).
`LLMObs.create_dataset` supports a `bulk_upload` parameter to control data uploading behavior.
`LLMObs.create_dataset` and `LLMObs.create_dataset_from_csv` accept a `deduplicate` parameter.
A subset of dataset records can now be pulled by tag using the `tags` argument to `LLMObs.pull_dataset`.
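
To illustrate what retry parameters such as `max_retries` and `retry_delay` usually mean, here is a minimal standard-library sketch. It is not the SDK's implementation: the helper `run_with_retries`, the example `flaky_task`, and the choices that `max_retries` counts retries after the first attempt and that the delay is fixed are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_retries: int = 3, retry_delay: float = 0.0) -> T:
    """Call task(), retrying up to max_retries times after a failure.

    Hypothetical helper: the SDK may count attempts or space retries differently.
    """
    retries_used = 0
    while True:
        try:
            return task()
        except Exception:
            if retries_used >= max_retries:
                # Retry budget exhausted: surface the last error to the caller.
                raise
            retries_used += 1
            time.sleep(retry_delay)  # fixed delay between attempts (assumption)


# Hypothetical task that fails twice before succeeding.
calls = {"count": 0}


def flaky_task() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky_task, max_retries=2, retry_delay=0.0)
```

With two retries allowed, the task runs three times in total: the first attempt plus two retries. A failure on the third attempt would be re-raised.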

🐛 Bug Fixes

Fixes a data duplication issue when uploading datasets larger than 5 MB via `LLMObs.create_dataset`.
Fixes a `TypeError` raised while processing failed AI Guard responses, which overrode the original error.
Fixes an `AttributeError` on `openai-agents >= 0.8.0` caused by the removal of `AgentRunner._run_single_turn`.
A bug which could prevent Profiling from being enabled when the library is installed through Single Step Instrumentation was fixed.
Fixes an issue where the profiler was patching the `gevent` module unnecessarily even when the profiler was not enabled.
A bug that would cause certain function names to be displayed as `<module>` in flame graphs has been fixed.
Fix lock contention in the profiler's greenlet stack sampler that could cause connection pool exhaustion in gevent-based applications (e.g. gunicorn + gevent + psycopg2).
Fixes an issue where the lock profiler's wrapper class did not support PEP 604 type union syntax (e.g., `asyncio.Condition | None`), causing a `TypeError` at import time for libraries using union type annotations at class definition time.
Add `kafka_cluster_id` tag to Kafka offset/backlog tracking for confluent-kafka.
Fixes a memory corruption issue where concurrent calls to the WAF on the same request context from multiple threads could cause crashes inside `libddwaf`; a per-context lock now serializes WAF calls on the same context.
Avoid pickling wrappers in `ddtrace.internal.wrapping.context.BaseWrappingContext`.
Fixed an incompatibility with `pytest-html` and other third-party reporting plugins caused by the ddtrace pytest plugin using a non-standard `dd_retry` test outcome for retry attempts; the outcome is now set to `rerun`.
Fixes a `RuntimeError: generator didn't yield` in the Symbol DB remote config subscriber when the process has no writable temporary directory.
Propagate distributed tracing headers for tasks that are not registered locally so traces link correctly across workers.
Fix for a potential race condition affecting internal periodic worker threads that could have caused a `RuntimeError` during forks.
Add a timeout to Unix socket connections to prevent thread I/O blocking.
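
The PEP 604 fix above concerns a general Python behavior that can be reproduced without the profiler. The sketch below uses two hypothetical classes, `PlainWrapper` and `UnionAwareWrapper`, which are not ddtrace classes. It shows why an object that stands in for a class raises `TypeError` in an `X | None` annotation unless it defines `__or__` and `__ror__`. How ddtrace resolved this internally is not stated in these notes.

```python
from typing import Optional, Union


class PlainWrapper:
    """Hypothetical wrapper object that stands in for a wrapped class."""

    def __init__(self, wrapped: type) -> None:
        self._wrapped = wrapped

    def __call__(self, *args, **kwargs):
        # Instantiate the wrapped class, as an instrumenting wrapper might.
        return self._wrapped(*args, **kwargs)


class UnionAwareWrapper(PlainWrapper):
    """Same wrapper, plus the operators that `X | Y` annotations rely on."""

    def __or__(self, other):
        return Union[self._wrapped, other]

    def __ror__(self, other):
        return Union[other, self._wrapped]


# Without __or__, evaluating the union expression fails.
try:
    PlainWrapper(dict) | None
    plain_supports_union = True
except TypeError:
    plain_supports_union = False

# With __or__/__ror__, both operand orders produce a typing union.
left_union = UnionAwareWrapper(dict) | None
right_union = None | UnionAwareWrapper(dict)
```

Because annotations in a class body are evaluated when the class is defined (unless evaluation is deferred), the failure in the first case shows up at import time of the library that uses the annotation.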

Affected Symbols

`Span.parent_id`
`LLMObs.publish_evaluator`
`LLMObs.experiment`
`experiment.run`
`LLMObs.get_prompt`
`ManagedPrompt.format`
`LLMObs.annotation_context`
`LLMObs.to_annotation_dict`
`LLMObs.create_dataset`
`LLMObs.create_dataset_from_csv`
`LLMObs.pull_dataset`
`ddtrace.internal.wrapping.context.BaseWrappingContext`
`AgentRunner._run_single_turn`

⚡ Deprecations

The type annotation for `Span.parent_id` will change from `Optional[int]` to `int` in v5.0.0.
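
Code that checks `parent_id is None` to detect root spans may want a check that holds under both annotations. These notes do not say what value a root span will carry once the type becomes `int`. The sketch below assumes a falsy sentinel such as `0`, and `has_parent` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional


def has_parent(parent_id: Optional[int]) -> bool:
    """Return True when a span has a parent.

    Treats None (the current Optional[int] annotation) and 0 (an ASSUMED
    sentinel under the future int annotation) as "no parent".
    """
    return bool(parent_id)


root_today = has_parent(None)      # current representation of a root span
root_assumed = has_parent(0)       # assumed future representation
child = has_parent(1234567890)     # any non-zero ID counts as a parent
```

If v5.0.0 documents a different sentinel, only the body of `has_parent` needs to change.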