Changelog

v4.6.0

📦 datadog-sdk
✨ 15 features · 🐛 16 fixes · ⚡ 1 deprecation · 🔧 13 symbols

Summary

This release introduces significant enhancements to LLM Observability, including experiment status reporting, DeepEval integration, and prompt management via `LLMObs.get_prompt()`. It also includes numerous bug fixes across Profiling, AI Guard, and tracing components, alongside a deprecation warning for a future type change in `Span.parent_id`.

Migration Steps

  1. When using DeepEval evaluations in LLM Observability Experiments, ensure the evaluation inherits from `BaseMetric` or `BaseConversationalMetric`.
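A minimal sketch of step 1. The `BaseMetric` class below is a local stand-in for `deepeval.metrics.BaseMetric` so the example is self-contained; in real code you would subclass the DeepEval class itself, whose `measure()` takes a test-case object rather than raw strings, and `ExactMatch` is an illustrative name, not an SDK or DeepEval API:

```python
# Stand-in for deepeval.metrics.BaseMetric (sketch only).
# Real code: from deepeval.metrics import BaseMetric
class BaseMetric:
    threshold: float = 0.5
    score: float = 0.0

class ExactMatch(BaseMetric):
    """Toy metric: scores 1.0 when the actual output equals the expected one."""

    def measure(self, actual: str, expected: str) -> float:
        self.score = 1.0 if actual == expected else 0.0
        return self.score

    def is_successful(self) -> bool:
        return self.score >= self.threshold

metric = ExactMatch()
print(metric.measure("42", "42"))  # 1.0
print(metric.is_successful())      # True
```

An instance like `metric` would then be passed alongside your other evaluators when constructing the experiment.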

✨ New Features

  • Experiment spans now contain the config from the experiment initialization, allowing relevant spans to be searched by experiment config.
  • Experiment spans now contain the tags from the dataset records, allowing relevant spans to be searched by dataset record tags.
  • Introduces inferred proxy support for Azure API Management.
  • Enable stats computation by default for Python 3.14 and above.
  • Adds SDS (Sensitive Data Scanner) findings to AI Guard spans, enabling visibility into sensitive data detected in LLM inputs and outputs.
  • LLM Experiments now report their execution status to the backend (`running`, `completed`, `failed`, or `interrupted`).
  • Adds `LLMObs.publish_evaluator()` to sync a locally-defined `LLMJudge` evaluator to the Datadog UI as a custom LLM-as-Judge evaluation.
  • Adds support for DeepEval evaluations in LLM Observability Experiments by allowing users to pass a DeepEval evaluation (one that inherits from `BaseMetric` or `BaseConversationalMetric`) to an LLM Obs Experiment.
  • Adds experiment summary logging after `run()` with row count, run count, per-evaluator stats, and error counts.
  • Adds `max_retries` and `retry_delay` parameters to `experiment.run()` for retrying failed tasks and evaluators.
  • Introduces `LLMObs.get_prompt()` to retrieve managed prompts from Datadog's Prompt Registry, returning a `ManagedPrompt` object with a `format()` method.
  • Experiments propagate `canonical_ids` from dataset records to the corresponding experiment span when present (after calling `pull_dataset`).
  • `LLMObs.create_dataset` supports a `bulk_upload` parameter to control data uploading behavior.
  • `LLMObs.create_dataset` and `LLMObs.create_dataset_from_csv` support users specifying the `deduplicate` parameter.
  • A subset of dataset records can now be pulled by tag using the `tags` argument to `LLMObs.pull_dataset`.
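The prompt-registry feature can be pictured with a toy stand-in. The real `ManagedPrompt` is returned by `LLMObs.get_prompt()`; the class below only illustrates the assumed variable-substitution behavior of its `format()` method and is not the SDK implementation:

```python
from string import Template

class ManagedPrompt:
    """Toy stand-in for the object returned by LLMObs.get_prompt();
    models format() as simple template substitution (an assumption)."""

    def __init__(self, template: str) -> None:
        self._template = template

    def format(self, **variables: str) -> str:
        # Real managed prompts are fetched from Datadog's Prompt Registry;
        # here we just substitute variables into a local template string.
        return Template(self._template).substitute(variables)

# In the SDK this would start with: prompt = LLMObs.get_prompt(<prompt id>)
prompt = ManagedPrompt("Summarize the following text: $text")
print(prompt.format(text="release notes"))
# Summarize the following text: release notes
```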
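The retry semantics of `max_retries` and `retry_delay` can be sketched with a generic helper. This is an illustrative pattern, not the SDK's implementation; `experiment.run(max_retries=2, retry_delay=1.0)` would apply the same idea to failed tasks and evaluators:

```python
import time

def run_with_retries(task, max_retries: int = 2, retry_delay: float = 0.0):
    """Run `task`, retrying up to max_retries additional times on failure,
    sleeping retry_delay seconds between attempts; re-raise the last error
    if every attempt fails."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(retry_delay)

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky, max_retries=2))  # ok (succeeds on third attempt)
```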

🐛 Bug Fixes

  • Fix data duplication issue when uploading > 5MB datasets via `LLMObs.create_dataset`.
  • Fix a `TypeError` raised while processing failed AI Guard responses, which overrode the original error.
  • Fixes an `AttributeError` on `openai-agents >= 0.8.0` caused by the removal of `AgentRunner._run_single_turn`.
  • Fix a bug that could prevent Profiling from being enabled when the library is installed through Single Step Instrumentation.
  • Fixes an issue where the profiler patched the `gevent` module even when the profiler was not enabled.
  • Fix a bug that caused certain function names to be displayed as `<module>` in flame graphs.
  • Fix lock contention in the profiler's greenlet stack sampler that could cause connection pool exhaustion in gevent-based applications (e.g. gunicorn + gevent + psycopg2).
  • Fixes an issue where the lock profiler's wrapper class did not support PEP 604 type union syntax (e.g., `asyncio.Condition | None`), causing a `TypeError` at import time for libraries using union type annotations at class definition time.
  • Add `kafka_cluster_id` tag to Kafka offset/backlog tracking for confluent-kafka.
  • Fixes a memory corruption issue where concurrent calls to the WAF on the same request context from multiple threads could cause crashes inside `libddwaf`; a per-context lock now serializes WAF calls on the same context.
  • Avoid pickling wrappers in `ddtrace.internal.wrapping.context.BaseWrappingContext`.
  • Fix an incompatibility with `pytest-html` and other third-party reporting plugins caused by the ddtrace pytest plugin using a non-standard `dd_retry` test outcome for retry attempts; the outcome is now set to `rerun`.
  • Fixes a `RuntimeError: generator didn't yield` in the Symbol DB remote config subscriber when the process has no writable temporary directory.
  • Propagate distributed tracing headers for tasks that are not registered locally so traces link correctly across workers.
  • Fix for a potential race condition affecting internal periodic worker threads that could have caused a `RuntimeError` during forks.
  • Add a timeout to Unix socket connections to prevent thread I/O blocking.
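The WAF serialization fix above can be pictured with a generic per-context lock, the pattern of giving each request context its own mutex so concurrent calls into a non-thread-safe engine are serialized per context. This is an illustrative sketch, not `libddwaf` or ddtrace code:

```python
import threading

class RequestContext:
    """Each context carries its own lock, so concurrent calls on the same
    context are serialized while distinct contexts stay fully parallel."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.calls = 0

    def run_waf(self, payload: str) -> int:
        with self._lock:  # serialize calls on this context only
            self.calls += 1
            return len(payload)  # stand-in for a real WAF evaluation

ctx = RequestContext()
threads = [threading.Thread(target=ctx.run_waf, args=("data",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ctx.calls)  # 8
```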

⚡ Deprecations

  • The type annotation for `Span.parent_id` will change from `Optional[int]` to `int` in v5.0.0.