v4.6.0
📦 datadog-sdk
✨ 15 features · 🐛 16 fixes · ⚡ 1 deprecation · 🔧 13 symbols
Summary
This release introduces significant enhancements to LLM Observability, including experiment status reporting, DeepEval integration, and prompt management via `LLMObs.get_prompt()`. It also includes numerous bug fixes across Profiling, AI Guard, and tracing components, alongside a deprecation warning for a future type change in `Span.parent_id`.
Migration Steps
- When using DeepEval evaluations in LLM Observability Experiments, ensure the evaluation inherits from `BaseMetric` or `BaseConversationalMetric`.
✨ New Features
- Experiment spans now include the config passed at experiment initialization, so relevant spans can be searched by experiment config.
- Experiment spans now include the tags from the dataset records, so relevant spans can be searched by dataset record tags.
- Introduces inferred proxy support for Azure API Management.
- Enables stats computation by default for Python 3.14 and above.
- Adds SDS (Sensitive Data Scanner) findings to AI Guard spans, enabling visibility into sensitive data detected in LLM inputs and outputs.
- LLM Experiments now report their execution status to the backend (`running`, `completed`, `failed`, or `interrupted`).
- Adds `LLMObs.publish_evaluator()` to sync a locally-defined `LLMJudge` evaluator to the Datadog UI as a custom LLM-as-Judge evaluation.
- Adds support for DeepEval evaluations in LLM Observability Experiments: users can pass a DeepEval evaluation (one that inherits from either `BaseMetric` or `BaseConversationalMetric`) to an LLM Observability Experiment.
- Adds experiment summary logging after `run()` with row count, run count, per-evaluator stats, and error counts.
- Adds `max_retries` and `retry_delay` parameters to `experiment.run()` for retrying failed tasks and evaluators.
- Introduces `LLMObs.get_prompt()` to retrieve managed prompts from Datadog's Prompt Registry, returning a `ManagedPrompt` object with a `format()` method.
- Experiments propagate `canonical_ids` from dataset records to the corresponding experiment span when present (after calling `pull_dataset`).
- `LLMObs.create_dataset` supports a `bulk_upload` parameter to control data uploading behavior.
- `LLMObs.create_dataset` and `LLMObs.create_dataset_from_csv` support users specifying the `deduplicate` parameter.
- A subset of dataset records can now be pulled by tag using the `tags` argument to `LLMObs.pull_dataset`.
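
The `max_retries` and `retry_delay` parameters listed above describe a retry-with-delay pattern. The following is a minimal sketch of that general pattern in plain Python, not the library's implementation. The helper name `run_with_retries` is hypothetical, and it assumes `max_retries` counts additional attempts after the first failure and `retry_delay` is a pause in seconds; the release notes do not state either detail.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_retries: int = 0, retry_delay: float = 0.0) -> T:
    """Call `task`, retrying on any exception.

    Assumption (not from the release notes): `max_retries` is the number of
    extra attempts after the first failure, and `retry_delay` is in seconds.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error unchanged
            attempt += 1
            time.sleep(retry_delay)  # wait before the next attempt
```

With `max_retries=2`, a task that fails twice and then succeeds is called three times in total; a task that keeps failing raises its last exception to the caller.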
🐛 Bug Fixes
- Fixes a data duplication issue when uploading datasets larger than 5 MB via `LLMObs.create_dataset`.
- Fixes a `TypeError` raised while processing failed AI Guard responses, which overrode the original error.
- Fixes an `AttributeError` on `openai-agents >= 0.8.0` caused by the removal of `AgentRunner._run_single_turn`.
- Fixes a bug that could prevent Profiling from being enabled when the library is installed through Single Step Instrumentation.
- Fixes an issue where the profiler was patching the `gevent` module unnecessarily even when the profiler was not enabled.
- Fixes a bug that caused certain function names to be displayed as `<module>` in flame graphs.
- Fix lock contention in the profiler's greenlet stack sampler that could cause connection pool exhaustion in gevent-based applications (e.g. gunicorn + gevent + psycopg2).
- Fixes an issue where the lock profiler's wrapper class did not support PEP 604 type union syntax (e.g., `asyncio.Condition | None`), causing a `TypeError` at import time for libraries using union type annotations at class definition time.
- Add `kafka_cluster_id` tag to Kafka offset/backlog tracking for confluent-kafka.
- Fixes a memory corruption issue where concurrent calls to the WAF on the same request context from multiple threads could cause crashes inside `libddwaf`; a per-context lock now serializes WAF calls on the same context.
- Avoid pickling wrappers in `ddtrace.internal.wrapping.context.BaseWrappingContext`.
- Fixed an incompatibility with `pytest-html` and other third-party reporting plugins caused by the ddtrace pytest plugin using a non-standard `dd_retry` test outcome for retry attempts; the outcome is now set to `rerun`.
- Fixes a `RuntimeError: generator didn't yield` in the Symbol DB remote config subscriber when the process has no writable temporary directory.
- Propagate distributed tracing headers for tasks that are not registered locally so traces link correctly across workers.
- Fix for a potential race condition affecting internal periodic worker threads that could have caused a `RuntimeError` during forks.
- Add a timeout to Unix socket connections to prevent thread I/O blocking.
Affected Symbols
- `Span.parent_id`
- `LLMObs.publish_evaluator`
- `LLMObs.experiment`
- `experiment.run`
- `LLMObs.get_prompt`
- `ManagedPrompt.format`
- `LLMObs.annotation_context`
- `LLMObs.to_annotation_dict`
- `LLMObs.create_dataset`
- `LLMObs.create_dataset_from_csv`
- `LLMObs.pull_dataset`
- `ddtrace.internal.wrapping.context.BaseWrappingContext`
- `AgentRunner._run_single_turn`
⚡ Deprecations
- The type annotation for `Span.parent_id` will change from `Optional[int]` to `int` in v5.0.0.
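
Code that checks `Span.parent_id` against `None` may want to prepare for this change. The sketch below shows one way to write a check that works with both annotations. The helper `has_parent` is hypothetical, and it assumes that a falsy value such as `0` will mark a span without a parent once the type becomes `int`; the release notes do not say which value will be used, so verify this before relying on it.

```python
from typing import Optional


def has_parent(parent_id: Optional[int]) -> bool:
    """Return True if `parent_id` refers to a parent span.

    Assumption (not from the release notes): after the change to `int`,
    a span without a parent reports a falsy ID such as 0 instead of None.
    """
    # Truthiness covers both None (current annotation) and 0 (assumed sentinel).
    return bool(parent_id)
```

Passing `span.parent_id` to a helper like this avoids an explicit `is None` comparison that a type checker would flag as unnecessary once the annotation is `int`.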