v4.6.0rc2
📦 datadog-sdk
✨ 4 features · 🐛 5 fixes · 🔧 2 symbols
Summary
This release introduces significant enhancements to LLM Observability, including DeepEval integration and experiment summary logging, alongside critical bug fixes for AAP memory corruption and CI Visibility reporting compatibility.
Migration Steps
- If using CI Visibility and third-party reporting plugins that rely on pytest outcomes, note that the retry outcome is now set to `rerun` instead of `dd_retry`.
✨ New Features
- AI Guard now includes Sensitive Data Scanner (SDS) findings in its spans, giving visibility into sensitive data in LLM inputs and outputs.
- LLM Observability Experiments now support DeepEval evaluations by allowing users to pass metrics inheriting from `BaseMetric` or `BaseConversationalMetric`.
- LLM Observability now logs an experiment summary after `experiment.run()` including row count, run count, per-evaluator stats, and error counts.
- Added `max_retries` and `retry_delay` parameters to `experiment.run()` for retrying failed tasks and evaluators.
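The new retry parameters follow the usual bounded-retry-with-delay pattern. A minimal sketch of that semantics, assuming a standalone helper (`run_with_retries` is illustrative, not the ddtrace API):

```python
import time

def run_with_retries(task, max_retries=3, retry_delay=0.1):
    """Run `task()`, retrying up to `max_retries` times on failure,
    sleeping `retry_delay` seconds between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(retry_delay)  # back off before the next attempt

# Example: a task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky, max_retries=3, retry_delay=0)
```

With `max_retries=3` the task is attempted at most four times; here it succeeds on the third call.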
🐛 Bug Fixes
- Fixed a memory corruption issue in AAP where concurrent WAF calls on the same request context from multiple threads could crash the process (SIGSEGV); a per-context lock now serializes WAF calls on each context.
- Avoided pickling wrappers in `ddtrace.internal.wrapping.context.BaseWrappingContext`.
- Fixed an incompatibility with `pytest-html` and other third-party reporting plugins in CI Visibility by changing the retry test outcome from `dd_retry` to the standard `rerun` value.
- Fixed a `RuntimeError: generator didn't yield` in the Symbol DB remote config subscriber when the process lacks a writable temporary directory.
- Fixed a profiling bug where certain function names appeared as `<module>` in flame graphs.
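The AAP fix above serializes access to shared per-request state with a per-context lock. A minimal sketch of that pattern, assuming hypothetical names (`RequestContext` and `call_waf` are illustrative, not ddtrace internals):

```python
import threading

class RequestContext:
    """One lock per request context, so concurrent calls from multiple
    threads never mutate the shared state at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def call_waf(self):
        # Without the lock, this read-modify-write could interleave
        # across threads and corrupt the shared counter.
        with self._lock:
            current = self.calls
            self.calls = current + 1

ctx = RequestContext()
threads = [
    threading.Thread(target=lambda: [ctx.call_waf() for _ in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each call acquires the context's lock, all 8 × 1000 updates are applied without loss; the same idea, applied to the native WAF handle, prevents the concurrent-access crash.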