v4.4.0rc1
📦 datadog-sdkView on GitHub →
✨ 3 features🐛 8 fixes🔧 5 symbols
Summary
This release introduces significant enhancements to LLM Observability with class-based evaluators and fixes several critical bugs across AAP, aws_lambda, exception replay, litellm integration, and profiling.
✨ New Features
- Adds support for class-based evaluators in LLM Observability by allowing users to subclass the `BaseEvaluator` class.
- Introduces `EvaluatorContext` to store evaluation context including dataset record and span information.
- Supports class-based summary evaluators via `BaseSummaryEvaluator`, which receives a `SummaryEvaluatorContext` containing aggregated inputs, outputs, expected outputs, and per-row evaluation results.
🐛 Bug Fixes
- Fixed an issue where agent-based samplers could interfere with Standalone App and API Protection by rejecting traces before the custom sampler was evaluated when using low sample rates.
- Resolved an issue in aws_lambda where user-defined SIGALRM handlers were not restored after TimeoutChannel cleanup, causing custom timeout handlers to fail after the first invocation.
- Fixed a gevent support issue in exception replay that caused an exception when determining if a frame belonged to user code for capturing.
- Resolved an issue with litellm>=1.74.15 where wrapped router streaming responses caused an `AttributeError` when accessing `.handler`; the integration now handles wrapped and unwrapped responses gracefully.
- Fixed a profiling bug where non-pushed samples could leak data to subsequent samples.
- Fixed a profiling bug where `asyncio` task stacks contained duplicated frames when the task was on-CPU; stacks now show each frame once.
- The stack Profiler now correctly resets thread, task, and greenlet information after a fork to prevent stale data from the parent process affecting child process profiling.
- Resolved an issue in LLM Observability where the Pydantic AI integration failed to properly trace `StreamedRunResult.stream_responses()` (introduced in `pydantic-ai==0.8.1`), preventing agent spans from finishing.