v4.4.0rc1

📅 Jan 27, 2026 📦 datadog-sdk

✨ 3 features 🐛 8 fixes 🔧 5 symbols

Summary

This release adds class-based evaluators and summary evaluators to LLM Observability, and fixes bugs in App and API Protection (AAP), aws_lambda, exception replay, the litellm and Pydantic AI integrations, and profiling.

✨ New Features

Adds support for class-based evaluators in LLM Observability by allowing users to subclass the `BaseEvaluator` class.
Introduces `EvaluatorContext` to store evaluation context including dataset record and span information.
Supports class-based summary evaluators via `BaseSummaryEvaluator`, which receives a `SummaryEvaluatorContext` containing aggregated inputs, outputs, expected outputs, and per-row evaluation results.
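To make the shape of the class-based approach concrete, here is a minimal, self-contained sketch. It does not import the SDK: the four classes below are stand-ins that only borrow the names from these notes, and the `evaluate` method name, the field names, and the `"exact_match"` key are assumptions made for illustration. Check the SDK documentation for the real signatures before relying on any of them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Stand-ins for illustration only; these are NOT the SDK's real definitions.
@dataclass
class EvaluatorContext:
    input_data: Any                 # assumed field: input from the dataset record
    output_data: Any                # assumed field: output produced for that record
    expected_output: Any = None     # assumed field: expected output from the record
    span_id: Optional[str] = None   # assumed field: span information


@dataclass
class SummaryEvaluatorContext:
    inputs: List[Any]
    outputs: List[Any]
    expected_outputs: List[Any]
    # assumed layout: evaluator label -> one result per row
    evaluation_results: Dict[str, List[Any]] = field(default_factory=dict)


class BaseEvaluator(ABC):
    @abstractmethod
    def evaluate(self, context: EvaluatorContext) -> Any:  # assumed method name
        ...


class BaseSummaryEvaluator(ABC):
    @abstractmethod
    def evaluate(self, context: SummaryEvaluatorContext) -> Any:  # assumed method name
        ...


class ExactMatch(BaseEvaluator):
    """Per-row evaluator: True when the output equals the expected output."""

    def evaluate(self, context: EvaluatorContext) -> bool:
        return context.output_data == context.expected_output


class Accuracy(BaseSummaryEvaluator):
    """Summary evaluator: fraction of rows that ExactMatch marked True."""

    def evaluate(self, context: SummaryEvaluatorContext) -> float:
        results = context.evaluation_results.get("exact_match", [])
        return sum(results) / len(results) if results else 0.0


def run_demo():
    # (input, output, expected output) triples standing in for dataset rows.
    rows = [("2+2", "4", "4"), ("capital of France", "Lyon", "Paris")]
    evaluator = ExactMatch()
    per_row = [evaluator.evaluate(EvaluatorContext(i, o, e)) for i, o, e in rows]
    summary = SummaryEvaluatorContext(
        inputs=[r[0] for r in rows],
        outputs=[r[1] for r in rows],
        expected_outputs=[r[2] for r in rows],
        evaluation_results={"exact_match": per_row},
    )
    return per_row, Accuracy().evaluate(summary)
```

The point of the class form is that an evaluator can carry configuration or state on the instance, while the context object gives it everything about the row (or, for summary evaluators, the whole run) in one argument.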

🐛 Bug Fixes

Fixed an issue where agent-based samplers could interfere with Standalone App and API Protection by rejecting traces before the custom sampler was evaluated when using low sample rates.
Resolved an issue in aws_lambda where user-defined SIGALRM handlers were not restored after TimeoutChannel cleanup, causing custom timeout handlers to fail after the first invocation.
Fixed a gevent support issue in exception replay that caused an exception when determining if a frame belonged to user code for capturing.
Resolved an issue with litellm>=1.74.15 where wrapped router streaming responses caused an `AttributeError` when accessing `.handler`; the integration now handles wrapped and unwrapped responses gracefully.
Fixed a profiling bug where non-pushed samples could leak data to subsequent samples.
Fixed a profiling bug where `asyncio` task stacks contained duplicated frames when the task was on-CPU; stacks now show each frame once.
The stack profiler now resets thread, task, and greenlet information after a fork, so stale data from the parent process no longer affects profiling in the child process.
Resolved an issue in LLM Observability where the Pydantic AI integration failed to properly trace `StreamedRunResult.stream_responses()` (introduced in `pydantic-ai==0.8.1`), preventing agent spans from finishing.
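The aws_lambda fix above concerns restoring a user-defined SIGALRM handler once a timeout mechanism has finished with the signal. The sketch below shows the general save-and-restore pattern using only the standard `signal` module. `TimeoutGuard` is a hypothetical helper written for this example; it is not the SDK's `TimeoutChannel` and makes no claim about how the SDK implements the fix. It needs a Unix platform and must run in the main thread.

```python
import signal


class TimeoutGuard:
    """Hypothetical helper: borrows SIGALRM and gives it back on stop()."""

    def __init__(self):
        self._previous = None
        self._installed = False

    def _on_alarm(self, signum, frame):
        # Chain to the handler that was installed before us, if it is callable.
        if callable(self._previous):
            self._previous(signum, frame)

    def start(self):
        # Remember whatever handler is active on every start, not just the first.
        self._previous = signal.getsignal(signal.SIGALRM)
        signal.signal(signal.SIGALRM, self._on_alarm)
        self._installed = True

    def stop(self):
        if not self._installed:
            return
        # getsignal() returns None for handlers not installed from Python;
        # signal.signal() rejects None, so fall back to the default action.
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGALRM, previous)
        self._installed = False


def user_handler(signum, frame):
    """Stands in for a custom timeout handler defined by the function author."""


def run_demo():
    original = signal.getsignal(signal.SIGALRM)
    signal.signal(signal.SIGALRM, user_handler)
    restored = []
    for _ in range(2):  # two simulated invocations
        guard = TimeoutGuard()
        guard.start()
        guard.stop()
        restored.append(signal.getsignal(signal.SIGALRM) is user_handler)
    # Leave the process as we found it.
    signal.signal(signal.SIGALRM, original if original is not None else signal.SIG_DFL)
    return restored
```

If `stop()` skipped the restore step, the user handler would be lost after the first cycle, which matches the symptom described in the fix.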

Affected Symbols

`BaseEvaluator`
`EvaluatorContext`
`BaseSummaryEvaluator`
`SummaryEvaluatorContext`
`StreamedRunResult.stream_responses()`