v4.4.0rc2

📅 Jan 28, 2026 · 📦 datadog-sdk

✨ 4 features · 🐛 13 fixes · 🔧 8 symbols

Summary

This release introduces significant enhancements to LLM Observability with class-based evaluators and adds configuration for logging levels via `DD_TRACE_LOG_LEVEL`. Numerous bug fixes address issues across profiling, AWS Lambda handlers, gevent compatibility, and specific library integrations like litellm and pydantic-ai.

✨ New Features

Adds support for class-based evaluators in LLM Observability by allowing users to subclass `BaseEvaluator`.
Introduces `EvaluatorContext` to store evaluation context including dataset record and span information.
Supports class-based summary evaluators via `BaseSummaryEvaluator`, which receives a `SummaryEvaluatorContext`.
Adds a new environment variable `DD_TRACE_LOG_LEVEL` to control the ddtrace logger level.
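
The class-based evaluator pattern described above can be sketched roughly as follows. This is only an illustration of the subclassing idea: the `BaseEvaluator` and `EvaluatorContext` classes below are local, hypothetical stand-ins that reuse the names from these notes, and the `evaluate` method, the context fields, and the `ExactMatch` evaluator are all assumptions rather than the library's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Hypothetical stand-ins: they reuse the names from the release notes but are
# NOT the ddtrace classes. Field and method names are assumptions.
@dataclass
class EvaluatorContext:
    record: Dict[str, Any]         # assumed: the dataset record being evaluated
    output: Any                    # assumed: what the task produced for that record
    span_id: Optional[str] = None  # assumed: identifier of the related span


class BaseEvaluator(ABC):
    @abstractmethod
    def evaluate(self, context: EvaluatorContext) -> Any:
        """Return a score or label for one record."""


class ExactMatch(BaseEvaluator):
    """Scores 1.0 when the output equals the record's expected value."""

    def __init__(self, case_sensitive: bool = True) -> None:
        # A class can carry configuration, which a plain function cannot do as easily.
        self.case_sensitive = case_sensitive

    def evaluate(self, context: EvaluatorContext) -> float:
        expected = str(context.record["expected_output"])
        actual = str(context.output)
        if not self.case_sensitive:
            expected, actual = expected.lower(), actual.lower()
        return 1.0 if expected == actual else 0.0


ctx = EvaluatorContext(record={"expected_output": "Paris"}, output="paris")
strict_score = ExactMatch().evaluate(ctx)
lenient_score = ExactMatch(case_sensitive=False).evaluate(ctx)
```

The main appeal of the class-based form, under these assumptions, is that evaluator configuration lives on the instance while per-record data arrives through the context object.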

🐛 Bug Fixes

Fixes an issue where agent-based samplers could interfere with Standalone App and API Protection by rejecting traces prematurely.
Resolves an issue in `aws_lambda` where user-defined `SIGALRM` handlers were not restored after `TimeoutChannel` cleanup, causing custom timeout handlers to fail after the first invocation.
Fixes a gevent support issue in exception replay that caused an exception when determining if a frame belongs to user code for capturing.
Resolves an issue with `litellm>=1.74.15` where wrapped router streaming responses caused an `AttributeError` when accessing `.handler`; the integration now handles both wrapped and original responses.
Fixes a profiling bug where non-pushed samples could leak data to subsequent samples.
Fixes a profiling bug where `asyncio` task stacks contained duplicated frames when the task was on-CPU; stacks now show each frame once.
Fixes an issue where the stack profiler did not reset thread, task, and greenlet information after a fork; stale data from the parent process is no longer carried over.
Fixes a crash in the lock profiler when stack traces are too shallow (fewer than 4 frames); the location is now reported as "unknown:0".
Fixes an issue that caused greenlets to misbehave when `gevent.joinall` was called.
Resolves a crash occurring when forking while using the memory profiler.
Fixes an issue where the Pydantic AI integration did not properly trace `StreamedRunResult.stream_responses()` (introduced in `pydantic-ai==0.8.1`), preventing agent spans from finishing.
Addresses an issue where the type of the `evaluators` argument to `LLMObs.experiment` was overly constrained; it now uses the covariant `Sequence` type.
Fixes an issue where OpenAI spans showed `model_name: "None"` when the API response returned a `None` model field; the model name now falls back to the request model.
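
The `Sequence` change in the `LLMObs.experiment` fix above comes down to variance in Python's type system. The sketch below uses hypothetical `Evaluator` and `LengthEvaluator` classes and two made-up functions (none of them part of the library) to show why a `List[...]` parameter is stricter than a `Sequence[...]` one for static type checkers.

```python
from typing import List, Sequence


class Evaluator:
    """Hypothetical base type, for illustration only."""

    def score(self, output: str) -> float:
        return 0.0


class LengthEvaluator(Evaluator):
    def score(self, output: str) -> float:
        return float(len(output))


def run_invariant(evaluators: List[Evaluator], output: str) -> List[float]:
    # List is invariant: a checker such as mypy rejects a List[LengthEvaluator]
    # argument here, because the callee could append a different Evaluator.
    return [e.score(output) for e in evaluators]


def run_covariant(evaluators: Sequence[Evaluator], output: str) -> List[float]:
    # Sequence is read-only and covariant, so Sequence[LengthEvaluator]
    # (including lists and tuples of the subclass) is accepted.
    return [e.score(output) for e in evaluators]


length_only: List[LengthEvaluator] = [LengthEvaluator()]
scores = run_covariant(length_only, "hello")
# run_invariant(length_only, "hello") behaves the same at runtime,
# but a static type checker reports an incompatible argument type.
```

Both functions run identically; the difference only shows up during static analysis, which is why this kind of change affects type-checked callers without changing runtime behavior.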

Affected Symbols

`BaseEvaluator` `EvaluatorContext` `BaseSummaryEvaluator` `SummaryEvaluatorContext` `DD_TRACE_LOG_LEVEL` `StreamedRunResult.stream_responses()` `pydantic-ai==0.8.1` `LLMObs.experiment`