v1.82.1-focus-dev
📦 litellm
🐛 14 fixes · 🔧 8 symbols
Summary
This release focuses primarily on bug fixes across routing, caching, provider integrations (Fireworks, SageMaker, Vertex AI, Bedrock), and the Responses API streaming bridge. Several provider-specific URL and request-body issues were resolved.
🐛 Bug Fixes
- Fixed handling of ResponseApplyPatchToolCall in the completion bridge.
- Router now exits the retry loop early on non-retryable errors (sketched below).
- Fixed invalid OpenAPI schema for /spend/calculate and /credentials endpoints in the proxy.
- Preserved usage/cached_tokens in the Responses API streaming bridge.
- Injected default_in_memory_ttl into DualCache's async_set_cache and async_set_cache_pipeline (sketched below).
- Applied server root path to mapped passthrough route matching.
- Merged parallel function_call items into a single assistant message in the Responses API (sketched below).
- Handled month overflow in the duration_in_seconds calculation for multi-month durations (sketched below).
- Used the correct divisor when averaging TTFT (Time To First Token) in lowest-latency routing (sketched below).
- Stripped the duplicate /v1 from the models endpoint URL for the Fireworks provider (sketched below).
- Added role-assumption support for SageMaker embedding endpoints.
- Stripped LiteLLM-internal keys from extra_body before merging into the Gemini request for Vertex AI.
- Preserved the reasoning_effort summary field in the Responses API when using the OpenAI provider.
- Populated completion_tokens_details in the Responses API for the Bedrock provider.
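A few of the fixes above are easiest to see in code; the sketches below are illustrative reconstructions, not litellm's actual implementation. The retry change follows a common pattern: classify the exception before scheduling another attempt, so client errors fail fast. The is_retryable helper and the status-code set here are assumptions:

```python
import time

NON_RETRYABLE_STATUS_CODES = {400, 401, 403, 404, 422}  # hypothetical classification

def is_retryable(exc: Exception) -> bool:
    # Client errors (bad request, auth failure, not found) will not
    # succeed on retry, so treat them as permanent.
    return getattr(exc, "status_code", None) not in NON_RETRYABLE_STATUS_CODES

def call_with_retries(fn, max_retries: int = 3, backoff: float = 0.5):
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            # The fix: break out of the loop on non-retryable errors
            # instead of burning the remaining attempts.
            if not is_retryable(exc) or attempt == max_retries:
                break
            time.sleep(backoff * 2**attempt)
    raise last_exc
```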
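The caching fix amounts to falling back to a configured default TTL when the caller does not pass one, so in-memory entries stop living forever. The class and method names mirror the bullet, but the bodies are a minimal sketch:

```python
class DualCache:
    def __init__(self, default_in_memory_ttl: float | None = None):
        self.default_in_memory_ttl = default_in_memory_ttl
        self._store: dict = {}

    async def async_set_cache(self, key, value, ttl: float | None = None, **kwargs):
        # The fix: inject the configured default when no ttl is given,
        # instead of writing an entry that never expires.
        if ttl is None:
            ttl = self.default_in_memory_ttl
        self._store[key] = (value, ttl)

    async def async_set_cache_pipeline(self, cache_list, ttl: float | None = None, **kwargs):
        # Same injection applied to batched writes.
        for key, value in cache_list:
            await self.async_set_cache(key, value, ttl=ttl, **kwargs)
```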
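Merging parallel tool calls is a fold over the output items: consecutive function_call entries collapse into one assistant message carrying a list of tool calls, rather than one message each. The item shapes below are hypothetical:

```python
def merge_function_calls(items: list[dict]) -> list[dict]:
    merged: list[dict] = []
    for item in items:
        if item.get("type") != "function_call":
            merged.append(item)
        elif merged and merged[-1].get("role") == "assistant" and "tool_calls" in merged[-1]:
            # The fix: fold this call into the previous assistant
            # message instead of emitting a second one.
            merged[-1]["tool_calls"].append(item)
        else:
            merged.append({"role": "assistant", "tool_calls": [item]})
    return merged
```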
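The month-overflow bug is the classic one: adding N to the month field walks off the calendar once the sum passes 12. The fix normalizes with divmod so the year rolls forward and the day is clamped to the target month's length. A sketch, with the function name taken from the bullet and the signature assumed:

```python
import calendar
import datetime

def add_months(start: datetime.datetime, months: int) -> datetime.datetime:
    # Normalize month arithmetic so month > 12 rolls the year forward
    # instead of producing an invalid date.
    years_over, month_index = divmod(start.month - 1 + months, 12)
    year = start.year + years_over
    month = month_index + 1
    # Clamp the day (e.g. Jan 31 + 1mo must not become Feb 31).
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)

def duration_in_seconds(start: datetime.datetime, months: int) -> float:
    return (add_months(start, months) - start).total_seconds()
```

For example, a 3-month duration starting 2024-11-15 now correctly lands on 2025-02-15 instead of overflowing past month 12.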
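For the latency-routing fix, the usual hazard when averaging TTFT is dividing by the wrong count, e.g. by all requests when only some recorded a first-token timestamp, which silently deflates the average. The notes don't say which divisor was wrong; this sketch just shows the corrected shape:

```python
def average_ttft(ttft_samples: list[float | None]) -> float | None:
    # Only requests that actually produced a first token contribute.
    recorded = [s for s in ttft_samples if s is not None]
    if not recorded:
        return None
    # The fix: divide by the number of recorded samples,
    # not by the total number of requests.
    return sum(recorded) / len(recorded)
```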
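The Fireworks fix is a URL-normalization guard: when the configured api_base already ends in /v1, naively appending /v1/models yields .../v1/v1/models. A sketch (the helper name is ours):

```python
def build_models_url(api_base: str) -> str:
    base = api_base.rstrip("/")
    # The fix: drop the trailing /v1 before appending the path,
    # so the segment appears exactly once.
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return f"{base}/v1/models"

assert build_models_url("https://api.fireworks.ai/inference/v1") == (
    "https://api.fireworks.ai/inference/v1/models"
)
```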