v1.82.1-focus-dev
📦 litellm
🐛 14 fixes · 🔧 8 symbols
Summary
This release focuses primarily on bug fixes across routing, caching, provider integrations (Fireworks, SageMaker, Vertex AI, Bedrock), and the Responses API streaming bridge. Several provider-specific URL and request-body issues were resolved.
🐛 Bug Fixes
- Fixed handling of ResponseApplyPatchToolCall in the completion bridge.
- Router now exits the retry loop early on non-retryable errors (sketched below).
- Fixed invalid OpenAPI schema for /spend/calculate and /credentials endpoints in the proxy.
- Preserved usage/cached_tokens in the Responses API streaming bridge.
- Injected default_in_memory_ttl into DualCache's async_set_cache and async_set_cache_pipeline (sketched below).
- Applied server root path to mapped passthrough route matching.
- Merged parallel function_call items into a single assistant message in the Responses API (sketched below).
- Handled month overflow in the duration_in_seconds calculation for multi-month durations (sketched below).
- Used the correct divisor when averaging TTFT (Time To First Token) in lowest-latency routing (sketched below).
- Stripped the duplicate /v1 from the models endpoint URL for the Fireworks provider (sketched below).
- Added role-assumption support for SageMaker embedding endpoints.
- Stripped LiteLLM-internal keys from extra_body before merging into the Gemini request for Vertex AI.
- Preserved the reasoning_effort summary field in the Responses API when using the OpenAI provider.
- Populated completion_tokens_details in the Responses API for the Bedrock provider.
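A few of the fixes above are easiest to see in code; the sketches below are illustrative reconstructions, not litellm's actual implementation. The retry change follows a common pattern: classify the exception before scheduling another attempt, so client errors fail fast. The is_retryable helper and the status-code set here are assumptions:

```python
import time

NON_RETRYABLE_STATUS_CODES = {400, 401, 403, 404, 422}  # hypothetical classification

def is_retryable(exc: Exception) -> bool:
    # Client errors (bad request, auth failure, not found) will not
    # succeed on retry, so treat them as permanent.
    return getattr(exc, "status_code", None) not in NON_RETRYABLE_STATUS_CODES

def call_with_retries(fn, max_retries: int = 3, backoff: float = 0.5):
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            # The fix: break out of the loop on non-retryable errors
            # instead of burning the remaining attempts.
            if not is_retryable(exc) or attempt == max_retries:
                break
            time.sleep(backoff * 2**attempt)
    raise last_exc
```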
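The caching fix amounts to falling back to a configured default TTL when the caller does not pass one, so in-memory entries stop living forever. The class and method names mirror the bullet, but the bodies are a minimal sketch:

```python
class DualCache:
    def __init__(self, default_in_memory_ttl: float | None = None):
        self.default_in_memory_ttl = default_in_memory_ttl
        self._store: dict = {}

    async def async_set_cache(self, key, value, ttl: float | None = None, **kwargs):
        # The fix: inject the configured default when no ttl is given,
        # instead of writing an entry that never expires.
        if ttl is None:
            ttl = self.default_in_memory_ttl
        self._store[key] = (value, ttl)

    async def async_set_cache_pipeline(self, cache_list, ttl: float | None = None, **kwargs):
        # Same injection applied to batched writes.
        for key, value in cache_list:
            await self.async_set_cache(key, value, ttl=ttl, **kwargs)
```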
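Merging parallel tool calls is a fold over the output items: consecutive function_call entries collapse into one assistant message carrying a list of tool calls, rather than one message each. The item shapes below are hypothetical:

```python
def merge_function_calls(items: list[dict]) -> list[dict]:
    merged: list[dict] = []
    for item in items:
        if item.get("type") != "function_call":
            merged.append(item)
        elif merged and merged[-1].get("role") == "assistant" and "tool_calls" in merged[-1]:
            # The fix: fold this call into the previous assistant
            # message instead of emitting a second one.
            merged[-1]["tool_calls"].append(item)
        else:
            merged.append({"role": "assistant", "tool_calls": [item]})
    return merged
```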
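The month-overflow bug is the classic one: adding N to the month field walks off the calendar once the sum passes 12. The fix normalizes with divmod so the year rolls forward and the day is clamped to the target month's length. A sketch, with the function name taken from the bullet and the signature assumed:

```python
import calendar
import datetime

def add_months(start: datetime.datetime, months: int) -> datetime.datetime:
    # Normalize month arithmetic so month > 12 rolls the year forward
    # instead of producing an invalid date.
    years_over, month_index = divmod(start.month - 1 + months, 12)
    year = start.year + years_over
    month = month_index + 1
    # Clamp the day (e.g. Jan 31 + 1mo must not become Feb 31).
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)

def duration_in_seconds(start: datetime.datetime, months: int) -> float:
    return (add_months(start, months) - start).total_seconds()
```

For example, a 3-month duration starting 2024-11-15 now correctly lands on 2025-02-15 instead of overflowing past month 12.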
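For the latency-routing fix, the usual hazard when averaging TTFT is dividing by the wrong count, e.g. by all requests when only some recorded a first-token timestamp, which silently deflates the average. The notes don't say which divisor was wrong; this sketch just shows the corrected shape:

```python
def average_ttft(ttft_samples: list[float | None]) -> float | None:
    # Only requests that actually produced a first token contribute.
    recorded = [s for s in ttft_samples if s is not None]
    if not recorded:
        return None
    # The fix: divide by the number of recorded samples,
    # not by the total number of requests.
    return sum(recorded) / len(recorded)
```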
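The Fireworks fix is a URL-normalization guard: when the configured api_base already ends in /v1, naively appending /v1/models yields .../v1/v1/models. A sketch (the helper name is ours):

```python
def build_models_url(api_base: str) -> str:
    base = api_base.rstrip("/")
    # The fix: drop the trailing /v1 before appending the path,
    # so the segment appears exactly once.
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return f"{base}/v1/models"

assert build_models_url("https://api.fireworks.ai/inference/v1") == (
    "https://api.fireworks.ai/inference/v1/models"
)
```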