v3.10.0
Summary
LocalAI 3.10.0 introduces major enhancements to agent capabilities, with full Open Responses API and Anthropic Messages API support, alongside a unified GPU backend system that simplifies cross-platform acceleration. The release also adds video generation and faster transcription via the new Moonshine backend.
Migration Steps
- To maintain session state across calls when using Open Responses API, set `response_id` in your request.
- To run agents asynchronously via Open Responses API, use `background: true` and retrieve results via `GET /api/v1/responses/{response_id}`.
- To enable streaming via Open Responses API, use `stream: true` in the request.
- If using Anthropic API emulation, use `https://api.localai.host/v1/messages`.
- If using the new video generation features, access the UI at `/video`.
- If debugging agents or fine-tuning, enable Request Tracing via runtime setting or API and fetch logs via `GET /api/v1/trace`.
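The Open Responses API options in the steps above (`response_id`, `background`, `stream`) can be sketched as request payloads. This is a minimal illustration only: the field names come from the migration steps, while the base URL, model name, and example `response_id` are placeholder assumptions.

```python
import json

# Hedged sketch of Open Responses API request payloads based on the
# migration steps above. The field names (response_id, background, stream)
# come from the release notes; base_url, model, and the id are assumptions.
base_url = "http://localhost:8080"  # assumed local LocalAI instance

# Continue a stateful session by passing a previous response_id.
stateful_request = {
    "model": "my-agent-model",          # placeholder model name
    "input": "Summarize our conversation so far.",
    "response_id": "resp_abc123",       # hypothetical id from a prior call
}

# Run the agent asynchronously; results are then fetched via
# GET {base_url}/api/v1/responses/{response_id}.
background_request = {**stateful_request, "background": True}

# Stream tokens as they are produced.
streaming_request = {**stateful_request, "stream": True}

print(json.dumps(background_request, indent=2))
```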
✨ New Features
- Added native Anthropic Messages API support, compatible with clients built for Anthropic's `/v1/messages` endpoint.
- Introduced Open Responses API compatibility for stateful agents, supporting tool calling, streaming, background mode, and multi-turn conversations.
- Launched a new Video Generation UI supporting text-to-video and image-to-video workflows using LTX-2.
- Implemented a unified GPU backend system where GPU libraries (CUDA, ROCm, Vulkan) are packaged inside backend containers, enabling out-of-the-box GPU acceleration on Nvidia, AMD, and ARM64 (Experimental).
- Added full support for streaming tool calls and parsing of XML-formatted tool outputs.
- The backend gallery now shows only backends compatible with the host system's capabilities (System-Aware Backend Gallery).
- Added Pocket-TTS, a lightweight, high-fidelity text-to-speech engine with voice cloning support.
- Implemented Request Tracing for memory-based logging of requests and responses to aid in debugging and fine-tuning.
- Added the Moonshine backend, an ONNX-based transcription engine optimized for low-end devices.
- Automatic detection and extraction of model thinking steps/reasoning tags, displayed separately in the chat UI.
- Enabled Vulkan builds for GPU acceleration on ARM64.
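The reasoning-tag extraction described above can be sketched as splitting a model response into a reasoning part and an answer part. The `<think>…</think>` tag name is an assumption for illustration; different models emit different reasoning delimiters.

```python
import re

# Hedged sketch of reasoning-tag extraction, as described in the feature
# list above. The <think>...</think> delimiter is an assumption; real
# models use varying tag formats.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer), stripping reasoning tags from the answer."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "<think>The user wants a sum.</think>2 + 2 = 4"
reasoning, answer = split_reasoning(raw)
print(reasoning)  # The user wants a sum.
print(answer)     # 2 + 2 = 4
```

A UI would render the two parts in separate panes, showing the answer by default and the reasoning on demand.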
🐛 Bug Fixes
- Fixed crashes on AVX-only CPUs (Intel Sandy/Ivy Bridge) by safely falling back to SSE2.
- Fixed incorrect VRAM reporting on AMD GPUs by correctly parsing used and total VRAM from `rocm-smi` output.
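The VRAM fix above amounts to reading used and total byte counts out of `rocm-smi`-style output. Below is a minimal parsing sketch; the sample output format is an assumption, since real `rocm-smi` output varies by version.

```python
import re

# Hedged sketch of parsing used/total VRAM from rocm-smi-style output,
# in the spirit of the fix above. The sample text is an assumption;
# real rocm-smi output differs across versions.
SAMPLE = """\
GPU[0] : VRAM Total Memory (B): 17163091968
GPU[0] : VRAM Total Used Memory (B): 1261568000
"""

def parse_vram(output: str) -> tuple[int, int]:
    """Return (total_bytes, used_bytes) from rocm-smi-style output."""
    total = int(re.search(r"VRAM Total Memory \(B\):\s*(\d+)", output).group(1))
    used = int(re.search(r"VRAM Total Used Memory \(B\):\s*(\d+)", output).group(1))
    return total, used

total, used = parse_vram(SAMPLE)
print(f"{used / total:.1%} of {total // 2**20} MiB used")
```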