
v3.10.0

📦 localai
✨ 11 features · 🐛 2 fixes · 🔧 4 affected symbols

Summary

LocalAI 3.10.0 introduces major enhancements to agent capabilities with full Open Responses API and Anthropic API support, alongside a unified GPU backend system for simplified cross-platform acceleration. This release also adds new features like video generation and faster transcription via the Moonshine backend.

Migration Steps

  1. To maintain session state across calls when using the Open Responses API, set `response_id` in your request.
  2. To run agents asynchronously via the Open Responses API, set `background: true` and retrieve results via `GET /api/v1/responses/{response_id}`.
  3. To enable streaming via the Open Responses API, set `stream: true` in the request.
  4. If using Anthropic API emulation, point your Anthropic client at LocalAI's `/v1/messages` endpoint.
  5. If using the new video generation features, access the UI at `/video`.
  6. If debugging agents or fine-tuning, enable Request Tracing via runtime setting or API and fetch logs via `GET /api/v1/trace`.
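The Open Responses calls in steps 1–3 can be sketched as small request-body builders. This is a minimal illustration, not LocalAI's client: the base URL, the POST path `/v1/responses`, and the model name are assumptions, while the field names (`response_id`, `background`, `stream`) and the background-result URL come from the steps above.

```python
import json

BASE_URL = "http://localhost:8080"  # assumed default LocalAI listen address


def build_responses_request(prompt, model, response_id=None,
                            background=False, stream=False):
    """Assemble a request body for the Open Responses API."""
    body = {"model": model, "input": prompt}
    if response_id is not None:
        body["response_id"] = response_id  # step 1: keep session state across calls
    if background:
        body["background"] = True          # step 2: run the agent asynchronously
    if stream:
        body["stream"] = True              # step 3: stream output as it is produced
    return body


def result_url(response_id):
    """URL to poll for a background run's result (step 2)."""
    return f"{BASE_URL}/api/v1/responses/{response_id}"


# Example: a background call that continues session "abc123"
# ("qwen3" is a placeholder model name, not from this release).
req = build_responses_request("Summarize the release notes", "qwen3",
                              response_id="abc123", background=True)
print(json.dumps(req))
print(result_url("abc123"))
```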

✨ New Features

  • Added native Anthropic Messages API support, compatible with Claude's /v1/messages endpoint.
  • Introduced Open Responses API compatibility for stateful agents, supporting tool calling, streaming, background mode, and multi-turn conversations.
  • Launched a new Video Generation UI supporting text-to-video and image-to-video workflows using LTX-2.
  • Implemented a unified GPU backend system where GPU libraries (CUDA, ROCm, Vulkan) are packaged inside backend containers, enabling out-of-the-box GPU acceleration on Nvidia, AMD, and ARM64 (Experimental).
  • Added full support for streaming tool calls and for parsing XML-formatted tool outputs.
  • The backend gallery now shows only backends compatible with the host system's capabilities (System-Aware Backend Gallery).
  • Added Pocket-TTS, a lightweight, high-fidelity text-to-speech engine with voice cloning support.
  • Implemented Request Tracing for memory-based logging of requests and responses to aid in debugging and fine-tuning.
  • Added the Moonshine backend, an ONNX-based transcription engine optimized for low-end devices.
  • Added automatic detection and extraction of model thinking steps/reasoning tags, displayed separately in the chat UI.
  • Enabled Vulkan arm64 builds for GPU acceleration on ARM64.
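As a concrete illustration of the new Anthropic Messages API emulation, the sketch below assembles a Messages-style request against the `/v1/messages` endpoint. The body shape follows Anthropic's public Messages API; the base URL and model name are assumptions, not details from this release.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed default LocalAI listen address


def anthropic_messages_request(model, user_text, max_tokens=256):
    """Build a POST request for LocalAI's Anthropic-compatible /v1/messages.

    The body follows Anthropic's Messages API schema; `model` is whatever
    model name you have installed locally.
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )


# Example: inspect the request without sending it
# ("qwen3" is a placeholder model name).
req = anthropic_messages_request("qwen3", "Hello!")
print(req.full_url)
```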

🐛 Bug Fixes

  • Fixed crashes on AVX-only CPUs (Intel Sandy/Ivy Bridge) by safely falling back to SSE2.
  • Fixed incorrect VRAM reporting on AMD GPUs by correctly parsing used and total VRAM from rocm-smi output.
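For context on the AMD VRAM fix, the sketch below shows one way to parse used and total VRAM from `rocm-smi`-style output. The line format is typical of `rocm-smi --showmeminfo vram`, but this is illustrative only; the parser LocalAI actually ships may differ.

```python
import re

# Typical rocm-smi memory lines look like:
#   GPU[0]	: VRAM Total Memory (B): 17163091968
#   GPU[0]	: VRAM Total Used Memory (B): 429496729
_LINE = re.compile(r"GPU\[(\d+)\]\s*:\s*VRAM Total (Used )?Memory \(B\):\s*(\d+)")


def parse_rocm_smi_vram(output):
    """Return {gpu_index: (used_bytes, total_bytes)} from rocm-smi text output."""
    total, used = {}, {}
    for line in output.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        gpu, is_used, value = int(m.group(1)), bool(m.group(2)), int(m.group(3))
        (used if is_used else total)[gpu] = value
    return {g: (used.get(g, 0), total[g]) for g in total}


sample = (
    "GPU[0]\t: VRAM Total Memory (B): 17163091968\n"
    "GPU[0]\t: VRAM Total Used Memory (B): 429496729\n"
)
print(parse_rocm_smi_vram(sample))  # → {0: (429496729, 17163091968)}
```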

Affected Symbols