v4.0.0
localai · ⚠ 2 breaking · ✨ 23 features · 🐛 10 fixes · 🔧 22 symbols
Summary
LocalAI 4.0.0 transforms the platform into a complete AI orchestration system, introducing native agentic capabilities, a completely revamped React UI, and expanded MCP support. This major release also adds experimental MLX Distributed support and several new audio backends.
⚠️ Breaking Changes
- HuggingFace backend support has been removed. Users relying on this backend must migrate to other supported backends.
- AIO images have been dropped. Users should switch to the main LocalAI images.
Migration Steps
- If you were using the HuggingFace backend, you must migrate to a supported alternative.
- If you were using AIO images, switch to the main LocalAI images.
- If you need to separate persistent data (agents, skills) from configuration, use the new `--data-path` CLI flag or set the `LOCALAI_DATA_PATH` environment variable.
- If you relied on the old `json_verbose` parameter, rename it to `verbose_json` for OpenAI spec compliance.
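The `json_verbose` → `verbose_json` rename can be handled with a one-line rewrite of the request payload. The sketch below assumes the parameter appears as a `response_format` value, as in the OpenAI transcription API; the model name is illustrative and not taken from these notes:

```python
# Sketch of the parameter rename for a transcription request payload.
# Assumption: the old spelling appeared as a response_format value.

def migrate_transcription_params(params: dict) -> dict:
    """Rewrite a pre-4.0.0 request dict to the OpenAI-spec spelling."""
    migrated = dict(params)
    if migrated.get("response_format") == "json_verbose":
        migrated["response_format"] = "verbose_json"
    return migrated

old_request = {
    "model": "whisper-1",               # illustrative model name
    "response_format": "json_verbose",  # pre-4.0.0 spelling
}
new_request = migrate_transcription_params(old_request)
print(new_request["response_format"])  # verbose_json
```

Requests already using `verbose_json` pass through unchanged, so the helper is safe to apply unconditionally.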
✨ New Features
- Introduction of native agentic orchestration capabilities embedded in the core, including agent management via the new UI.
- Launch of Agenthub, a community hub for sharing and importing agents.
- Full lifecycle management for Agents via the React UI, supporting Slack integration, MCP server configuration, and skills.
- Centralized skill database for AI agents.
- Agent memory support using Hybrid search (PostgreSQL) or in-memory storage (Chromem).
- New 'Events' column in the Agents list for observability and status tracking.
- Complete frontend rewrite migrated to React, offering a modern UX and faster performance.
- Introduction of 'Canvas Mode' in chat to preview code blocks and artifacts side-by-side.
- New 'System View' tabbed navigation separating Models and Backends.
- Visual warnings when model storage exceeds system RAM.
- Improved trace display using accordions.
- Expanded Model Context Protocol (MCP) support, including MCP Apps selection in the UI.
- Automatic injection of tools from MCP servers into the standard chat interface (Tool Streaming).
- Full client-side integration for MCP tools and streaming.
- Option to disable MCP support entirely via the `LOCALAI_DISABLE_MCP` environment variable.
- Experimental backend for distributed workloads using Apple's MLX framework (MLX Distributed).
- New audio backends introduced: fish-speech, ace-step.cpp, and faster-qwen3-tts (CUDA-only).
- WebRTC support added to the Realtime API and Talk page for low-latency audio.
- Added `sample_rate` support via post-processing and multi-voice support for Qwen TTS.
- Video Generation improvements: the model selection dropdown now stays in sync, and the `vllm-omni` backend is detected automatically.
- New `--data-path` CLI flag and `LOCALAI_DATA_PATH` environment variable to separate persistent data (agents, skills) from configuration.
- Dynamic completion scripts generated for bash, zsh, and fish shells.
- Dedicated documentation provided for Podman installation and rootless configuration.
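As a sketch, the new data/configuration separation might look like the Compose file below. Only `LOCALAI_DATA_PATH` and `LOCALAI_DISABLE_MCP` come from these notes; the image tag, port, paths, and volume names are illustrative assumptions:

```yaml
# Illustrative docker-compose sketch, not an official LocalAI file.
services:
  localai:
    image: localai/localai:latest     # assumed image tag
    ports:
      - "8080:8080"                   # assumed default port
    environment:
      - LOCALAI_DATA_PATH=/data       # persistent data (agents, skills)
      # - LOCALAI_DISABLE_MCP=true    # uncomment to disable MCP entirely
    volumes:
      - localai-models:/models        # named volumes for Windows compatibility
      - localai-data:/data

volumes:
  localai-models:
  localai-data:
```

Using named volumes rather than bind mounts matches the Windows compatibility fix in this release, and keeping `/data` separate from the models volume mirrors the intent of the `--data-path` flag.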
🐛 Bug Fixes
- Fixed watchdog constantly running and spamming logs when no interval was configured; health check logs downgraded to debug.
- Renamed `json_verbose` to `verbose_json` for OpenAI specification compliance (fixes Nextcloud integration).
- Fixed embedding dimension truncation to return full native dimensions.
- Changed model install file permissions from 0600 to 0644 to ensure server readability.
- Added named volumes to Docker Compose files for Windows compatibility.
- Models now reload automatically after editing YAML configuration (e.g., context_size).
- Fixed issue where thinking/reasoning blocks were incorrectly sent to the LLM during chat.
- Fixed img2img pipeline in the diffusers backend.
- Fixed Qwen TTS duplicate argument error.
- Improved GPU vendor checks to prevent false CUDA detection on CPU-only hosts that have CUDA runtime libraries installed.
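The permissions change (0600 → 0644) can be illustrated with a short, self-contained sketch; this is illustrative Python, not LocalAI code:

```python
import os
import stat
import tempfile

# Create a file the way the installer previously did (0600: owner-only),
# then relax it to 0644 (owner read/write; group and others read) so a
# server process running as a different user can read the model file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

os.chmod(path, 0o600)
before = stat.S_IMODE(os.stat(path).st_mode)  # owner-only access

os.chmod(path, 0o644)
after = stat.S_IMODE(os.stat(path).st_mode)   # world-readable

print(oct(before), oct(after))  # 0o600 0o644
os.remove(path)
```

With 0600, the group/other read bits are clear, so a server running under another user would get a permission error when loading the model; 0644 sets those read bits while keeping the file writable only by its owner.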
Affected Symbols
HuggingFace backend, AIO images, watchdog, health check logs, `json_verbose`, `verbose_json`, embedding dimension calculation, model install file permissions, Docker Compose files (Windows), YAML config reloading, chat reasoning blocks, diffusers backend (img2img pipeline), Qwen TTS, GPU vendor checks, MLX Distributed backend, fish-speech backend, ace-step.cpp backend, faster-qwen3-tts backend, Realtime API, Talk page (WebRTC), Qwen TTS (`sample_rate`), `vllm-omni` backend