v4.0.0
localai · ⚠ 2 breaking · ✨ 23 features · 🐛 10 fixes · 🔧 22 symbols
Summary
LocalAI 4.0.0 transforms the platform into a complete AI orchestration system, introducing native agentic capabilities, a completely revamped React UI, and expanded MCP support. This major release also adds experimental MLX Distributed support and several new audio backends.
⚠️ Breaking Changes
- HuggingFace backend support has been removed. Users relying on this backend must migrate to other supported backends.
- AIO images have been dropped. Users should switch to the main LocalAI images.
Migration Steps
- If you were using the HuggingFace backend, you must migrate to a supported alternative.
- If you were using AIO images, switch to the main LocalAI images.
- If you need to separate persistent data (agents, skills) from configuration, use the new `--data-path` CLI flag or set the `LOCALAI_DATA_PATH` environment variable.
- If you relied on the old `json_verbose` parameter, rename it to `verbose_json` for OpenAI spec compliance.
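The `json_verbose` → `verbose_json` rename can be handled with a one-line rewrite of the request payload. The sketch below assumes the parameter appears as a `response_format` value, as in the OpenAI transcription API; the model name is illustrative and not taken from these notes:

```python
# Sketch of the parameter rename for a transcription request payload.
# Assumption: the old spelling appeared as a response_format value.

def migrate_transcription_params(params: dict) -> dict:
    """Rewrite a pre-4.0.0 request dict to the OpenAI-spec spelling."""
    migrated = dict(params)
    if migrated.get("response_format") == "json_verbose":
        migrated["response_format"] = "verbose_json"
    return migrated

old_request = {
    "model": "whisper-1",               # illustrative model name
    "response_format": "json_verbose",  # pre-4.0.0 spelling
}
new_request = migrate_transcription_params(old_request)
print(new_request["response_format"])  # verbose_json
```

Requests already using `verbose_json` pass through unchanged, so the helper is safe to apply unconditionally.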
✨ New Features
- Introduction of native agentic orchestration capabilities embedded in the core, including agent management via the new UI.
- Launch of Agenthub, a community hub for sharing and importing agents.
- Full lifecycle management for Agents via the React UI, supporting Slack integration, MCP server configuration, and skills.
- Centralized skill database for AI agents.
- Agent memory support using Hybrid search (PostgreSQL) or in-memory storage (Chromem).
- New 'Events' column in the Agents list for observability and status tracking.
- Complete frontend rewrite migrated to React, offering a modern UX and faster performance.
- Introduction of 'Canvas Mode' in chat to preview code blocks and artifacts side-by-side.
- New 'System View' tabbed navigation separating Models and Backends.
- Visual warnings when model storage exceeds system RAM.
- Improved trace display using accordions.
- Expanded Model Context Protocol (MCP) support, including MCP Apps selection in the UI.
- Automatic injection of tools from MCP servers into the standard chat interface (Tool Streaming).
- Full client-side integration for MCP tools and streaming.
- Option to disable MCP support entirely via the `LOCALAI_DISABLE_MCP` environment variable.
- Experimental backend for distributed workloads using Apple's MLX framework (MLX Distributed).
- New audio backends introduced: fish-speech, ace-step.cpp, and faster-qwen3-tts (CUDA-only).
- WebRTC support added to the Realtime API and Talk page for low-latency audio.
- Added `sample_rate` support via post-processing and multi-voice support for Qwen TTS.
- Video Generation improvements: the model selection dropdown now stays in sync, and the `vllm-omni` backend is detected automatically.
- New `--data-path` CLI flag and `LOCALAI_DATA_PATH` environment variable to separate persistent data (agents, skills) from configuration.
- Dynamic completion scripts generated for bash, zsh, and fish shells.
- Dedicated documentation provided for Podman installation and rootless configuration.
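As a sketch, the new data/configuration separation might look like the Compose file below. Only `LOCALAI_DATA_PATH` and `LOCALAI_DISABLE_MCP` come from these notes; the image tag, port, paths, and volume names are illustrative assumptions:

```yaml
# Illustrative docker-compose sketch, not an official LocalAI file.
services:
  localai:
    image: localai/localai:latest     # assumed image tag
    ports:
      - "8080:8080"                   # assumed default port
    environment:
      - LOCALAI_DATA_PATH=/data       # persistent data (agents, skills)
      # - LOCALAI_DISABLE_MCP=true    # uncomment to disable MCP entirely
    volumes:
      - localai-models:/models        # named volumes for Windows compatibility
      - localai-data:/data

volumes:
  localai-models:
  localai-data:
```

Using named volumes rather than bind mounts matches the Windows compatibility fix in this release, and keeping `/data` separate from the models volume mirrors the intent of the `--data-path` flag.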
🐛 Bug Fixes
- Fixed watchdog constantly running and spamming logs when no interval was configured; health check logs downgraded to debug.
- Renamed `json_verbose` to `verbose_json` for OpenAI specification compliance (fixes Nextcloud integration).
- Fixed embedding dimension truncation to return full native dimensions.
- Changed model install file permissions from 0600 to 0644 to ensure server readability.
- Added named volumes to Docker Compose files for Windows compatibility.
- Models now reload automatically after editing YAML configuration (e.g., context_size).
- Fixed issue where thinking/reasoning blocks were incorrectly sent to the LLM during chat.
- Fixed img2img pipeline in the diffusers backend.
- Fixed Qwen TTS duplicate argument error.
- Improved GPU vendor checks to prevent false CUDA detection on CPU-only hosts that have CUDA runtime libraries installed.
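The permissions change (0600 → 0644) can be illustrated with a short, self-contained sketch; this is illustrative Python, not LocalAI code:

```python
import os
import stat
import tempfile

# Create a file the way the installer previously did (0600: owner-only),
# then relax it to 0644 (owner read/write; group and others read) so a
# server process running as a different user can read the model file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

os.chmod(path, 0o600)
before = stat.S_IMODE(os.stat(path).st_mode)  # owner-only access

os.chmod(path, 0o644)
after = stat.S_IMODE(os.stat(path).st_mode)   # world-readable

print(oct(before), oct(after))  # 0o600 0o644
os.remove(path)
```

With 0600, the group/other read bits are clear, so a server running under another user would get a permission error when loading the model; 0644 sets those read bits while keeping the file writable only by its owner.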
Affected Symbols
HuggingFace backend, AIO images, watchdog, health check logs, `json_verbose`, `verbose_json`, embedding dimension calculation, model install file permissions, Docker Compose files (Windows), YAML config reloading, chat reasoning blocks, diffusers backend (img2img pipeline), Qwen TTS, GPU vendor checks, MLX Distributed backend, fish-speech backend, ace-step.cpp backend, faster-qwen3-tts backend, Realtime API, Talk page (WebRTC), Qwen TTS (`sample_rate`), `vllm-omni` backend