
v4.0.0

📦 localai
2 breaking changes · 23 features · 🐛 10 fixes · 🔧 22 affected symbols

Summary

LocalAI 4.0.0 transforms the platform into a complete AI orchestration system, introducing native agentic capabilities, a fully revamped React UI, and expanded MCP support. This major release also adds experimental MLX Distributed support and several new audio backends.

⚠️ Breaking Changes

  • HuggingFace backend support has been removed. Users relying on this backend must migrate to other supported backends.
  • AIO images have been dropped. Users should switch to the main LocalAI images.

Migration Steps

  1. If you were using the HuggingFace backend, you must migrate to a supported alternative.
  2. If you were using AIO images, switch to the main LocalAI images.
  3. If you need to separate persistent data (agents, skills) from configuration, use the new `--data-path` CLI flag or set the `LOCALAI_DATA_PATH` environment variable.
  4. If you relied on the old `json_verbose` parameter, rename it to `verbose_json` for OpenAI spec compliance.
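Step 3 can be applied in either of two equivalent ways; the path below is illustrative, not a documented default:

```shell
# Point LocalAI at a dedicated directory for persistent data
# (agents, skills), kept separate from configuration.

# Option A: CLI flag
local-ai run --data-path /var/lib/localai/data

# Option B: environment variable
LOCALAI_DATA_PATH=/var/lib/localai/data local-ai run
```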

✨ New Features

  • Introduction of native agentic orchestration capabilities embedded in the core, including agent management via the new UI.
  • Launch of Agenthub, a community hub for sharing and importing agents.
  • Full lifecycle management for Agents via the React UI, supporting Slack integration, MCP server configuration, and skills.
  • Centralized skill database for AI agents.
  • Agent memory support using Hybrid search (PostgreSQL) or in-memory storage (Chromem).
  • New 'Events' column in the Agents list for observability and status tracking.
  • Complete frontend rewrite migrated to React, offering a modern UX and faster performance.
  • Introduction of 'Canvas Mode' in chat to preview code blocks and artifacts side-by-side.
  • New 'System View' tabbed navigation separating Models and Backends.
  • Visual warnings when model storage exceeds system RAM.
  • Improved trace display using accordions.
  • Expanded Model Context Protocol (MCP) support, including MCP Apps selection in the UI.
  • Automatic injection of tools from MCP servers into the standard chat interface (Tool Streaming).
  • Full client-side integration for MCP tools and streaming.
  • Option to disable MCP support entirely via the `LOCALAI_DISABLE_MCP` environment variable.
  • Experimental backend for distributed workloads using Apple's MLX framework (MLX Distributed).
  • New audio backends introduced: fish-speech, ace-step.cpp, and faster-qwen3-tts (CUDA-only).
  • WebRTC support added to the Realtime API and Talk page for low-latency audio.
  • Added `sample_rate` support via post-processing and multi-voice support for Qwen TTS.
  • Model selection dropdown now stays in sync, and `vllm-omni` backend detection was added for Video Generation.
  • New `--data-path` CLI flag and `LOCALAI_DATA_PATH` environment variable to separate persistent data (agents, skills) from configuration.
  • Dynamic completion scripts generated for bash, zsh, and fish shells.
  • Dedicated documentation provided for Podman installation and rootless configuration.
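As a sketch of the new MCP kill switch, the environment variable can be set at container start (the image tag and boolean value shown here are assumptions; check the LocalAI docs for your deployment):

```shell
# Start LocalAI with MCP support disabled entirely.
docker run -p 8080:8080 \
  -e LOCALAI_DISABLE_MCP=true \
  localai/localai:latest
```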

🐛 Bug Fixes

  • Fixed watchdog constantly running and spamming logs when no interval was configured; health check logs downgraded to debug.
  • Renamed `json_verbose` to `verbose_json` for OpenAI specification compliance (fixes Nextcloud integration).
  • Fixed embedding dimension truncation to return full native dimensions.
  • Changed model install file permissions from 0600 to 0644 to ensure server readability.
  • Added named volumes to Docker Compose files for Windows compatibility.
  • Models now reload automatically after editing YAML configuration (e.g., context_size).
  • Fixed issue where thinking/reasoning blocks were incorrectly sent to the LLM during chat.
  • Fixed img2img pipeline in the diffusers backend.
  • Fixed Qwen TTS duplicate argument error.
  • Improved GPU vendor checks to prevent false CUDA detection on CPU-only hosts with runtime libraries.
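The `verbose_json` rename aligns with the OpenAI audio transcription spec, where the value is passed via `response_format`. A minimal sketch against a local server (host, port, model name, and audio file are assumptions):

```shell
# Request a transcription with segment-level detail in the
# OpenAI-compatible verbose JSON format.
curl http://localhost:8080/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=whisper-1 \
  -F response_format=verbose_json
```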

Affected Symbols