v4.1.0

📅 Apr 2, 2026 · 📦 localai

✨ 11 features · 🐛 15 fixes · 🔧 8 symbols

Summary

LocalAI 4.1.0 adds distributed clustering, user authentication, and quotas, along with experimental in-UI fine-tuning and quantization. The release focuses on scalability, security, and operational control.

Migration Steps

If using SYCL backends, note that mmap is now auto-disabled to prevent crashes.

✨ New Features

Introduced Distributed Mode, which lets LocalAI run as a cluster with VRAM-based smart routing, node groups for workload isolation, built-in min/max autoscaling, and a cluster status dashboard.
Implemented built-in multi-user platform features including User Management, OIDC/OAuth support for SSO, Invite Mode, per-user API key management, admin impersonation, and a Quota System with usage analytics.
Added experimental Fine-Tuning capabilities using Hugging Face TRL for training LoRA adapters, with auto-export to GGUF and import back into LocalAI.
Added experimental Quantization Backend for on-the-fly model variant optimization.
Introduced a visual Model Pipeline Editor in the React UI.
Enabled running agents standalone from the CLI using 'local-ai agent run'.
Implemented Smart Inferencing by sourcing automatic inference parameters from Unsloth.
Added Media History to Studio pages to browse past generated images and media.
Added an iterative fallback parser that is used when native tool call parsing fails.
Added first-class NVIDIA Jetson/Tegra platform detection.
Downloader now rewrites HuggingFace URIs using HF_ENDPOINT for mirror setups.
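
The HF_ENDPOINT rewrite in the last item can be pictured with a small sketch. This is not LocalAI's implementation: `rewrite_hf_url` is a hypothetical helper, and it assumes the rewrite only swaps the scheme and host of `https://huggingface.co/...` URLs for those of the mirror, leaving the path and query untouched. The mirror hostname in the usage line is a placeholder.

```python
import os
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

HF_DEFAULT_HOST = "huggingface.co"


def rewrite_hf_url(url: str, endpoint: Optional[str] = None) -> str:
    """Hypothetical helper: point a Hugging Face URL at a mirror.

    If `endpoint` is None, the value of the HF_ENDPOINT environment
    variable is used. URLs on other hosts are returned unchanged.
    """
    if endpoint is None:
        endpoint = os.environ.get("HF_ENDPOINT", "")
    if not endpoint:
        return url  # no mirror configured

    parts = urlsplit(url)
    if parts.hostname != HF_DEFAULT_HOST:
        return url  # not a Hugging Face URL

    mirror = urlsplit(endpoint)
    if not mirror.scheme or not mirror.netloc:
        return url  # malformed endpoint, leave the URL alone

    # Keep any path prefix the mirror is served under.
    path = mirror.path.rstrip("/") + parts.path
    return urlunsplit((mirror.scheme, mirror.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    original = "https://huggingface.co/org/model/resolve/main/model.gguf"
    print(rewrite_hf_url(original, "https://hf-mirror.example"))
```

Returning the original URL when the endpoint is empty or malformed keeps the default download path working if the variable is unset or mistyped.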

🐛 Bug Fixes

Fixed noisy terminals by collapsing repeated log lines.
Fixed crashes on SYCL backends by auto-disabling mmap.
Bundled libdl, librt, and libpthread to improve llama.cpp cross-platform support.
Fixed API returning non-404 errors for missing models.
Fixed API returning unescaped model names.
Unified inferencing paths and added automatic retry on transient errors.
Implemented encoding_format=base64 support for the embeddings endpoint.
Fixed Kokoro TTS phonemization model not downloading during installation.
Fixed Opus codec backend selection alias in Realtime API development mode.
Fixed exact tag matching for model gallery filters.
Fixed required ORItemParam.Arguments field being omitted; ORItemParam.Summary is now always populated.
Fixed tracing settings not loading from runtime_settings.json.
Fixed UI watchdog field mapping, model list refresh on deletion, backend display in model config, and MCP button ordering.
Fixed directory removal during download fallback attempts and improved retry logic.
Fixed baseDir assignment to use ModelPath correctly.
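
For clients that use the new encoding_format=base64 option on the embeddings endpoint, the returned string has to be decoded back into floats. The sketch below assumes the payload follows the OpenAI convention of a base64-encoded array of little-endian float32 values; these release notes do not state the byte layout, so verify it against your server's actual responses. Both function names are illustrative, and `encode_embedding` exists only to make the example self-checking.

```python
import base64
import struct
from typing import List, Sequence


def decode_embedding(b64: str) -> List[float]:
    """Decode a base64 string into a list of floats.

    Assumption: the bytes are packed little-endian float32 values.
    """
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def encode_embedding(values: Sequence[float]) -> str:
    """Inverse of decode_embedding, used here only for the round trip."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


if __name__ == "__main__":
    # These values are exactly representable in float32, so the round trip is lossless.
    vector = [0.5, -1.0, 2.0]
    assert decode_embedding(encode_embedding(vector)) == vector
```

Base64 transfer is typically chosen to shrink responses for high-dimensional vectors, since four raw bytes per value are more compact than a JSON decimal literal.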

Affected Symbols

local-ai agent run
embeddings endpoint (encoding_format=base64)
Kokoro TTS
Realtime API (Opus codec)
ORItemParam.Arguments
ORItemParam.Summary
runtime_settings.json
HF_ENDPOINT