v4.1.0
📦 localaiView on GitHub →
✨ 11 features🐛 15 fixes🔧 8 symbols
Summary
LocalAI 4.1.0 transforms the platform into a production-grade system by introducing distributed clustering, robust user authentication, quotas, and experimental features like in-UI fine-tuning and quantization. This release focuses heavily on scalability, security, and advanced operational control.
Migration Steps
- If using SYCL backends, note that mmap is now auto-disabled to prevent crashes.
✨ New Features
- Introduced Distributed Mode allowing LocalAI to run as a cluster with smart routing based on VRAM, node groups for workload isolation, built-in min/max autoscaling, and cluster status dashboard.
- Implemented built-in multi-user platform features including User Management, OIDC/OAuth support for SSO, Invite Mode, per-user API key management, admin impersonation, and a Quota System with usage analytics.
- Added experimental Fine-Tuning capabilities using Hugging Face TRL for training LoRA adapters, with auto-export to GGUF and import back into LocalAI.
- Added experimental Quantization Backend for on-the-fly model variant optimization.
- Introduced a visual Model Pipeline Editor in the React UI.
- Enabled running agents standalone from the CLI using 'local-ai agent run'.
- Implemented Smart Inferencing by sourcing automatic inference parameters from Unsloth.
- Added Media History to Studio pages to browse past generated images and media.
- Added support for iterative fallback parser when native tool call parsing fails.
- Added first-class NVIDIA Jetson/Tegra platform detection.
- Downloader now rewrites HuggingFace URIs using HF_ENDPOINT for mirror setups.
🐛 Bug Fixes
- Fixed noisy terminals by collapsing repeated log lines.
- Fixed crashes on SYCL backends by auto-disabling mmap.
- Bundled libdl, librt, libpthread for improved llama.cpp cross-platform support.
- Fixed API returning non-404 errors for missing models.
- Fixed API returning unescaped model names.
- Unified inferencing paths and added automatic retry on transient errors.
- Implemented encoding_format=base64 support for the embeddings endpoint.
- Fixed Kokoro TTS phonemization model not downloading during installation.
- Fixed Opus codec backend selection alias in Realtime API development mode.
- Fixed exact tag matching for model gallery filters.
- Fixed required ORItemParam.Arguments field being omitted; ORItemParam.Summary is now always populated.
- Fixed tracing settings not loading from runtime_settings.json.
- Fixed UI watchdog field mapping, model list refresh on deletion, backend display in model config, and MCP button ordering.
- Fixed directory removal during download fallback attempts and improved retry logic.
- Fixed baseDir assignment to use ModelPath correctly.