Change8

v4.1.0

📦 localaiView on GitHub →
11 features🐛 15 fixes🔧 8 symbols

Summary

LocalAI 4.1.0 transforms the platform into a production-grade system by introducing distributed clustering, robust user authentication, quotas, and experimental features like in-UI fine-tuning and quantization. This release focuses heavily on scalability, security, and advanced operational control.

Migration Steps

  1. If using SYCL backends, note that mmap is now auto-disabled to prevent crashes.

✨ New Features

  • Introduced Distributed Mode allowing LocalAI to run as a cluster with smart routing based on VRAM, node groups for workload isolation, built-in min/max autoscaling, and cluster status dashboard.
  • Implemented built-in multi-user platform features including User Management, OIDC/OAuth support for SSO, Invite Mode, per-user API key management, admin impersonation, and a Quota System with usage analytics.
  • Added experimental Fine-Tuning capabilities using Hugging Face TRL for training LoRA adapters, with auto-export to GGUF and import back into LocalAI.
  • Added experimental Quantization Backend for on-the-fly model variant optimization.
  • Introduced a visual Model Pipeline Editor in the React UI.
  • Enabled running agents standalone from the CLI using 'local-ai agent run'.
  • Implemented Smart Inferencing by sourcing automatic inference parameters from Unsloth.
  • Added Media History to Studio pages to browse past generated images and media.
  • Added support for iterative fallback parser when native tool call parsing fails.
  • Added first-class NVIDIA Jetson/Tegra platform detection.
  • Downloader now rewrites HuggingFace URIs using HF_ENDPOINT for mirror setups.

🐛 Bug Fixes

  • Fixed noisy terminals by collapsing repeated log lines.
  • Fixed crashes on SYCL backends by auto-disabling mmap.
  • Bundled libdl, librt, libpthread for improved llama.cpp cross-platform support.
  • Fixed API returning non-404 errors for missing models.
  • Fixed API returning unescaped model names.
  • Unified inferencing paths and added automatic retry on transient errors.
  • Implemented encoding_format=base64 support for the embeddings endpoint.
  • Fixed Kokoro TTS phonemization model not downloading during installation.
  • Fixed Opus codec backend selection alias in Realtime API development mode.
  • Fixed exact tag matching for model gallery filters.
  • Fixed required ORItemParam.Arguments field being omitted; ORItemParam.Summary is now always populated.
  • Fixed tracing settings not loading from runtime_settings.json.
  • Fixed UI watchdog field mapping, model list refresh on deletion, backend display in model config, and MCP button ordering.
  • Fixed directory removal during download fallback attempts and improved retry logic.
  • Fixed baseDir assignment to use ModelPath correctly.

Affected Symbols