Change8

v4.2.0

📦 localaiView on GitHub →
17 features🐛 2 fixes🔧 10 symbols

Summary

LocalAI 4.2.0 introduces major multimodal capabilities, including voice recognition, face biometrics with liveness detection, and video generation. This release also brings Ollama API compatibility, a fully redesigned, brandable UI, and significant hardening to the vLLM and Distributed Mode backends.

Migration Steps

  1. If using Whisper, note that client cancellation is now supported via the ggml abort_callback.
  2. If using Distributed Mode, be aware of hardening changes in the orchestrator resilience.

✨ New Features

  • New /v1/voice/* endpoints for speaker verification, identification, embedding, and analysis.
  • New /v1/audio/diarization endpoint for speaker turn segmentation using sherpa-onnx + vibevoice.cpp.
  • Face recognition pipeline including 1:1 verification, 1:N identification, detection, analysis (age, gender, emotion, race), embeddings, and antispoofing (liveness detection) powered by InsightFace + ONNX.
  • Word-level timestamps support for faster-whisper.
  • Client-cancellable Whisper transcriptions via the ggml abort_callback.
  • Stream-done metadata on /v1/audio/transcriptions including segments, duration, and language.
  • Drop-in Ollama API compatibility, allowing Ollama clients to target LocalAI.
  • Video generation support in the stable-diffusion.ggml backend (i2v, first-last-frame).
  • Redesigned Chat UI with cleaner layout, Nord palette, and dark-mode first approach.
  • Internationalization (i18n) support in the UI (5 languages: English, Italiano, Español, Deutsch, 简体中文).
  • Admin-configurable branding for the instance (name, tagline, logo, favicon).
  • Interactive model configuration editor in the UI with autocomplete and live validation.
  • Universal importer supporting imports across most backends, including dedicated importers for vibevoice-cpp and whisper.cpp HF repos.
  • Concurrency groups implemented for per-model exclusive loading to prevent resource contention.
  • vLLM backend achieves feature parity with llama.cpp and supports tensor-parallel distributed workers.
  • Full exposure of vLLM engine_args via a generic YAML map.
  • Hardened Distributed Mode v2 orchestrator resilience and improved worker binding/upgrading logic.

🐛 Bug Fixes

  • Transcription errors now surface in the access log and to the client.
  • Fixes for Distributed Mode v2 orchestrator resilience, including auto-upgrade routing, worker bind-wait, RAG-init crash handling, and log-spam reduction.

Affected Symbols