v4.2.0
📦 localaiView on GitHub →
✨ 17 features🐛 2 fixes🔧 10 symbols
Summary
LocalAI 4.2.0 introduces major multimodal capabilities, including voice recognition, face biometrics with liveness detection, and video generation. This release also brings Ollama API compatibility, a fully redesigned, brandable UI, and significant hardening to the vLLM and Distributed Mode backends.
Migration Steps
- If using Whisper, note that client cancellation is now supported via the ggml abort_callback.
- If using Distributed Mode, be aware of hardening changes in the orchestrator resilience.
✨ New Features
- New /v1/voice/* endpoints for speaker verification, identification, embedding, and analysis.
- New /v1/audio/diarization endpoint for speaker turn segmentation using sherpa-onnx + vibevoice.cpp.
- Face recognition pipeline including 1:1 verification, 1:N identification, detection, analysis (age, gender, emotion, race), embeddings, and antispoofing (liveness detection) powered by InsightFace + ONNX.
- Word-level timestamps support for faster-whisper.
- Client-cancellable Whisper transcriptions via the ggml abort_callback.
- Stream-done metadata on /v1/audio/transcriptions including segments, duration, and language.
- Drop-in Ollama API compatibility, allowing Ollama clients to target LocalAI.
- Video generation support in the stable-diffusion.ggml backend (i2v, first-last-frame).
- Redesigned Chat UI with cleaner layout, Nord palette, and dark-mode first approach.
- Internationalization (i18n) support in the UI (5 languages: English, Italiano, Español, Deutsch, 简体中文).
- Admin-configurable branding for the instance (name, tagline, logo, favicon).
- Interactive model configuration editor in the UI with autocomplete and live validation.
- Universal importer supporting imports across most backends, including dedicated importers for vibevoice-cpp and whisper.cpp HF repos.
- Concurrency groups implemented for per-model exclusive loading to prevent resource contention.
- vLLM backend achieves feature parity with llama.cpp and supports tensor-parallel distributed workers.
- Full exposure of vLLM engine_args via a generic YAML map.
- Hardened Distributed Mode v2 orchestrator resilience and improved worker binding/upgrading logic.
🐛 Bug Fixes
- Transcription errors now surface in the access log and to the client.
- Fixes for Distributed Mode v2 orchestrator resilience, including auto-upgrade routing, worker bind-wait, RAG-init crash handling, and log-spam reduction.