v3.11.0

Breaking Changes
📦 localai
2 breaking · 10 features · 🐛 7 fixes · 🔧 12 symbols

Summary

LocalAI 3.11.0 is a massive update focused on Audio and Multimodal capabilities, introducing Realtime Audio Conversations and expanding ASR/TTS backends. This release also removes the unmaintained Bark and deprecated ExLlama backends.

⚠️ Breaking Changes

  • The ExLlama backend has been removed because it is deprecated in favor of newer loaders like ExLlamaV2 or llama.cpp.
  • The Bark backend has been removed because the upstream project is unmaintained; users should switch to the new TTS alternatives.

Migration Steps

  1. If you were using the ExLlama backend, migrate to ExLlamaV2 or llama.cpp.
  2. If you were using the Bark TTS backend, migrate to one of the new TTS alternatives (e.g., VoxCPM, Qwen-TTS, Piper).
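
Before upgrading, it can help to find which model configurations still reference a removed backend. The following is a minimal sketch, not an official migration tool. It assumes that model configurations are YAML files containing a top-level `backend:` field and that the removed backends are identified there as `exllama` and `bark`; both identifiers, and the helper names `removed_backends_in` and `scan_models_dir`, are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: these are the identifiers the removed backends use in configs.
REMOVED_BACKENDS = {"exllama", "bark"}

# Matches a `backend: <name>` line, allowing optional quotes and a trailing comment.
BACKEND_LINE = re.compile(
    r"^\s*backend:\s*[\"']?([\w.-]+)[\"']?\s*(?:#.*)?$",
    re.MULTILINE,
)


def removed_backends_in(config_text):
    """Return the removed backend names referenced in one config's text."""
    found = {name.lower() for name in BACKEND_LINE.findall(config_text)}
    return sorted(found & REMOVED_BACKENDS)


def scan_models_dir(models_dir):
    """Map each YAML file in models_dir to the removed backends it references."""
    hits = {}
    for path in sorted(Path(models_dir).glob("*.y*ml")):
        names = removed_backends_in(path.read_text(encoding="utf-8"))
        if names:
            hits[path.name] = names
    return hits
```

Calling `scan_models_dir("models")` on a hypothetical `models` directory returns a dictionary of file names that still need migrating. The match is exact, so a config that already uses a different backend name (for example one ending in `2`) is not reported.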

✨ New Features

  • Introduced native support for Realtime Audio Conversations, enabling fluid, low-latency voice interaction compatible with standard client implementations.
  • Added a dedicated Web UI for music generation using the new Ace-Step (MusicGen) backend.
  • Expanded ASR capabilities with four new backends: WhisperX (with Speaker Diarization), VibeVoice, Qwen-ASR, and Nvidia NeMo.
  • Text-to-Speech (TTS) now supports a streaming mode for lower-latency responses (currently VoxCPM only).
  • Added support for the vLLM Omni backend for high-performance inference.
  • Native support for Speaker Diarization (identifying different speakers) via the WhisperX backend.
  • Expanded build support for CUDA 12/13, L4T (Jetson), SBSA, and improved Metal (Apple Silicon) integration using MLX backends.
  • Added support for the VoxCPM TTS backend.
  • Added support for Qwen-TTS models.
  • Added most remaining Piper voices from Hugging Face to the gallery.
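
For users moving to one of the new TTS backends, a request against LocalAI's OpenAI-compatible speech endpoint might look like the sketch below. It assumes a server listening on `http://localhost:8080` and the `/v1/audio/speech` route. The model name is a placeholder for whatever TTS model is installed, and `build_speech_request` and `save_speech` are hypothetical helpers, not part of LocalAI. The sketch does not cover the new streaming mode, whose request options are not described in these notes.

```python
import json
import urllib.request


def build_speech_request(text, model, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the OpenAI-style speech endpoint."""
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, model, out_path, base_url="http://localhost:8080"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, model, base_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With a running server, `save_speech("Hello from LocalAI", "my-tts-model", "speech.audio")` would write the returned audio to disk; the model name and output path here are placeholders.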

🐛 Bug Fixes

  • Fixed UI issue where the selected image model was not displayed correctly.
  • Fixed token count calculation to correctly account for reasoning in the UI.
  • Dropped redundant GGUF VRAM estimation logic, relying on more accurate internal measurements.
  • Fixed missing field in the initial OpenAI streaming response.
  • Fixed realtime audio handling to include the noAction function in the prompt template and correctly handle tool_choice.
  • Fixed filtering of GGUF and GGML files from the model list.
  • Fixed a Makefile issue by removing an unwanted DEFAULT_GOAL setting related to qwen-asr.

Affected Symbols