
v2.27.0

11 features · 7 bug fixes · 4 affected symbols

Summary

LocalAI v2.27.0 introduces a major WebUI overhaul, significant performance enhancements for GGUF and VLLM, and updates the default models shipped in AIO container images.

Migration Steps

  1. If you use the AIO images, note the new default models shipped in v2.27.0; the sketch below shows how to list the models your instance actually serves.
  2. Review your Docker run commands if you rely on specific image tags; the 'latest' and 'latest-aio-cpu' tags now ship updated default models.
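
To confirm which default models the running AIO image serves, the sketch below queries the OpenAI-compatible /v1/models endpoint. It is a minimal sketch, assuming a LocalAI instance on the default port 8080 with no API key configured; adjust base_url and api_key for your deployment.

```python
# Minimal sketch: list the models served by a local LocalAI instance.
# Assumptions: server reachable at localhost:8080 (LocalAI's default port)
# and no API key required; change base_url/api_key to match your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

for model in client.models.list():
    print(model.id)
```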

✨ New Features

  • Complete WebUI Redesign with a fresh, modern interface and enhanced navigation.
  • Model Gallery Improvements including better pagination and filtering.
  • AIO Image Updates with new default models shipped (e.g., llama3.1, granite-embeddings, minicpm for CPU); a chat example using one of these models follows this list.
  • Chat Interface Enhancements: cleaner layout, model-specific UI tweaks, and custom reply prefixes.
  • Smart Model Detection: automatically links to the relevant model documentation based on usage.
  • Performance Tweaks: GGUF models now auto-detect context size.
  • Llama.cpp now handles batch embeddings and SIGTERM gracefully.
  • VLLM Configuration Boost: Added options to disable logging, set dtype, and enforce per-prompt media limits.
  • Support for new model architectures: Gemma 3, Mistral, Deepseek.
  • Ability to specify a reply prefix for chat responses.
  • UI improvements to the index and models pages.
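
As a usage sketch for the new chat defaults referenced above, the snippet below sends a chat completion to the OpenAI-compatible endpoint. It assumes a local instance on port 8080 and the llama3.1 model name from the CPU AIO defaults; both may differ in your deployment.

```python
# Minimal sketch: chat completion against a local LocalAI instance.
# Assumptions: server at localhost:8080, model "llama3.1" (one of the new
# CPU AIO defaults), and no API key required.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Give me a one-line greeting."}],
)
print(response.choices[0].message.content)
```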

🐛 Bug Fixes

  • Resolved model icon display inconsistencies.
  • Ensured proper handling of generated artifacts without API key restrictions.
  • Optimized CLIP offloading (no longer implies GPU offload by default).
  • Fixed Llama.cpp process termination handling (SIGTERM).
  • Fixed the initialization order so the llama-cpp-avx512 variant is tried before the avx2 variant.
  • Pinned the transformers version to fix the coqui backend.
  • Unified use-case identification for models.

🔧 Affected Symbols

  • llama.cpp
  • CLIP
  • coqui
  • transformers