LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude, and others. Self-hosted and local-first. A drop-in replacement for OpenAI, running on consumer-grade hardware; no GPU required. Runs gguf, transformers, diffusers, and many more. Features: text generation, MCP, audio, video, images, voice cloning, and distributed, P2P, decentralized inference.
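Because LocalAI is a drop-in replacement for the OpenAI API, existing OpenAI client code typically only needs its base URL changed. A minimal sketch in Python, assuming a LocalAI instance on its default port 8080 and a model already installed locally (the model name below is a placeholder, not something shipped by default):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server; LocalAI exposes
# an OpenAI-compatible API, and by default no API key is enforced.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my-local-model",  # placeholder: use a model configured in LocalAI
    messages=[{"role": "user", "content": "Say hello from LocalAI."}],
)
print(response.choices[0].message.content)
```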
Release History
v3.9.0 (Breaking, 2 fixes, 7 features): LocalAI 3.9.0 introduces significant stability and resource management features, including an Agent Jobs panel for scheduling and an automatic Smart Memory Reclaimer with LRU model eviction. This release also drops x86_64 Mac support and updates data storage paths.
v3.8.0 (6 fixes, 8 features): LocalAI 3.8.0 puts a major focus on user experience, with a universal model importer, a complete UI overhaul, and hot-reloadable system settings. It also significantly improves agentic workflows via live streaming and fixes critical OpenAI SSE compatibility issues.
v3.7.0 (7 fixes, 10 features): LocalAI 3.7.0 introduces robust Agentic MCP support integrated into the WebUI and a new high-quality neutts TTS backend. This release also features a major WebUI overhaul and numerous stability fixes, particularly around tool usage and JSON schema handling.
v3.6.0 (1 fix, 2 features): This release introduces support for Nvidia L4T devices and multilingual capabilities in chatterbox, alongside a fix for token limits in llama.cpp reranking models. It also includes numerous dependency updates, particularly for llama.cpp and whisper.cpp.
v3.5.4 (1 fix): This release primarily focuses on internal dependency updates for whisper.cpp and llama.cpp, along with a bug fix to standardize option checking across backends.
v3.5.3 (1 fix, 3 features): This release introduces several new models to the gallery and fixes a bug related to float detection in diffusers. It also updates the underlying llama.cpp dependency.
v3.5.2: This release reverts a dependency bump for CUDA images and updates the underlying llama.cpp commit, alongside documentation version updates.
v3.5.1 (7 fixes, 4 features): This release introduces a new launcher welcome page and support for the HF_ENDPOINT environment variable. It also includes numerous bug fixes and dependency updates, and adds support for diarization in Whisper models.
v3.5.0 (5 fixes, 14 features): LocalAI 3.5.0 significantly expands backend support with MLX and Purego rewrites of Whisper and Stablediffusion, and introduces an Alpha Launcher for easier management. This release also brings numerous WebUI enhancements and stability fixes across various hardware platforms.
v3.4.0 (1 fix, 7 features): LocalAI 3.4.0 significantly expands the platform with new TTS and AI backends (KittenTTS, kokoro, dia), enhanced WebUI image generation controls, and support for reasoning effort in chat completions (see the first sketch after this list). It also improves backend installation flexibility and fixes a llama.cpp rope defaulting issue.
v3.3.2 (1 fix, 2 features): This release introduces new flexibility for backend installation and configuration, alongside dependency updates and documentation improvements.
v3.3.1 (Breaking, 1 fix, 3 features): This minor release introduces support for Flux Kontext and Flux Krea for image editing and generation. It also resolves critical bugs related to Intel GPU images and renames the corresponding container image tags.
v3.3.0 (2 fixes, 3 features): LocalAI 3.3.0 introduces object detection capabilities via a new API and improves backend download reliability with defined mirrors. This release also includes various bug fixes across container images, backends, and installation scripts.
v3.2.3 (1 fix): This release primarily focuses on bug fixes, specifically addressing CUDA image tag naming, and includes an update to the ggml-org/llama.cpp dependency. Documentation for backend detection override was also added.
v3.2.2 (3 fixes, 1 feature): This release introduces mirror support for the backend gallery and includes several bug fixes related to string trimming, Vulkan image suffixes, and CI image capabilities.
v3.2.1 (2 fixes): This patch release primarily focuses on bug fixes related to installation and backend service communication, alongside updating several underlying C++ dependencies.
v3.2.0 (Breaking, 1 fix, 8 features): LocalAI 3.2.0 introduces a major architectural shift by separating inference backends from the core binary, resulting in a leaner application and enabling independent backend management. This release also adds automatic hardware detection for backend installation and expands model support significantly.
v3.1.1 (4 fixes, 1 feature): This patch release introduces automatic backend installation for models in the gallery and resolves several bugs related to GPU vendor identification and URI handling in the backend gallery.
v3.1.0 (Breaking, 1 fix, 3 features): LocalAI 3.1 introduces support for Gemma 3n models and streamlines the container image structure by removing bundled sources, significantly reducing image size. This release also features meta-packages for easier backend installation and highlights the integrated LocalAI ecosystem.
v3.0.0 (Breaking, 3 fixes, 9 features): LocalAI 3.0 introduces a major overhaul with the Backend Gallery for dynamic OCI-based backend management, adds WebSocket streaming, and enhances model capabilities with dynamic VRAM handling and multimodal support. This release marks a significant step towards a more modular and powerful local inference platform.
v2.29.0 (Breaking, 6 features): LocalAI v2.29.0 introduces a major overhaul to container images, slimming down defaults and introducing an `-extras` suffix for optional dependencies, alongside new model support and experimental video generation capabilities.
v2.28.0 (2 fixes, 5 features): LocalAI v2.28.0 introduces SYCL support for image generation, updates the diffusers library, and features significant enhancements around the newly reborn LocalAGI agent framework, which is now written in Go.
v2.27.0 (7 fixes, 11 features): LocalAI v2.27.0 introduces a major WebUI overhaul, significant performance enhancements for GGUF and VLLM, and updated default models shipped in the AIO container images.
v2.26.0 (Breaking, 5 fixes, 9 features): LocalAI v2.26.0 introduces significant backend consolidation, dropping deprecated TTS and LLM backends in favor of modern alternatives like GGUF and new TTS options. This release also adds support for Nvidia L4T devices and introduces lazy grammar features for llama.cpp.
v2.25.0 (2 fixes, 6 features): This release introduces several new features, including streaming token usage (see the second sketch after this list), GGUF template reading, and enhanced llama.cpp cache configuration. It also includes numerous updates to the model gallery and a fix for the llava clip patch.
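As referenced from the v3.4.0 entry above: reasoning effort in chat completions most likely follows the OpenAI `reasoning_effort` parameter. A hedged sketch, assuming LocalAI mirrors that field name and that a reasoning-capable model is installed (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my-reasoning-model",  # placeholder name, not a real default
    messages=[{"role": "user", "content": "Outline a 3-step plan."}],
    # "low" | "medium" | "high", per the OpenAI chat-completions convention;
    # that LocalAI accepts this exact field is an assumption here.
    reasoning_effort="low",
)
print(response.choices[0].message.content)
```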
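And for the streaming token usage introduced in v2.25.0: in the OpenAI-compatible convention this is requested via `stream_options`, with the counts arriving in a final chunk. A sketch under the same assumptions (local instance on port 8080, placeholder model name):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-local-model",  # placeholder: use a model configured in LocalAI
    messages=[{"role": "user", "content": "Say hello from LocalAI."}],
    stream=True,
    # Ask for a final chunk that carries the token usage counts.
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:  # the usage-only chunk has an empty choices list
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    if chunk.usage:
        print(f"\n[prompt={chunk.usage.prompt_tokens}, "
              f"completion={chunk.usage.completion_tokens}]")
```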