v3.8.0
📦 localai
✨ 8 features · 🐛 6 fixes · 🔧 4 symbols
Summary
LocalAI 3.8.0 centers on user experience: a universal model importer, a complete UI overhaul, and hot-reloadable system settings. It also improves agentic workflows with live streaming of agent actions and fixes critical OpenAI SSE compatibility issues.
Migration Steps
- If you rely on runtime settings that previously required environment variables or restarts (e.g., watchdogs, P2P settings), mount a persistent volume at the container's `/configuration` directory so these new runtime settings survive restarts (a compose sketch follows these steps).
- If your custom SSE streaming clients relied on the previous non-compliant behavior, update them to handle the strict OpenAI format implemented in this release.
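As a concrete starting point, here is a minimal Docker Compose sketch of that mount. The `localai/localai:latest` image tag and the port mapping are assumptions about your deployment; `/configuration` is the container path named above.

```yaml
# Hypothetical compose sketch: persist LocalAI's hot-reloadable runtime
# settings by mounting a host directory at /configuration.
services:
  localai:
    image: localai/localai:latest   # assumed tag; pin the one you deploy
    ports:
      - "8080:8080"                 # LocalAI's default API port
    volumes:
      - ./configuration:/configuration
```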
✨ New Features
- Introduced Universal Model Import supporting direct URLs from Hugging Face, Ollama, and OCI registries, or local paths, with auto-detection of backends and chat templates.
- Complete UI Overhaul including a new onboarding wizard, auto-model selection on boot, and a cleaner tabular view for model management.
- New Model Context Protocol (MCP) Live Streaming for agent actions and tool calls, allowing real-time viewing of agent reasoning.
- Hot-Reloadable Settings panel allowing modification of watchdogs, API keys, P2P settings, and defaults without restarting the container (Note: Network settings like CORS/CSRF still require a restart).
- Chat history and parallel conversations are now persisted in local browser storage.
- Strict adherence to the OpenAI SSE streaming standard, resolving compatibility issues with clients such as LangChain/JS (see the streaming sketch after this list).
- Exposed advanced llama.cpp configuration options via YAML, including `context_shift`, `cache_ram`, and `parallel` workers (a config sketch follows this list).
- Added full support for `logit_bias` and `logprobs` to match the OpenAI specification, useful for agentic workflows and evaluation (both exercised in the streaming sketch below).
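To illustrate the streaming and sampling items above, here is a minimal Python sketch against LocalAI's OpenAI-compatible endpoint using the official `openai` client. The base URL, model name, and the specific token ID in `logit_bias` are placeholders, not values from this release.

```python
# Minimal sketch, assuming a LocalAI instance on localhost:8080 and the
# official `openai` Python client; "my-model" is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,                 # exercises the strict OpenAI SSE format
    logprobs=True,               # per-token log probabilities
    logit_bias={"15043": -100},  # suppress one token ID (illustrative value)
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```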
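For the llama.cpp options, a hypothetical model-config sketch follows. Only the option names (`context_shift`, `cache_ram`, `parallel`) come from this release; the surrounding layout, the `backend` value, and exactly where these keys sit in the schema are assumptions, so check the shipped documentation.

```yaml
# Hypothetical model YAML; key placement is an assumption, not the final schema.
name: my-model
backend: llama-cpp            # assumed backend identifier
parameters:
  model: my-model.Q4_K_M.gguf # placeholder weights file
context_shift: true           # shift the context window instead of erroring
cache_ram: 4096               # KV-cache RAM budget (unit assumed to be MiB)
parallel: 4                   # number of parallel workers/slots
```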
🐛 Bug Fixes
- Fixed SSE streaming format to exactly match OpenAI specifications, resolving issues with LangChain/JS clients.
- In the reranker, `top_n` can now be omitted or set to `0` to return all results instead of being cut off at an arbitrary default (see the rerank sketch after this list).
- Fixed model preview when downloading to show the actual filename and size before committing.
- Fixed crashes that occurred when tool content was missing or malformed.
- Fixed dropdown selection states for TTS models.
- Clicking "Stop" during streaming now truly cancels generation, correctly halting work across the llama.cpp, vLLM, transformers, and diffusers backends (sketched after this list).
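For the reranker fix, a short Python sketch assuming LocalAI's Jina-style `/v1/rerank` endpoint on localhost:8080; the model name is a placeholder and the response shape shown is the usual Jina rerank schema, which this changelog does not itself spell out.

```python
# Rerank sketch: omitting "top_n" (or sending 0) should now return every
# document ranked, rather than a truncated default.
import requests

resp = requests.post(
    "http://localhost:8080/v1/rerank",
    json={
        "model": "my-reranker",  # placeholder reranker model
        "query": "What is LocalAI?",
        "documents": [
            "LocalAI is a self-hosted OpenAI-compatible API.",
            "Bananas are rich in potassium.",
        ],
        # No "top_n" key here: all results come back, ranked.
    },
    timeout=60,
)
# Response shape assumed to follow the Jina rerank schema.
for item in resp.json()["results"]:
    print(item["index"], item["relevance_score"])
```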
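And for the cancellation fix, a sketch of what a client-side "Stop" amounts to: closing the SSE stream mid-generation. Per this release, dropping the connection should now stop generation on the backend instead of letting it run to completion. Endpoint and model name are the same placeholders as above.

```python
# Client-side cancellation sketch against the same assumed LocalAI endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a very long story."}],
    stream=True,
)

for i, chunk in enumerate(stream):
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if i >= 20:         # arbitrary cutoff standing in for a user's "Stop" click
        stream.close()  # closing the stream should trigger server-side cancel
        break
```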
🔧 Affected Symbols
- llama.cpp (configuration options exposed)
- vLLM (cancellation context)
- transformers (cancellation context)
- diffusers (cancellation context)