v3.8.0
📦 localai
✨ 8 features · 🐛 6 fixes · 🔧 4 symbols
Summary
LocalAI 3.8.0 centers on user experience: a universal model importer, a complete UI overhaul, and hot-reloadable system settings. It also improves agentic workflows with live streaming of agent actions and fixes critical OpenAI SSE compatibility issues.
Migration Steps
- If you rely on runtime settings that previously required environment variables or restarts (e.g., watchdogs, P2P settings), mount a persistent volume at the container's `/configuration` directory so these new runtime settings survive restarts (a compose sketch follows these steps).
- If your custom SSE streaming clients relied on the previous non-compliant behavior, update them to handle the strict OpenAI format implemented in this release.
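As a concrete starting point, here is a minimal Docker Compose sketch of that mount. The `localai/localai:latest` image tag and the port mapping are assumptions about your deployment; `/configuration` is the container path named above.

```yaml
# Hypothetical compose sketch: persist LocalAI's hot-reloadable runtime
# settings by mounting a host directory at /configuration.
services:
  localai:
    image: localai/localai:latest   # assumed tag; pin the one you deploy
    ports:
      - "8080:8080"                 # LocalAI's default API port
    volumes:
      - ./configuration:/configuration
```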
✨ New Features
- Introduced Universal Model Import supporting direct URLs from Hugging Face, Ollama, and OCI registries, or local paths, with auto-detection of backends and chat templates.
- Complete UI Overhaul including a new onboarding wizard, auto-model selection on boot, and a cleaner tabular view for model management.
- New Model Context Protocol (MCP) Live Streaming for agent actions and tool calls, allowing real-time viewing of agent reasoning.
- Hot-Reloadable Settings panel allowing modification of watchdogs, API keys, P2P settings, and defaults without restarting the container (Note: Network settings like CORS/CSRF still require a restart).
- Chat history and parallel conversations are now persisted in local browser storage.
- Strict adherence to the OpenAI SSE streaming standard, resolving compatibility issues with clients such as LangChain/JS (see the streaming sketch after this list).
- Exposed advanced llama.cpp configuration options via YAML, including `context_shift`, `cache_ram`, and `parallel` workers (a config sketch follows this list).
- Added full support for `logit_bias` and `logprobs` to match the OpenAI specification, useful for agentic workflows and evaluation (both exercised in the streaming sketch below).
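To illustrate the streaming and sampling items above, here is a minimal Python sketch against LocalAI's OpenAI-compatible endpoint using the official `openai` client. The base URL, model name, and the specific token ID in `logit_bias` are placeholders, not values from this release.

```python
# Minimal sketch, assuming a LocalAI instance on localhost:8080 and the
# official `openai` Python client; "my-model" is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,                 # exercises the strict OpenAI SSE format
    logprobs=True,               # per-token log probabilities
    logit_bias={"15043": -100},  # suppress one token ID (illustrative value)
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```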
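For the llama.cpp options, a hypothetical model-config sketch follows. Only the option names (`context_shift`, `cache_ram`, `parallel`) come from this release; the surrounding layout, the `backend` value, and exactly where these keys sit in the schema are assumptions, so check the shipped documentation.

```yaml
# Hypothetical model YAML; key placement is an assumption, not the final schema.
name: my-model
backend: llama-cpp            # assumed backend identifier
parameters:
  model: my-model.Q4_K_M.gguf # placeholder weights file
context_shift: true           # shift the context window instead of erroring
cache_ram: 4096               # KV-cache RAM budget (unit assumed to be MiB)
parallel: 4                   # number of parallel workers/slots
```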
🐛 Bug Fixes
- Fixed SSE streaming format to exactly match OpenAI specifications, resolving issues with LangChain/JS clients.
- In the reranker, `top_n` can now be omitted or set to `0` to return all results instead of being cut off at an arbitrary default (see the rerank sketch after this list).
- Fixed model preview when downloading to show the actual filename and size before committing.
- Fixed crashes that occurred when tool content was missing or malformed.
- Fixed dropdown selection states for TTS models.
- Clicking "Stop" during streaming now truly cancels generation, correctly halting work across the llama.cpp, vLLM, transformers, and diffusers backends (sketched after this list).
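For the reranker fix, a short Python sketch assuming LocalAI's Jina-style `/v1/rerank` endpoint on localhost:8080; the model name is a placeholder and the response shape shown is the usual Jina rerank schema, which this changelog does not itself spell out.

```python
# Rerank sketch: omitting "top_n" (or sending 0) should now return every
# document ranked, rather than a truncated default.
import requests

resp = requests.post(
    "http://localhost:8080/v1/rerank",
    json={
        "model": "my-reranker",  # placeholder reranker model
        "query": "What is LocalAI?",
        "documents": [
            "LocalAI is a self-hosted OpenAI-compatible API.",
            "Bananas are rich in potassium.",
        ],
        # No "top_n" key here: all results come back, ranked.
    },
    timeout=60,
)
# Response shape assumed to follow the Jina rerank schema.
for item in resp.json()["results"]:
    print(item["index"], item["relevance_score"])
```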
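And for the cancellation fix, a sketch of what a client-side "Stop" amounts to: closing the SSE stream mid-generation. Per this release, dropping the connection should now stop generation on the backend instead of letting it run to completion. Endpoint and model name are the same placeholders as above.

```python
# Client-side cancellation sketch against the same assumed LocalAI endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a very long story."}],
    stream=True,
)

for i, chunk in enumerate(stream):
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if i >= 20:         # arbitrary cutoff standing in for a user's "Stop" click
        stream.close()  # closing the stream should trigger server-side cancel
        break
```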
🔧 Affected Symbols
- llama.cpp (configuration options exposed)
- vLLM (cancellation context)
- transformers (cancellation context)
- diffusers (cancellation context)