v2.25.0
📦 localai
✨ 6 features · 🐛 2 fixes · 🔧 4 symbols
Summary
This release introduces several new features, including token usage reporting for streaming responses, reading Jinja templates from GGUF files, and configurable llama.cpp KV-cache quantization. It also includes numerous model gallery updates and a fix to the llava clip patch.
✨ New Features
- Exposed cache_type_k and cache_type_v to allow quantizing the llama.cpp KV cache.
- Jinja templates can now be read from GGUF files.
- Added support for reporting token usage in streaming responses.
- The Dockerfile now allows skipping driver installation.
- Added path-prefix support for the UI via an HTTP header.
- The downloader can now resume partial downloads.
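The KV-cache options above are set per model. A minimal sketch of a model config, assuming the field names mirror the llama.cpp --cache-type-k / --cache-type-v flags and that quantization types such as f16, q8_0, and q4_0 are accepted:

```yaml
# Hypothetical model config sketch; field names and accepted values
# are assumed to mirror llama.cpp's --cache-type-k / --cache-type-v flags.
name: my-model
parameters:
  model: my-model.gguf
context_size: 4096
# Quantize the KV cache to reduce memory usage at some cost in accuracy
cache_type_k: q8_0
cache_type_v: q8_0
```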
🐛 Bug Fixes
- Updated clip.patch for llava.
- Corrected the gallery/index.yaml file.
🔧 Affected Symbols
llava · llama.cpp · gguf · Dockerfile