b9455
📦 llama-cpp
✨ 1 feature · 🐛 2 fixes
Summary
This release adds support for a quantized KV cache in tensor parallelism (TP) operations and includes two minor fixes: one for an issue with partial views and one that removes an overly strict assertion.
✨ New Features
- Added support for quantized KV cache in Tensor Parallelism (TP) operations.
🐛 Bug Fixes
- Fixed an issue related to partial views.
- Removed an overly strict assertion.