
b9455

📦 llama-cpp
1 feature · 🐛 2 fixes

Summary

This release adds support for a quantized KV cache in tensor-parallel (TP) operations, fixes an issue with partial views, and removes an overly strict assertion.

✨ New Features

  • Added support for quantized KV cache in Tensor Parallelism (TP) operations.

🐛 Bug Fixes

  • Fixed an issue related to partial views.
  • Removed an overly strict assertion.