b8859
📦 llama-cpp
🐛 6 fixes · 🔧 5 symbols
Summary
This release focuses on stability and correctness in tensor parallelism (TP), fixing the handling of 0-sized tensor slices and adding an AllReduce fallback. It also includes platform-specific fixes, covering CUDA device handling and the maximum ggml context size.
🐛 Bug Fixes
- Fixed handling of 0-sized tensor slices in tensor parallelism (TP).
- Implemented an AllReduce fallback mechanism.
- Fixed aliasing between the layer structure and the GPU count.
- Added a missing std::fill implementation.
- Fixed CUDA device setting.
- Fixed the maximum ggml context size.
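To illustrate the TP-related fixes above, here is a minimal, hypothetical sketch (not the actual llama.cpp code): when rows are split across GPUs, some ranks can legitimately receive a 0-sized slice, and a host-side summation can serve as an AllReduce fallback when no fast device-to-device reduce path is available. The names `slice_rows` and `all_reduce_sum_host` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: split n_rows across n_gpus as evenly as possible.
// When n_rows < n_gpus, some ranks receive a 0-sized slice, which
// downstream code must tolerate (the class of bug fixed in this release).
struct Slice { int begin; int end; };

static Slice slice_rows(int n_rows, int n_gpus, int rank) {
    const int per = n_rows / n_gpus;
    const int rem = n_rows % n_gpus;
    const int begin = rank * per + std::min(rank, rem);
    const int end   = begin + per + (rank < rem ? 1 : 0);
    return {begin, end};
}

// Hypothetical AllReduce fallback: element-wise sum of per-rank partial
// results on the host, used when no device-to-device reduce is available.
static std::vector<float> all_reduce_sum_host(
        const std::vector<std::vector<float>>& partials) {
    std::vector<float> out(partials.front().size(), 0.0f);
    for (const auto& p : partials) {
        if (p.empty()) continue;  // a 0-sized partial contributes nothing
        for (size_t i = 0; i < out.size(); ++i) out[i] += p[i];
    }
    return out;
}
```

With 2 rows split over 3 GPUs, ranks 0 and 1 each get one row and rank 2 gets an empty slice; the fallback reduce simply skips empty contributions rather than dereferencing them.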