b8859
📦 llama-cpp
🐛 6 fixes · 🔧 5 symbols
Summary
This release focuses on stability and correctness in tensor parallelism (TP), fixing the handling of 0-sized tensor slices and adding an AllReduce fallback. It also includes platform-specific fixes, covering CUDA device handling and the maximum ggml context size.
🐛 Bug Fixes
- Fixed handling of 0-sized tensor slices in tensor parallelism (TP).
- Implemented an AllReduce fallback mechanism.
- Fixed aliasing between the layer structure and the GPU count.
- Added a missing std::fill implementation.
- Fixed CUDA device setting.
- Fixed the maximum ggml context size.
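To illustrate the TP-related fixes above, here is a minimal, hypothetical sketch (not the actual llama.cpp code): when rows are split across GPUs, some ranks can legitimately receive a 0-sized slice, and a host-side summation can serve as an AllReduce fallback when no fast device-to-device reduce path is available. The names `slice_rows` and `all_reduce_sum_host` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: split n_rows across n_gpus as evenly as possible.
// When n_rows < n_gpus, some ranks receive a 0-sized slice, which
// downstream code must tolerate (the class of bug fixed in this release).
struct Slice { int begin; int end; };

static Slice slice_rows(int n_rows, int n_gpus, int rank) {
    const int per = n_rows / n_gpus;
    const int rem = n_rows % n_gpus;
    const int begin = rank * per + std::min(rank, rem);
    const int end   = begin + per + (rank < rem ? 1 : 0);
    return {begin, end};
}

// Hypothetical AllReduce fallback: element-wise sum of per-rank partial
// results on the host, used when no device-to-device reduce is available.
static std::vector<float> all_reduce_sum_host(
        const std::vector<std::vector<float>>& partials) {
    std::vector<float> out(partials.front().size(), 0.0f);
    for (const auto& p : partials) {
        if (p.empty()) continue;  // a 0-sized partial contributes nothing
        for (size_t i = 0; i < out.size(); ++i) out[i] += p[i];
    }
    return out;
}
```

With 2 rows split over 3 GPUs, ranks 0 and 1 each get one row and rank 2 gets an empty slice; the fallback reduce simply skips empty contributions rather than dereferencing them.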