
b8859

📦 llama-cpp
🐛 6 fixes · 🔧 5 symbols

Summary

This release focuses on stability and correctness in Tensor Parallelism (TP), fixing issues with 0-sized tensor slices and adding an AllReduce fallback. It also includes platform-specific fixes, notably to CUDA device handling and the maximum ggml context size.
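To illustrate the 0-sized-slice problem, here is a minimal sketch. The `slice` struct and `split_rows` helper are hypothetical, not llama.cpp API: when rows are divided across more devices than there are rows, trailing devices receive an empty slice, and downstream code must skip those slices rather than launch work on them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: divide n_rows as evenly as possible across
// n_devices. When n_rows < n_devices, trailing slices have count == 0;
// callers must check for this and skip empty slices.
struct slice { std::size_t first; std::size_t count; };

static std::vector<slice> split_rows(std::size_t n_rows, std::size_t n_devices) {
    std::vector<slice> out;
    const std::size_t base = n_rows / n_devices;
    const std::size_t rem  = n_rows % n_devices;
    std::size_t first = 0;
    for (std::size_t d = 0; d < n_devices; ++d) {
        const std::size_t count = base + (d < rem ? 1 : 0); // may be 0
        out.push_back({first, count});
        first += count;
    }
    return out;
}
```

Guarding on `count == 0` before dispatching a kernel is the kind of check the fix above implies.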

🐛 Bug Fixes

  • Fixed 0-sized tensor slices in TP (Tensor Parallelism).
  • Implemented AllReduce fallback mechanism.
  • Fixed aliasing between layer structure and GPU count.
  • Added missing std::fill implementation.
  • Fixed CUDA device setting.
  • Fixed maximum ggml context size.
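An AllReduce fallback like the one mentioned above can be sketched as a plain element-wise sum over per-device buffers. This is a hypothetical stand-in, not the release's actual implementation: when no collective primitive is available, each device's partial result is summed and the reduced result is copied back to every device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fallback AllReduce: element-wise sum of each device's
// partial buffer, then broadcast of the reduced result back to every
// device. A real backend would prefer a collective primitive when one
// is available.
static void all_reduce_sum(std::vector<std::vector<float>> & bufs) {
    if (bufs.empty()) return;
    const std::size_t n = bufs[0].size();
    std::vector<float> acc(n, 0.0f);
    for (const auto & b : bufs) {
        assert(b.size() == n); // all device buffers must match in size
        for (std::size_t i = 0; i < n; ++i) acc[i] += b[i];
    }
    for (auto & b : bufs) b = acc; // broadcast reduced result
}
```

The fallback trades bandwidth for portability: it works on any backend, at the cost of an extra copy per device compared to an in-place collective.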

Affected Symbols