b9434
📦 llama-cpp
🐛 2 fixes
Summary
This release fixes granularity issues affecting Qwen 3.5/3.6 models under Tensor Parallelism (TP), specifically when running on 3 GPUs, and resolves a separate afmoe TP bug.
🐛 Bug Fixes
- Fixed granularity issues for Qwen 3.5/3.6 when using 3 GPUs.
- Fixed an issue related to afmoe TP (Tensor Parallelism).
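
The release notes do not say what the granularity issue was. As a rough illustration of why a 3-GPU split can be awkward, the sketch below divides one tensor dimension across devices in whole blocks of a fixed size, which matters when the dimension does not divide evenly by 3. This is not llama.cpp code: the function name `split_with_granularity`, its parameters, and the example sizes (4096 and 256) are all hypothetical.

```python
def split_with_granularity(n: int, n_devices: int, granularity: int) -> list[int]:
    """Split a dimension of size n across n_devices in whole blocks.

    Hypothetical helper for illustration only; not taken from llama.cpp.
    Each returned size is a multiple of `granularity`, and the sizes sum to n.
    """
    if n_devices <= 0 or granularity <= 0:
        raise ValueError("n_devices and granularity must be positive")
    if n % granularity != 0:
        raise ValueError("n must be a multiple of granularity")

    blocks = n // granularity                 # number of indivisible blocks
    base, extra = divmod(blocks, n_devices)   # even share, plus leftover blocks

    # The first `extra` devices take one additional block each.
    return [(base + (1 if i < extra else 0)) * granularity for i in range(n_devices)]


if __name__ == "__main__":
    # 4096 is not divisible by 3, so a naive 4096 / 3 split is not even an integer.
    # Splitting in blocks of 256 gives uneven but aligned shares.
    print(split_with_granularity(4096, 3, 256))  # [1536, 1280, 1280]
```

With 2 or 4 devices the same dimension splits evenly, which is one plausible reason a problem of this kind would show up only with 3 GPUs.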