b9434

📦 llama-cpp
🐛 2 fixes

Summary

This release fixes granularity issues affecting Qwen 3.5/3.6 models when Tensor Parallelism (TP) is used across 3 GPUs, and resolves a TP bug in afmoe.

🐛 Bug Fixes

  • Fixed granularity issues for Qwen 3.5/3.6 when using 3 GPUs.
  • Fixed an issue related to afmoe TP (Tensor Parallelism).
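
The notes above do not say how the granularity fix works. As a rough, purely illustrative sketch (not llama.cpp's code), the snippet below shows why a device count such as 3 can be awkward: when a dimension must be split in whole multiples of some block size, it often cannot be divided evenly. The function name `split_with_granularity` and the numbers 4096 and 128 are hypothetical and chosen only for the example.

```python
def split_with_granularity(total, n_devices, granularity):
    """Split `total` elements across devices in whole multiples of `granularity`.

    Hypothetical helper for illustration only; it is not part of llama.cpp.
    """
    if total % granularity != 0:
        raise ValueError("total must be a multiple of granularity")
    blocks = total // granularity
    base, extra = divmod(blocks, n_devices)
    # The first `extra` devices each take one additional block,
    # so every share stays aligned to `granularity`.
    return [(base + (1 if i < extra else 0)) * granularity
            for i in range(n_devices)]


# 4096 elements in blocks of 128 gives 32 blocks, which 3 devices cannot share evenly.
print(split_with_granularity(4096, 3, 128))
# With 2 devices the same dimension splits evenly.
print(split_with_granularity(4096, 2, 128))
```

With 3 devices the shares are unequal but each remains block-aligned, and they still sum to the full dimension. Whether the actual fix rounds, pads, or rebalances in this way is not stated in the release notes.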