Changelog

b8860

📦 llama-cpp
🐛 2 fixes

Summary

This release fixes a critical bug that delayed AllReduce operations on Gemma-4 MoE models, and improves the graph-traversal logic used for tensor parallelism.

Migration Steps

  1. No user action is required: graph traversal now checks all of a node's sources before skipping it (internal logic change).

🐛 Bug Fixes

  • Fixed delayed AllReduce operation on Gemma-4 MoE models.
  • Graph traversal now skips forward past nodes that do not consume the current node's output, so that chains of MUL operations are handled correctly.