b8860
📦 llama-cpp
🐛 2 fixes
Summary
This release fixes a critical bug that delayed AllReduce operations for Gemma-4 MoE models and improves the graph-traversal logic used by tensor parallelism.
Migration Steps
- No user action required: the traversal now checks all of a node's sources before deciding to skip it (internal logic change).
🐛 Bug Fixes
- Fixed delayed AllReduce operation on Gemma-4 MoE models.
- Graph traversal now skips forward past nodes that do not consume the current node, allowing it to continue through a chain of MUL operations.
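
The skip-forward behavior above can be sketched roughly as follows. This is an illustrative simplification, not llama.cpp's actual implementation: the `node` struct and the `consumes` / `next_consumer` helpers are hypothetical names standing in for logic that in llama.cpp operates on `ggml_tensor` graphs with fixed-size `src[]` arrays.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal graph node; real llama.cpp tensors carry a
// fixed-size src[] array of input tensors.
struct node {
    std::vector<const node *> src; // inputs this node consumes
};

// True if `candidate` consumes `cur` through ANY of its sources.
// Checking every source (not just the first) matches the fix in the
// migration note: a node must not be skipped while one of its other
// sources still consumes the current node.
static bool consumes(const node *candidate, const node *cur) {
    for (const node *s : candidate->src) {
        if (s == cur) return true;
    }
    return false;
}

// Walk forward through a topologically ordered node list from index i,
// skipping nodes that do not consume nodes[i]. This lets the traversal
// step over interleaved unrelated ops (e.g. a chain of MULs) to find
// the first real consumer of the current node.
static std::size_t next_consumer(const std::vector<node *> &nodes,
                                 std::size_t i) {
    const node *cur = nodes[i];
    for (std::size_t j = i + 1; j < nodes.size(); ++j) {
        if (consumes(nodes[j], cur)) {
            return j; // first node that actually consumes `cur`
        }
        // otherwise: unrelated node, skip forward past it
    }
    return nodes.size(); // no consumer found
}
```

With a graph `n0, n1, n2` where only `n2` lists `n0` as a source, `next_consumer(nodes, 0)` steps past `n1` and returns the index of `n2`.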