b8860
📦 llama-cpp
🐛 2 fixes
Summary
This release fixes a critical bug that delayed AllReduce operations for Gemma-4 MoE models and improves the graph-traversal logic used by tensor parallelism.
Migration Steps
- No user action required: the traversal now checks all of a node's sources before deciding to skip it (internal logic change).
🐛 Bug Fixes
- Fixed delayed AllReduce operation on Gemma-4 MoE models.
- Graph traversal now skips forward past nodes that do not consume the current node, allowing it to continue through a chain of MUL operations.
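
The skip-forward behavior above can be sketched roughly as follows. This is an illustrative simplification, not llama.cpp's actual implementation: the `node` struct and the `consumes` / `next_consumer` helpers are hypothetical names standing in for logic that in llama.cpp operates on `ggml_tensor` graphs with fixed-size `src[]` arrays.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal graph node; real llama.cpp tensors carry a
// fixed-size src[] array of input tensors.
struct node {
    std::vector<const node *> src; // inputs this node consumes
};

// True if `candidate` consumes `cur` through ANY of its sources.
// Checking every source (not just the first) matches the fix in the
// migration note: a node must not be skipped while one of its other
// sources still consumes the current node.
static bool consumes(const node *candidate, const node *cur) {
    for (const node *s : candidate->src) {
        if (s == cur) return true;
    }
    return false;
}

// Walk forward through a topologically ordered node list from index i,
// skipping nodes that do not consume nodes[i]. This lets the traversal
// step over interleaved unrelated ops (e.g. a chain of MULs) to find
// the first real consumer of the current node.
static std::size_t next_consumer(const std::vector<node *> &nodes,
                                 std::size_t i) {
    const node *cur = nodes[i];
    for (std::size_t j = i + 1; j < nodes.size(); ++j) {
        if (consumes(nodes[j], cur)) {
            return j; // first node that actually consumes `cur`
        }
        // otherwise: unrelated node, skip forward past it
    }
    return nodes.size(); // no consumer found
}
```

With a graph `n0, n1, n2` where only `n2` lists `n0` as a source, `next_consumer(nodes, 0)` steps past `n1` and returns the index of `n2`.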