Change8

b7616

📦 llama-cpp
✨ 3 features · 🔧 3 symbols

Summary

This release focuses on performance optimizations for the GGML_OP_CUMSUM operation within the Vulkan backend, introducing a multipass shader for large rows and multi-element processing per thread.
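
The dual-path choice can be pictured as a size-based dispatch between the two shader variants. Below is a minimal C++ sketch of that idea; the function names, the threshold value, and the exact condition are assumptions for illustration and are not taken from the ggml-vulkan sources.

```cpp
#include <cstdint>

// Hypothetical sketch of the dual-path selection for GGML_OP_CUMSUM.
// Names and the threshold are illustrative, not the actual ggml-vulkan logic.
enum class CumsumPath { WholeRow, Multipass };

CumsumPath select_cumsum_path(uint64_t row_size, uint64_t num_rows) {
    // A few very large rows would leave most of the GPU idle if each row
    // were scanned by a single workgroup, so split such rows across
    // workgroups using the multipass shader.
    const uint64_t large_row_threshold = 1u << 16; // assumed value
    if (num_rows < 64 && row_size > large_row_threshold) {
        return CumsumPath::Multipass;
    }
    // Otherwise the whole-row shader scans each row in one pass.
    return CumsumPath::WholeRow;
}
```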

✨ New Features

  • Optimized GGML_OP_CUMSUM for the Vulkan backend using a dual-path approach.
  • Implemented a multipass shader for GGML_OP_CUMSUM that handles a small number of large rows more efficiently.
  • Enhanced the whole-row shader to process multiple elements per invocation (ELEM_PER_THREAD = 2 on AMD/Intel); see the sketch after this list.
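
The per-invocation batching in the whole-row shader can be modeled on the CPU as each "thread" scanning a fixed chunk of ELEM_PER_THREAD elements and then being offset by the totals of the preceding chunks. The C++ sketch below only illustrates that pattern under those assumptions; the real shader is GLSL and combines the per-invocation totals with subgroup/shared-memory operations.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// ELEM_PER_THREAD = 2 mirrors the AMD/Intel setting from the release notes.
constexpr std::size_t ELEM_PER_THREAD = 2;

// Hypothetical CPU model of the whole-row cumsum shader's batching.
std::vector<float> cumsum_row(const std::vector<float> & row) {
    const std::size_t n        = row.size();
    const std::size_t nthreads = (n + ELEM_PER_THREAD - 1) / ELEM_PER_THREAD;

    std::vector<float> out(n);
    std::vector<float> chunk_total(nthreads, 0.0f);

    // Step 1: each "thread" scans its own ELEM_PER_THREAD-element chunk.
    for (std::size_t t = 0; t < nthreads; ++t) {
        float local = 0.0f;
        const std::size_t end = std::min(n, (t + 1) * ELEM_PER_THREAD);
        for (std::size_t i = t * ELEM_PER_THREAD; i < end; ++i) {
            local += row[i];
            out[i] = local;
        }
        chunk_total[t] = local;
    }

    // Step 2: add the exclusive scan of the chunk totals as a per-chunk offset.
    float offset = 0.0f;
    for (std::size_t t = 0; t < nthreads; ++t) {
        const std::size_t end = std::min(n, (t + 1) * ELEM_PER_THREAD);
        for (std::size_t i = t * ELEM_PER_THREAD; i < end; ++i) {
            out[i] += offset;
        }
        offset += chunk_total[t];
    }
    return out;
}
```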

🔧 Affected Symbols

GGML_OP_CUMSUM · vulkan · ELEM_PER_THREAD