b7616
📦 llama-cpp
✨ 3 features · 🔧 3 symbols
Summary
This release focuses on performance optimizations for the GGML_OP_CUMSUM operation in the Vulkan backend. It adds a multipass shader for inputs with a small number of large rows and lets the whole-row shader process multiple elements per thread.
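The multipass idea can be illustrated on the CPU. The following is a minimal sketch, not the Vulkan shader itself. It assumes GGML_OP_CUMSUM is an inclusive prefix sum along each row. It also assumes a common three-pass block-scan layout: each block stands in for one workgroup. The function name `block_scan_multipass` and the `block_size` parameter are hypothetical.

```python
from typing import List


def block_scan_multipass(row: List[float], block_size: int) -> List[float]:
    """Inclusive prefix sum of one long row, computed in three passes.

    Hypothetical CPU model of a multipass GPU scan: each block stands in
    for one workgroup processing a slice of the row, so a single large row
    can be spread across many workgroups instead of just one.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n = len(row)
    out = [0.0] * n
    block_totals = []

    # Pass 1: each block scans its own slice independently and records its total.
    for start in range(0, n, block_size):
        acc = 0.0
        for i in range(start, min(start + block_size, n)):
            acc += row[i]
            out[i] = acc
        block_totals.append(acc)

    # Pass 2: an exclusive scan over the block totals gives each block's offset.
    offsets = []
    running = 0.0
    for total in block_totals:
        offsets.append(running)
        running += total

    # Pass 3: add each block's offset to every element in that block.
    for b, start in enumerate(range(0, n, block_size)):
        for i in range(start, min(start + block_size, n)):
            out[i] += offsets[b]
    return out


print(block_scan_multipass([1, 2, 3, 4, 5], 2))  # [1.0, 3.0, 6.0, 10.0, 15.0]
```

Passes 1 and 3 are independent per block, which is what lets a GPU split one long row across many workgroups. Only the short pass over block totals is serial.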
✨ New Features
- Optimized GGML_OP_CUMSUM in the Vulkan backend with a dual-path approach: a whole-row shader and a new multipass shader.
- Implemented a multipass shader for GGML_OP_CUMSUM that handles inputs with a small number of large rows more efficiently.
- Enhanced the whole-row shader to process multiple elements per invocation (ELEM_PER_THREAD = 2 on AMD and Intel GPUs).
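The following is a minimal CPU sketch of a whole-row scan in which each invocation handles several elements. It is not the actual shader. It assumes each of `num_threads` invocations owns `elem_per_thread` contiguous elements of a chunk, and that one workgroup walks the row chunk by chunk while carrying a running total. The names `whole_row_scan`, `num_threads`, and `elem_per_thread` are hypothetical. Only the value 2 for AMD/Intel comes from the release notes, and the element layout is an assumption.

```python
from typing import List


def whole_row_scan(row: List[float], num_threads: int, elem_per_thread: int) -> List[float]:
    """Inclusive prefix sum of one row by a single simulated workgroup.

    Hypothetical CPU model: `num_threads` invocations each own
    `elem_per_thread` consecutive elements of a chunk, and the workgroup
    walks the row chunk by chunk, carrying the running total forward.
    """
    if num_threads <= 0 or elem_per_thread <= 0:
        raise ValueError("num_threads and elem_per_thread must be positive")
    n = len(row)
    out = [0.0] * n
    chunk = num_threads * elem_per_thread
    carry = 0.0  # sum of all elements in earlier chunks

    for base in range(0, n, chunk):
        # Step 1: each invocation serially scans its own elements.
        thread_totals = []
        for t in range(num_threads):
            acc = 0.0
            for k in range(elem_per_thread):
                i = base + t * elem_per_thread + k
                if i < n:
                    acc += row[i]
                    out[i] = acc
            thread_totals.append(acc)

        # Step 2: add each invocation's exclusive prefix (carry plus the totals
        # of earlier invocations). A GPU would scan the totals with shared
        # memory or subgroup operations; here it is a serial loop.
        prefix = carry
        for t in range(num_threads):
            start = base + t * elem_per_thread
            for k in range(elem_per_thread):
                i = start + k
                if i < n:
                    out[i] += prefix
            prefix += thread_totals[t]
        carry = prefix
    return out


print(whole_row_scan(list(range(1, 11)), num_threads=2, elem_per_thread=2))
```

Raising `elem_per_thread` from 1 to 2 halves the number of per-invocation totals that must be combined across the workgroup for each chunk. The serial work inside each invocation doubles. This sketch does not show why 2 was chosen for AMD and Intel.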
🔧 Affected Symbols
- GGML_OP_CUMSUM
- vulkan
- ELEM_PER_THREAD