Change8

b7616

📦 llama-cpp
✨ 3 features · 🔧 3 symbols

Summary

This release focuses on performance optimizations for the GGML_OP_CUMSUM operation within the Vulkan backend, introducing a multipass shader for large rows and multi-element processing per thread.
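
The dual-path choice can be pictured as a size-based dispatch between the two shader variants. Below is a minimal C++ sketch of that idea; the function names, the threshold value, and the exact condition are assumptions for illustration and are not taken from the ggml-vulkan sources.

```cpp
#include <cstdint>

// Hypothetical sketch of the dual-path selection for GGML_OP_CUMSUM.
// Names and the threshold are illustrative, not the actual ggml-vulkan logic.
enum class CumsumPath { WholeRow, Multipass };

CumsumPath select_cumsum_path(uint64_t row_size, uint64_t num_rows) {
    // A few very large rows would leave most of the GPU idle if each row
    // were scanned by a single workgroup, so split such rows across
    // workgroups using the multipass shader.
    const uint64_t large_row_threshold = 1u << 16; // assumed value
    if (num_rows < 64 && row_size > large_row_threshold) {
        return CumsumPath::Multipass;
    }
    // Otherwise the whole-row shader scans each row in one pass.
    return CumsumPath::WholeRow;
}
```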

✨ New Features

  • Optimized GGML_OP_CUMSUM for the Vulkan backend using a dual-path approach.
  • Implemented a multipass shader for GGML_OP_CUMSUM that handles a small number of large rows more efficiently.
  • Enhanced the whole-row shader to process multiple elements per invocation (ELEM_PER_THREAD = 2 on AMD/Intel); see the sketch after this list.
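
The per-invocation batching in the whole-row shader can be modeled on the CPU as each "thread" scanning a fixed chunk of ELEM_PER_THREAD elements and then being offset by the totals of the preceding chunks. The C++ sketch below only illustrates that pattern under those assumptions; the real shader is GLSL and combines the per-invocation totals with subgroup/shared-memory operations.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// ELEM_PER_THREAD = 2 mirrors the AMD/Intel setting from the release notes.
constexpr std::size_t ELEM_PER_THREAD = 2;

// Hypothetical CPU model of the whole-row cumsum shader's batching.
std::vector<float> cumsum_row(const std::vector<float> & row) {
    const std::size_t n        = row.size();
    const std::size_t nthreads = (n + ELEM_PER_THREAD - 1) / ELEM_PER_THREAD;

    std::vector<float> out(n);
    std::vector<float> chunk_total(nthreads, 0.0f);

    // Step 1: each "thread" scans its own ELEM_PER_THREAD-element chunk.
    for (std::size_t t = 0; t < nthreads; ++t) {
        float local = 0.0f;
        const std::size_t end = std::min(n, (t + 1) * ELEM_PER_THREAD);
        for (std::size_t i = t * ELEM_PER_THREAD; i < end; ++i) {
            local += row[i];
            out[i] = local;
        }
        chunk_total[t] = local;
    }

    // Step 2: add the exclusive scan of the chunk totals as a per-chunk offset.
    float offset = 0.0f;
    for (std::size_t t = 0; t < nthreads; ++t) {
        const std::size_t end = std::min(n, (t + 1) * ELEM_PER_THREAD);
        for (std::size_t i = t * ELEM_PER_THREAD; i < end; ++i) {
            out[i] += offset;
        }
        offset += chunk_total[t];
    }
    return out;
}
```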

🔧 Affected Symbols

GGML_OP_CUMSUM · vulkan · ELEM_PER_THREAD