
b9254

📦 llama-cpp
3 features · 🐛 4 fixes · 🔧 16 symbols

Summary

This release introduces Programmatic Dependent Launch (PDL), which improves performance on Hopper and newer NVIDIA GPUs by letting the execution of consecutive kernels overlap. Several fixes ensure that PDL is enabled or disabled correctly based on the hardware architecture and environment settings.

Migration Steps

  1. PDL is enabled by default on Hopper+ devices. To disable it, set the environment variable GGML_CUDA_ENABLE_PDL=0; if the variable is unset or set to 1, PDL stays enabled.
  2. GGML_CUDA_DISABLE_PDL has been replaced by GGML_CUDA_PDL. PDL is disabled if and only if GGML_CUDA_PDL=0.
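
The "disabled if and only if the variable is 0" rule can be sketched in a few lines of C++. This is a minimal illustration, not the project's code: the helper names pdl_enabled_from_value and pdl_enabled_from_env are hypothetical, and the sketch assumes the variable name GGML_CUDA_PDL from step 2 (step 1 mentions GGML_CUDA_ENABLE_PDL, so check which name your build reads).

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: PDL is treated as disabled only when the value is
// exactly "0". An unset variable (nullptr) or any other value keeps it enabled.
inline bool pdl_enabled_from_value(const char * value) {
    return value == nullptr || std::strcmp(value, "0") != 0;
}

// Hypothetical helper: reads the environment variable named in step 2.
// The variable name is an assumption taken from these release notes.
inline bool pdl_enabled_from_env() {
    return pdl_enabled_from_value(std::getenv("GGML_CUDA_PDL"));
}
```

Under this reading, launching a program with GGML_CUDA_PDL=0 in its environment would turn PDL off, while leaving the variable unset would keep the default behavior.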

✨ New Features

  • Initial implementation of Programmatic Dependent Launch (PDL) for performance gains on newer NVIDIA GPUs (Hopper+).
  • Added GGML_CUDA_PDL command line option to toggle PDL functionality.
  • PDL is now enabled by default for Hopper+ devices.

🐛 Bug Fixes

  • Removed a needless and broken CUDA architecture check for PDL; PDL either works or has no effect.
  • Fixed PDL by inlining ggml_cuda_kernel_launch and using perfect forwarding for kernel arguments.
  • Fixed enabling and disabling of PDL based on a device-side architecture check, which required moving from macros to inlined functions.
  • Fixed a performance regression on Ada GPUs by excluding Ada and lower architectures from PDL launches.
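
The gating and forwarding described in these fixes can be illustrated with a small host-only C++ sketch. It is a simplification under stated assumptions, not the actual implementation: should_use_pdl and launch_with_optional_pdl are hypothetical names, compute capability is assumed to be encoded as major*100 + minor*10 (890 for Ada, 900 for Hopper), and the real fix uses a device-side architecture check that this host code does not reproduce.

```cpp
#include <utility>

// Assumed encoding: compute capability 9.0 (Hopper) -> 900, 8.9 (Ada) -> 890.
constexpr int CC_HOPPER = 900;

// Hypothetical decision function: PDL is used only on Hopper or newer,
// and only when it has not been disabled through the environment.
inline bool should_use_pdl(int compute_capability, bool env_enabled) {
    return env_enabled && compute_capability >= CC_HOPPER;
}

// Hypothetical inlined wrapper: passes the PDL decision to the launcher and
// perfectly forwards the remaining arguments, so their value categories and
// types reach the launcher unchanged. Returns whether the PDL path was chosen.
template <typename Launcher, typename... Args>
inline bool launch_with_optional_pdl(int compute_capability, bool env_enabled,
                                     Launcher && launch, Args &&... args) {
    const bool use_pdl = should_use_pdl(compute_capability, env_enabled);
    std::forward<Launcher>(launch)(use_pdl, std::forward<Args>(args)...);
    return use_pdl;
}
```

A template function is used here instead of a macro because it can forward arbitrary kernel arguments with their types intact, which is the motivation the fixes above give for moving to inlined functions.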

Affected Symbols