b9254
📦 llama-cpp
✨ 3 features · 🐛 4 fixes · 🔧 16 symbols
Summary
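This release introduces Programmatic Dependent Launch (PDL) in the CUDA backend, which can improve performance on Hopper and newer NVIDIA GPUs. With PDL, a kernel can begin launching, and run any work that does not depend on its predecessor, while the preceding kernel in the same stream is still finishing. This overlaps launch latency with the tail of the earlier kernel. Several follow-up fixes make PDL turn on or off correctly depending on the GPU architecture and the environment settings.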
Migration Steps
- To disable PDL, set the environment variable GGML_CUDA_PDL=0. PDL is disabled if and only if GGML_CUDA_PDL=0. If the variable is unset or has any other value, PDL is enabled by default on Hopper+ devices.
- Replace any use of GGML_CUDA_DISABLE_PDL or GGML_CUDA_ENABLE_PDL with GGML_CUDA_PDL.
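The enable/disable rule can be sketched on the host side as follows. This is a minimal C++ sketch, not llama.cpp's actual code. The helper names are hypothetical. Compute capability is assumed to be encoded as 100 * major + 10 * minor, so Hopper is 900 and Ada is 890. The Ada exclusion comes from the bug fixes listed in this release.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interprets the value of GGML_CUDA_PDL.
// PDL is disabled if and only if the variable is set to exactly "0";
// an unset variable (nullptr) or any other value leaves PDL allowed.
inline bool pdl_allowed_by_env_value(const char * value) {
    return value == nullptr || std::strcmp(value, "0") != 0;
}

// Hypothetical helper: compute capability encoded as 100*major + 10*minor
// (an assumption of this sketch). Hopper (9.0) and newer qualify;
// Ada (8.9) and older architectures are excluded.
inline bool pdl_supported_by_arch(int cc) {
    return cc >= 900;
}

// Combined host-side decision for one device: the architecture must
// qualify AND the environment must not opt out.
inline bool should_use_pdl(int cc) {
    return pdl_supported_by_arch(cc) &&
           pdl_allowed_by_env_value(std::getenv("GGML_CUDA_PDL"));
}
```

A real backend would likely read the variable once and cache the result rather than calling `getenv` on every launch. The sketch keeps the lookup inline for clarity.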
✨ New Features
- Initial implementation of Programmatic Dependent Launch (PDL) for performance gains on newer NVIDIA GPUs (Hopper+).
- Added the GGML_CUDA_PDL environment variable to toggle PDL.
- PDL is now enabled by default for Hopper+ devices.
🐛 Bug Fixes
- Removed an unnecessary and broken CUDA architecture check for PDL, since on a given device PDL either works or has no effect.
- Fixed PDL by inlining ggml_cuda_kernel_launch and using perfect forwarding for kernel arguments.
- Fixed enabling and disabling PDL based on a device-side architecture check. This required replacing macros with inline functions.
- Fixed a performance regression on Ada GPUs by excluding Ada and lower architectures from PDL launches.
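The inlining and perfect-forwarding fix can be illustrated with a host-only C++ sketch. Everything here is hypothetical and does not reproduce `ggml_cuda_kernel_launch`:

- `LaunchConfig` stands in for `cudaLaunchConfig_t` together with its launch-attribute array.
- `launch_kernel_ex` stands in for the CUDA runtime's templated `cudaLaunchKernelEx`. It just calls the "kernel" on the CPU so the example runs without a GPU.

The sketch shows a wrapper that forwards each argument with its original type and value category, so the launcher sees exactly what the caller passed.

```cpp
#include <utility>

// Hypothetical stand-in for cudaLaunchConfig_t plus a launch-attribute list.
struct LaunchConfig {
    unsigned grid_dim  = 1;
    unsigned block_dim = 1;
    // Stands in for cudaLaunchAttributeProgrammaticStreamSerialization.
    bool programmatic_serialization = false;
};

// Records the most recent configuration so its contents can be inspected.
static LaunchConfig g_last_config;

// Hypothetical stand-in for the templated cudaLaunchKernelEx: instead of
// launching on a GPU, it records the config and calls the function on the host.
template <typename... ExpTypes, typename... ActTypes>
int launch_kernel_ex(const LaunchConfig & cfg, void (*kernel)(ExpTypes...), ActTypes &&... args) {
    g_last_config = cfg;
    kernel(std::forward<ActTypes>(args)...);
    return 0; // stands in for cudaSuccess
}

// Wrapper sketch: an inline template that perfect-forwards its arguments,
// optionally requesting programmatic (PDL-style) stream serialization.
template <typename... ExpTypes, typename... ActTypes>
inline int launch_maybe_pdl(unsigned grid_dim, unsigned block_dim, bool use_pdl,
                            void (*kernel)(ExpTypes...), ActTypes &&... args) {
    LaunchConfig cfg;
    cfg.grid_dim  = grid_dim;
    cfg.block_dim = block_dim;
    cfg.programmatic_serialization = use_pdl;
    return launch_kernel_ex(cfg, kernel, std::forward<ActTypes>(args)...);
}

// Host function standing in for a __global__ kernel.
static void add_kernel(int * out, int a, int b) {
    *out = a + b;
}
```

Dropping `std::forward` in the wrapper would turn every argument into an lvalue inside it, which can change which overload or template instantiation the underlying launcher selects.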