b9254

📅 May 20, 2026 · 📦 llama-cpp

✨ 3 features · 🐛 4 fixes · 🔧 16 symbols

Summary

This release introduces Programmatic Dependent Launch (PDL), which improves performance on Hopper+ NVIDIA GPUs by letting consecutive kernels overlap their execution. Follow-up fixes ensure that PDL is enabled or disabled correctly based on the GPU architecture and environment settings.

Migration Steps

To disable PDL, set the environment variable GGML_CUDA_PDL=0. If the variable is unset or set to 1, PDL is enabled by default on Hopper+ devices.
GGML_CUDA_PDL replaces GGML_CUDA_DISABLE_PDL. PDL is disabled if and only if GGML_CUDA_PDL=0.
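
The "disabled if and only if GGML_CUDA_PDL=0" rule can be sketched in a few lines of C++. This is an illustration only, not the ggml source: the helper names pdl_enabled_for_value and pdl_enabled_from_env are hypothetical, and the sketch assumes the value is compared as the exact string "0".

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not taken from ggml): interpret a raw GGML_CUDA_PDL value.
// PDL is treated as disabled if and only if the value is exactly "0";
// an unset variable (nullptr) or any other value leaves PDL enabled.
inline bool pdl_enabled_for_value(const char * value) {
    return value == nullptr || std::strcmp(value, "0") != 0;
}

// Hypothetical convenience wrapper: reads the process environment on each call.
// Production code would likely cache the result instead of calling getenv repeatedly.
inline bool pdl_enabled_from_env() {
    return pdl_enabled_for_value(std::getenv("GGML_CUDA_PDL"));
}
```

Under these assumptions, only the literal value 0 turns PDL off. Whether PDL is actually used still depends on the device architecture.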

✨ New Features

Initial implementation of Programmatic Dependent Launch (PDL) for performance gains on newer NVIDIA GPUs (Hopper+).
Added GGML_CUDA_PDL command line option to toggle PDL functionality.
PDL is now enabled by default for Hopper+ devices.

🐛 Bug Fixes

Fixed a needless and broken CUDA architecture check for PDL; on any given device, PDL either works or has no effect.
Fixed PDL by inlining ggml_cuda_kernel_launch and using perfect forwarding for kernel arguments.
Fixed enabling and disabling of PDL via a device-side architecture check, which required moving from macros to inline functions.
Fixed a performance regression on Ada GPUs by excluding Ada and lower architectures from PDL launches.
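
The fixes above combine two ideas: an inline launch wrapper that perfect-forwards kernel arguments, and an architecture gate that keeps Ada and older GPUs on the regular launch path. The host-only sketch below illustrates both under stated assumptions. It is not the ggml implementation: launch_with_optional_pdl, use_pdl and the CC_* constants are hypothetical names, compute capability is assumed to be encoded as 100*major + 10*minor, and plain callables stand in for the actual CUDA launches (the real wrapper is ggml_cuda_kernel_launch, listed under Affected Symbols together with cudaLaunchKernelEx).

```cpp
#include <utility>

// Assumed encoding for this sketch: 100 * major + 10 * minor.
constexpr int CC_ADA    = 890; // Ada Lovelace, compute capability 8.9
constexpr int CC_HOPPER = 900; // Hopper, compute capability 9.0

// PDL is used only on Hopper and newer, and only if the user has not disabled it.
constexpr bool use_pdl(int cc, bool user_enabled) {
    return user_enabled && cc >= CC_HOPPER;
}

// Hypothetical stand-in for an inline launch wrapper. The two callables represent
// "launch with the PDL attribute" and "regular launch". The arguments are
// perfect-forwarded, so their value category and constness reach the launcher unchanged.
template <typename PdlLaunch, typename PlainLaunch, typename... Args>
inline void launch_with_optional_pdl(int cc, bool user_enabled,
                                     PdlLaunch && pdl_launch,
                                     PlainLaunch && plain_launch,
                                     Args &&... args) {
    if (use_pdl(cc, user_enabled)) {
        std::forward<PdlLaunch>(pdl_launch)(std::forward<Args>(args)...);
    } else {
        std::forward<PlainLaunch>(plain_launch)(std::forward<Args>(args)...);
    }
}
```

With this gate, a device at CC_ADA always takes the regular path, which matches the intent of excluding Ada and lower architectures from PDL launches.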

Affected Symbols

ggml_cuda_kernel_launch, cudaLaunchKernelEx, quantize_q8_1, mul_mat_vec_q, rms_norm_f32, k_bin_bcast, mmvf, rope, set-rows, topk, cpy_scalar_contiguous, k_get_rows_float, flash_attn_combine_results, softcap_f32, top-k-moe.cu, common.cuh