
b8776

📦 llama-cpp
1 feature 🐛 1 fix 🔧 3 symbols

Summary

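This release restricts DeviceSegmentedSort to immediate mode because it cannot be captured in a CUDA graph. In CUDA graph mode, sorting is dispatched to DeviceSegmentedRadixSort instead, which keeps graph execution stable at the cost of some speed. The release also includes performance comparisons between the two sorting methods.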

✨ New Features

  • Added test case to enforce dispatch to DeviceSegmentedRadixSort when running in CUDA graph mode.

🐛 Bug Fixes

  • Limited DeviceSegmentedSort to immediate mode because it cannot be captured in a CUDA graph. Graph mode now falls back to the slower DeviceSegmentedRadixSort.
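
The dispatch rule described in this fix can be sketched as a small host-side decision. This is only an illustration under stated assumptions: the enum, choose_backend, and check_graph_mode_dispatch are hypothetical names, not the actual llama.cpp symbols, and the graph-mode flag is a plain bool so the sketch compiles without a GPU. In real CUDA code, such a flag might be derived from something like cudaStreamIsCapturing().

```cpp
#include <cassert>

// Hypothetical names for illustration only; not the symbols used in llama.cpp.
enum class SegmentedSortBackend {
    DeviceSegmentedSort,      // faster path, but not capturable in a CUDA graph
    DeviceSegmentedRadixSort  // slower path, used when running in graph mode
};

// Select the sort backend from the execution mode.
// Assumption: the caller already knows whether it is running in CUDA graph mode.
SegmentedSortBackend choose_backend(bool in_cuda_graph_mode) {
    return in_cuda_graph_mode ? SegmentedSortBackend::DeviceSegmentedRadixSort
                              : SegmentedSortBackend::DeviceSegmentedSort;
}

// Mirrors the intent of the new test case: graph mode must never pick
// the non-capturable DeviceSegmentedSort path.
void check_graph_mode_dispatch() {
    assert(choose_backend(true) == SegmentedSortBackend::DeviceSegmentedRadixSort);
}
```

Keeping the choice in one function means a single test can pin down the graph-mode behavior, which is roughly what the added test case is described as enforcing.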

Affected Symbols