b8586
📦 llama-cpp
🐛 1 fix · 🔧 1 symbol
Summary
This release fixes a critical bug in the CUDA argsort path that relies on CUB: with CCCL versions older than 3.1, the result could include uninitialized values when the number of rows is an exact multiple of the block size. It also provides updated binary releases for numerous platforms.
🐛 Bug Fixes
- Fixed the CUB-based argsort when `nrows % block_size == 0` on CCCL versions older than 3.1: `offset_grid` is now computed as `ceildiv(nrows + 1, block_size)` instead of `ceildiv(nrows, block_size)`, which prevents uninitialized values.
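
The arithmetic behind this fix can be illustrated with a small host-side sketch. It is not the llama.cpp kernel. It assumes the segmented sort needs `nrows + 1` segment offsets (the usual CUB convention of sharing one array for begin and end offsets) and that each launched thread initializes one entry. The function `unwritten_offsets` and its parameters are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Integer ceiling division: smallest q such that q * b >= a (b > 0).
constexpr std::size_t ceildiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Hypothetical CPU stand-in for an offset-initialization kernel.
// Assumption: nrows + 1 offsets are required and each simulated thread
// writes at most one of them. Returns how many required entries were
// never written, i.e. would remain uninitialized on the device.
std::size_t unwritten_offsets(std::size_t nrows, std::size_t ncols,
                              std::size_t block_size, bool use_fix) {
    std::vector<std::size_t> offsets(nrows + 1, 0);
    std::vector<char> written(nrows + 1, 0);

    // Grid size before the fix vs. after the fix.
    const std::size_t grid = use_fix ? ceildiv(nrows + 1, block_size)
                                     : ceildiv(nrows, block_size);

    for (std::size_t block = 0; block < grid; ++block) {
        for (std::size_t t = 0; t < block_size; ++t) {
            const std::size_t idx = block * block_size + t;
            if (idx <= nrows) {            // bounds guard, as a kernel would have
                offsets[idx] = idx * ncols; // start of row idx in a flat buffer
                written[idx] = 1;
            }
        }
    }

    std::size_t missing = 0;
    for (char w : written) {
        if (!w) ++missing;
    }
    return missing;
}
```

Under these assumptions, `unwritten_offsets(1024, 8, 256, false)` leaves the final offset unwritten because `ceildiv(1024, 256)` launches exactly 1024 threads for 1025 entries, while passing `true` covers every entry. When `nrows` is not a multiple of `block_size`, the rounding in `ceildiv` already supplies the spare thread, which would explain why the bug only appeared in the exact-multiple case.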