b9589
📦 llama-cpp
🐛 3 fixes · 🔧 2 symbols
Summary
This release fixes data races in the CUDA ssm_scan_f32 kernel by adding missing __syncthreads barriers, and removes unused shared memory from the same kernel. It also includes updates to various platform binaries and disables several experimental builds.
🐛 Bug Fixes
- Fixed data races in the CUDA ssm_scan_f32 kernel by adding a missing __syncthreads before cub_temp_storage is reused.
- Added another missing __syncthreads so that all threads finish reading shared memory (smem) before it is overwritten in the next loop iteration.
- Removed unused shared memory (smem) from ssm_scan_f32.
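The notes do not include the kernel source, so the following is a hypothetical, minimal CUDA sketch (not the actual ssm_scan_f32 code) of the two barrier rules the fixes describe. A single block loops over chunks of an array, reuses one cub::BlockScan TempStorage every iteration, and passes a running total between iterations through a shared variable. The CUB documentation asks for a __syncthreads() before a collective's temp storage is reused; the same barrier also keeps one thread from overwriting shared memory that others are still reading. The names chunked_prefix_sum, chunked_prefix_sum_kernel, s_carry and BLOCK = 128 are illustrative assumptions.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cub/block/block_scan.cuh>

// Hypothetical block size; the BlockScan specialization assumes exactly this many threads.
constexpr int BLOCK = 128;

// One block computes an inclusive prefix sum of x into y, BLOCK elements per iteration.
__global__ void chunked_prefix_sum_kernel(const float* __restrict__ x, float* __restrict__ y, int n) {
    using BlockScan = cub::BlockScan<float, BLOCK>;
    __shared__ BlockScan::TempStorage temp_storage;  // reused by every iteration
    __shared__ float s_carry;                        // total of all previous chunks

    assert(blockDim.x == BLOCK);
    const int tid = threadIdx.x;
    if (tid == 0) s_carry = 0.0f;
    __syncthreads();  // initial s_carry visible to all threads

    for (int base = 0; base < n; base += BLOCK) {
        const int i = base + tid;
        const float v = (i < n) ? x[i] : 0.0f;  // pad the last chunk with zeros

        float incl;
        BlockScan(temp_storage).InclusiveSum(v, incl);

        const float carry = s_carry;  // every thread reads shared memory here
        if (i < n) y[i] = carry + incl;

        // Barrier 1: all threads have read s_carry and finished with temp_storage.
        // Without it, thread BLOCK - 1 could overwrite s_carry below while slower
        // threads still need the old value (read-before-overwrite race).
        __syncthreads();

        // The last thread's inclusive value is the whole chunk's sum; publish the new total.
        if (tid == BLOCK - 1) s_carry = carry + incl;

        // Barrier 2: the new s_carry is visible before the next iteration reads it,
        // and this iteration's scan is separated from the next reuse of temp_storage.
        __syncthreads();
    }
}

// Sketch-level error handling: abort on any CUDA error.
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::abort();
    }
}

// Host helper: copies x to the device, runs the single-block kernel, returns y.
std::vector<float> chunked_prefix_sum(const std::vector<float>& h_x) {
    const int n = static_cast<int>(h_x.size());
    std::vector<float> h_y(h_x.size());
    if (n == 0) return h_y;
    const size_t bytes = h_x.size() * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    check(cudaMalloc(&d_x, bytes));
    check(cudaMalloc(&d_y, bytes));
    check(cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice));
    chunked_prefix_sum_kernel<<<1, BLOCK>>>(d_x, d_y, n);
    check(cudaGetLastError());
    check(cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_x));
    check(cudaFree(d_y));
    return h_y;
}
```

Removing the first barrier reintroduces the read-before-overwrite race on s_carry; removing both would also let the next InclusiveSum write temp_storage while some threads may still be using it. Races like these can appear only intermittently, and compute-sanitizer --tool racecheck is designed to flag shared-memory hazards of this kind.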