Change8

b9589

📦 llama-cpp
🐛 3 fixes · 🔧 2 symbols

Summary

This release fixes data races in the CUDA ssm_scan_f32 kernel by adding the missing __syncthreads barriers. It also updates various platform binaries and disables several experimental builds.

🐛 Bug Fixes

  • Fixed data races in the CUDA ssm_scan_f32 kernel by adding a missing __syncthreads before cub_temp_storage is reused.
  • Added a second missing __syncthreads call so that all threads finish reading smem before it is written again in the next loop iteration.
  • Removed unused shared memory (smem) from ssm_scan_f32.
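
The kind of race described above can be illustrated with a small standalone sketch. This is not the ssm_scan_f32 kernel: the kernel name chunked_prefix_sum, the helper run_prefix_sum, and the constant kThreads are hypothetical names chosen for illustration. It assumes a single thread block that runs a CUB block-wide scan once per loop iteration and reuses the same shared temporary storage each time, which is the situation where CUB requires a barrier between calls.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

constexpr int kThreads = 128;  // hypothetical block size for this sketch

// Hypothetical kernel (not ssm_scan_f32): one block computes a running
// inclusive prefix sum over n_chunks chunks of kThreads elements each,
// reusing a single cub TempStorage in every loop iteration.
__global__ void chunked_prefix_sum(const float *in, float *out, int n_chunks) {
    using BlockScan = cub::BlockScan<float, kThreads>;
    __shared__ typename BlockScan::TempStorage temp_storage;

    float carry = 0.0f;  // sum of all previous chunks
    for (int c = 0; c < n_chunks; ++c) {
        const int i = c * kThreads + threadIdx.x;
        float scanned = 0.0f;
        float total = 0.0f;
        // Block-wide inclusive sum; total is the chunk sum, valid in every thread.
        BlockScan(temp_storage).InclusiveSum(in[i], scanned, total);
        out[i] = scanned + carry;
        carry += total;
        // Barrier before temp_storage is reused by the next iteration.
        // Without it, a fast thread could start the next scan and overwrite
        // shared memory that slower threads are still reading.
        __syncthreads();
    }
}

// Host helper: copies the input to the device, launches one block, copies back.
// Error handling is omitted to keep the sketch short.
std::vector<float> run_prefix_sum(const std::vector<float> &host_in) {
    assert(host_in.size() % kThreads == 0);
    const size_t bytes = host_in.size() * sizeof(float);
    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    const int n_chunks = static_cast<int>(host_in.size() / kThreads);
    chunked_prefix_sum<<<1, kThreads>>>(d_in, d_out, n_chunks);

    std::vector<float> host_out(host_in.size());
    cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

The barrier sits at the end of the loop body so that it separates each scan from the next one. The same placement applies to any shared buffer that is read in one iteration and written in the following one.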

Affected Symbols