b7609
📦 llama-cpp
🐛 4 fixes · 🔧 3 symbols
Summary
This release fixes a critical bug in the CUDA backend that caused assertion failures when copying large tensors. It improves stability for large models by moving internal dimensions and byte counts to 64-bit integers.
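To illustrate the class of bug being fixed, here is a minimal sketch (not the actual ggml code): once a tensor's byte count exceeds INT_MAX, a 32-bit integer can no longer represent it, while a 64-bit count stays exact. The shape below is hypothetical, chosen only to exceed the 32-bit limit.

```cpp
#include <climits>
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical tensor: 3e9 one-byte elements, more than INT_MAX bytes.
    const int64_t ne0 = 50000, ne1 = 60000, type_size = 1;

    // 64-bit arithmetic keeps the byte count exact.
    const int64_t nbytes = ne0 * ne1 * type_size;  // 3000000000

    // Storing the same count in a 32-bit int truncates it
    // (implementation-defined), yielding a bogus negative value here.
    const int nbytes32 = (int)nbytes;

    printf("64-bit byte count: %lld\n", (long long)nbytes);
    printf("32-bit byte count: %d (INT_MAX = %d)\n", nbytes32, INT_MAX);
    return 0;
}
```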
Migration Steps
- Update to version b7609 or later to resolve the CUDA memory-copy assertion failures with large models
🐛 Bug Fixes
- Fixed an assertion failure in ggml_cuda_cpy when copying large tensors exceeding INT_MAX bytes
- Widened internal dimension and byte-count types in ggml-cuda to int64_t to prevent integer overflow
- Added safety asserts on computed CUDA block counts (see the sketch after this list)
- Refined the conditions on the y and z grid dimensions in CUDA kernels
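A rough sketch of what such a block-count assert can look like (the helper name check_grid_dims is hypothetical; the actual ggml-cuda code differs). CUDA caps the y and z grid dimensions at 65535 blocks, while the x dimension can be far larger, so a pre-launch check guards exactly those two axes:

```cpp
#include <cassert>
#include <cstdint>

// CUDA limits gridDim.y and gridDim.z to 65535 blocks; gridDim.x may
// be much larger (up to 2^31 - 1 on modern devices).
constexpr int64_t MAX_GRID_DIM_YZ = 65535;

// Hypothetical pre-launch check, not the actual ggml-cuda helper.
inline void check_grid_dims(int64_t bx, int64_t by, int64_t bz) {
    assert(bx > 0 && by > 0 && bz > 0);
    assert(by <= MAX_GRID_DIM_YZ && "grid y dimension exceeds CUDA limit");
    assert(bz <= MAX_GRID_DIM_YZ && "grid z dimension exceeds CUDA limit");
}

int main() {
    check_grid_dims(1 << 20, 4096, 1);  // fine: large x, small y and z
    // check_grid_dims(1, 1 << 20, 1);  // would trip the y-dimension assert
    return 0;
}
```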
🔧 Affected Symbols
- ggml_cuda_cpy
- ggml_nbytes
- ggml-cuda