
b7609

📦 llama-cpp
🐛 4 fixes · 🔧 3 symbols

Summary

This release fixes a critical bug in the CUDA backend that caused assertion failures when copying large tensors. It improves stability for large models by moving internal dimensions and byte counts to 64-bit integers (int64_t).
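
The overflow mechanics are easy to see in isolation. The sketch below is illustrative only, not ggml code, and the tensor shape is hypothetical: it computes the byte count of an 8 GiB tensor in 64-bit arithmetic, then shows how the same value is destroyed when narrowed to 32 bits.

```cuda
// Minimal sketch (hypothetical shape, not actual ggml code): why a 32-bit
// byte count breaks for large tensors while a 64-bit one stays exact.
#include <climits>
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical tensor: 8192 x 8192 x 32 float32 elements = 8 GiB.
    const int64_t ne0 = 8192, ne1 = 8192, ne2 = 32;

    // Exact byte count in 64-bit arithmetic, the type this release moves to.
    const int64_t nbytes = ne0 * ne1 * ne2 * (int64_t)sizeof(float);

    // Narrowing to 32 bits drops the high bits: 2^33 truncates to 0 here.
    const int32_t truncated = (int32_t)nbytes;

    printf("64-bit byte count: %lld (exceeds INT_MAX: %s)\n",
           (long long)nbytes, nbytes > INT_MAX ? "yes" : "no");
    printf("after 32-bit truncation: %d\n", truncated);
}
```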

Migration Steps

  1. Update to version b7609 or later to resolve the CUDA memory-copy assertion failures triggered by large models

🐛 Bug Fixes

  • Fixed an assertion failure in ggml_cuda_cpy when copying tensors larger than INT_MAX bytes
  • Switched internal data types in ggml-cuda to int64_t to prevent 32-bit overflow
  • Added safety asserts on CUDA block counts (a sketch of this pattern follows this list)
  • Refined the conditions on the y and z dimensions in CUDA kernels
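
For illustration, here is a hedged sketch of the pattern these fixes describe; the kernel, the helper cuda_copy_bytes, and the block size are hypothetical and not the actual ggml-cuda code. It indexes a byte-wise copy with int64_t so offsets past INT_MAX do not wrap, and asserts that the 64-bit block count fits in gridDim.x (capped at 2^31 - 1) before the launch narrows it.

```cuda
// Illustrative sketch in the spirit of this release, not the actual
// ggml-cuda code: a byte-wise copy kernel indexed with int64_t, plus a
// safety assert that the 64-bit block count fits a CUDA grid dimension.
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

__global__ void copy_bytes(const char * src, char * dst, int64_t nbytes) {
    // 64-bit global index so offsets past INT_MAX bytes do not wrap.
    const int64_t i = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nbytes) {
        dst[i] = src[i];
    }
}

void cuda_copy_bytes(const char * src, char * dst, int64_t nbytes,
                     cudaStream_t stream) {
    const int64_t block_size = 256;
    const int64_t num_blocks = (nbytes + block_size - 1) / block_size;

    // Safety assert on the block count: gridDim.x is capped at 2^31 - 1,
    // so the 64-bit value must be checked before it is narrowed.
    assert(num_blocks <= INT32_MAX);

    copy_bytes<<<(unsigned)num_blocks, (unsigned)block_size, 0, stream>>>(
        src, dst, nbytes);
}
```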

🔧 Affected Symbols

  • ggml_cuda_cpy
  • ggml_nbytes
  • ggml-cuda