
b9112

📦 llama-cpp
🐛 1 fix · 🔧 4 symbols

Summary

This release fixes a CUDA kernel launch failure in im2col operations that occurred when the output width exceeded 65535. The grid's Y dimension is now clamped, and the kernel itself loops over the remaining output columns. Pre-compiled binaries for several platforms are also provided.

🐛 Bug Fixes

  • Fixed CUDA im2col (2D and 3D) failing when the output width (OW) exceeds 65535, by clamping the grid's Y dimension and introducing an in-kernel loop with stride MAX_GRIDDIM_Y.
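
The clamp-and-stride pattern named in this fix can be illustrated with a small CPU-side sketch. This is not the llama.cpp kernel: the function names are hypothetical, the outer loop only stands in for `blockIdx.y`, and the value 65535 for MAX_GRIDDIM_Y is assumed from CUDA's limit on the grid's Y dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Name taken from the release note; the value is an assumption based on
// CUDA's maximum grid size along Y.
constexpr int64_t MAX_GRIDDIM_Y = 65535;

// Hypothetical helper: number of blocks to launch along Y, clamped so the
// launch configuration stays within the limit.
int64_t clamped_grid_y(int64_t ow) {
    assert(ow > 0);
    return std::min(ow, MAX_GRIDDIM_Y);
}

// CPU stand-in for the launch: each simulated block handles output columns
// block_y, block_y + MAX_GRIDDIM_Y, block_y + 2 * MAX_GRIDDIM_Y, ...
// Returns how many times each output column was visited.
std::vector<int> simulate_launch(int64_t ow) {
    std::vector<int> visits(static_cast<std::size_t>(ow), 0);
    const int64_t grid_y = clamped_grid_y(ow);
    for (int64_t block_y = 0; block_y < grid_y; ++block_y) {  // plays the role of blockIdx.y
        // In-kernel loop: stride over the columns the clamped grid cannot reach directly.
        for (int64_t iow = block_y; iow < ow; iow += MAX_GRIDDIM_Y) {
            ++visits[static_cast<std::size_t>(iow)];
        }
    }
    return visits;
}

// True when every output column is processed exactly once.
bool covers_each_column_once(int64_t ow) {
    const std::vector<int> visits = simulate_launch(ow);
    return std::all_of(visits.begin(), visits.end(), [](int v) { return v == 1; });
}
```

When OW is at most 65535, the inner loop runs once per block and the behavior matches a plain one-block-per-column launch. For larger OW, the low-numbered blocks take extra iterations to cover the columns beyond the clamp.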

Affected Symbols