b7618

📦 llama-cpp
🐛 2 fixes · 🔧 4 symbols

Summary

This release focuses on critical bug fixes for the CUDA backend, correcting memory pool allocation sizing and an integer overflow in argsort to prevent computation errors.

Migration Steps

  1. Update llama.cpp to version b7618
  2. If using Windows with CUDA, also update the corresponding CUDA 12.4 or 13.1 DLLs where necessary

🐛 Bug Fixes

  • Fixed CUDA pool allocations for fattn-common and dst_tmp_meta, where an object byte size was passed to an allocator that expects an object count (sketched after this list)
  • Fixed a potential integer overflow in argsort by explicitly casting allocation counts to a wider type before multiplication (see the second sketch below)
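
The first fix is easiest to see with a typed pool allocator whose alloc() takes an element count and scales by sizeof(T) internally. The sketch below is illustrative only; pool_alloc, raw_pool_alloc, meta_t, and reserve_meta are hypothetical stand-ins, not the actual ggml-cuda API.

```cpp
#include <cstdlib>
#include <cstddef>

// Hypothetical stand-in for a device memory pool (not the real ggml-cuda pool).
static void * raw_pool_alloc(size_t nbytes) { return std::malloc(nbytes); }

struct meta_t { int begin; int end; };  // placeholder payload type

// A typed allocator: alloc() expects an *element count*, not a byte size,
// because it multiplies by sizeof(T) itself.
template <typename T>
struct pool_alloc {
    T * ptr = nullptr;
    T * alloc(size_t nelems) {
        ptr = static_cast<T *>(raw_pool_alloc(nelems * sizeof(T)));
        return ptr;
    }
};

void reserve_meta(pool_alloc<meta_t> & dst_tmp_meta, size_t n_objects) {
    // Buggy pattern of the kind fixed here: passing a byte size where an
    // element count is expected, over-allocating by a factor of sizeof(T).
    // dst_tmp_meta.alloc(n_objects * sizeof(meta_t));

    // Correct: pass the object count; the allocator applies sizeof(T).
    dst_tmp_meta.alloc(n_objects);
}
```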
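The argsort fix addresses a classic sizing overflow: multiplying two 32-bit ints overflows before the result is widened. A minimal sketch of the pattern, with hypothetical names:

```cpp
#include <cstddef>

// Illustrative only: computing an allocation count from two int dimensions.
size_t alloc_count(int ncols_pad, int nrows) {
    // Buggy: the multiply happens in 32-bit int arithmetic and can overflow
    // before the implicit conversion to size_t.
    // return ncols_pad * nrows;

    // Fixed pattern: cast the operands first so the multiplication is
    // carried out in the wider type.
    return (size_t) ncols_pad * (size_t) nrows;
}
```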

🔧 Affected Symbols

  • ggml-cuda
  • fattn-common
  • dst_tmp_meta
  • argsort