b8184

📅 Mar 1, 2026 📦 llama-cpp

✨ 1 feature 🐛 7 fixes 🔧 3 symbols

Summary

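This release significantly improves partial offloading performance for Vulkan on AMD hardware by fixing and enabling asynchronous tensor copies (cpy_tensor_async) and synchronizing them with a timeline semaphore. It also fixes several related issues in the Vulkan backend, including a missing transfer submission and the offload_op logic.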

✨ New Features

Improved partial offloading performance on AMD devices using Vulkan.

🐛 Bug Fixes

Fixed and enabled the cpy_tensor_async function in the Vulkan backend.
Implemented synchronization for async transfers on AMD using a timeline semaphore.
Updated the offload_op logic.
Fixed a missing transfer submission.
Disabled the async transfer queue on the AMD GCN architecture.
Reverted a change to the op batch size.
Fixed checks related to cpy_tensor_async.
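
For readers unfamiliar with timeline semaphores, the sketch below models the idea on the CPU only: a counter that only increases, which a transfer worker signals once a copy is complete and which a consumer waits on before reading the destination. It is not the Vulkan backend's code. The names TimelineSemaphore and async_copy_then_sum are hypothetical and chosen for illustration, and std::thread stands in for a separate transfer queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// CPU-side model of a timeline semaphore: a monotonically increasing 64-bit
// counter that can be signaled to a value and waited on until it reaches one.
// Hypothetical illustration only; not the Vulkan backend's implementation.
class TimelineSemaphore {
public:
    // Raise the counter to `value` and wake all waiters.
    // Timeline values must strictly increase.
    void signal(uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(value > counter_);
            counter_ = value;
        }
        cv_.notify_all();
    }

    // Block until the counter is at least `value`.
    void wait(uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return counter_ >= value; });
    }

    // Read the current counter value.
    uint64_t value() {
        std::lock_guard<std::mutex> lock(mutex_);
        return counter_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    uint64_t counter_ = 0;
};

// Simulates an async copy on a "transfer" thread followed by a "compute"
// step that must not read dst until the copy has been signaled as complete.
inline int async_copy_then_sum(const std::vector<int>& src) {
    TimelineSemaphore sem;
    std::vector<int> dst(src.size(), 0);
    const uint64_t copy_done = 1;  // timeline value marking the copy as finished

    std::thread transfer([&] {
        dst = src;              // stands in for the transfer-queue copy
        sem.signal(copy_done);  // publish completion on the timeline
    });

    sem.wait(copy_done);        // consumer waits for the transfer to finish
    int sum = 0;
    for (int v : dst) sum += v;

    transfer.join();
    return sum;
}
```

Calling async_copy_then_sum({1, 2, 3, 4}) sums the copied data only after the wait returns. Because each wait names the counter value it needs, a single semaphore can order many in-flight copies by giving each one its own increasing value.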

Affected Symbols

cpy_tensor_async
transfer_queue
offload_op