b8184

📦 llama-cpp
1 feature · 🐛 7 fixes · 🔧 3 symbols

Summary

This release improves partial offloading performance for the Vulkan backend on AMD hardware by fixing and enabling asynchronous tensor copies (cpy_tensor_async). Several related bugs in the Vulkan backend were also fixed.

✨ New Features

  • Improved partial offloading performance on AMD devices using Vulkan.

🐛 Bug Fixes

  • Fixed and enabled the cpy_tensor_async function in the Vulkan backend.
  • Implemented synchronization using a timeline semaphore for async transfers on AMD.
  • Updated offload_op logic.
  • Fixed missing transfer submission.
  • Disabled the async transfer queue on the AMD GCN architecture.
  • Reverted change to op batch size.
  • Fixed checks related to cpy_tensor_async.
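
For readers unfamiliar with the timeline-semaphore pattern mentioned in the fixes above, the following is a minimal CPU-only sketch of the idea: each submitted copy is assigned a monotonically increasing timeline value, and a consumer waits for that value before reading the destination. It is not the llama.cpp implementation and does not call the Vulkan API; `TransferQueueModel`, `submit_copy`, `wait`, and `copy_then_sum` are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// CPU-only model of a transfer queue guarded by a timeline semaphore.
// All names are hypothetical; none come from ggml or the Vulkan API.
struct TransferQueueModel {
    uint64_t timeline = 0;    // last value signaled by a completed copy
    uint64_t next_value = 0;  // last value handed out at submission time
    std::deque<std::function<void()>> pending;  // submitted, not yet executed

    // Enqueue a copy and return the timeline value that marks its completion.
    // The caller must keep src and dst alive until that value is reached.
    uint64_t submit_copy(const std::vector<float>& src, std::vector<float>& dst) {
        pending.push_back([&src, &dst] { dst = src; });
        return ++next_value;
    }

    // Wait until the timeline reaches `value`. In this model, waiting simply
    // runs the pending copies in submission order.
    void wait(uint64_t value) {
        // Waiting on a value that was never submitted would never complete.
        assert(value <= next_value);
        while (timeline < value && !pending.empty()) {
            pending.front()();
            pending.pop_front();
            ++timeline;  // each finished copy signals the next timeline value
        }
    }
};

// The consumer waits on the copy's timeline value before reading dst.
float copy_then_sum(const std::vector<float>& src) {
    TransferQueueModel queue;
    std::vector<float> dst;
    const uint64_t done = queue.submit_copy(src, dst);
    queue.wait(done);  // without this wait, dst would still be empty
    float sum = 0.0f;
    for (float v : dst) sum += v;
    return sum;
}
```

In this model a missing `wait` (or a copy that is recorded but never submitted) leaves the destination unpopulated, which is the class of problem the synchronization and submission fixes in this list appear to target.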

Affected Symbols