b8184
📦 llama-cpp
Summary
This release significantly improves partial offloading performance with the Vulkan backend on AMD hardware by fixing its asynchronous transfer path. It also fixes several related bugs in the Vulkan backend.
✨ New Features
- Improved partial offloading performance on AMD devices using Vulkan.
🐛 Bug Fixes
- Fixed and enabled the cpy_tensor_async function in the Vulkan backend.
- Implemented synchronization using a timeline semaphore for async transfers on AMD.
- Updated offload_op logic.
- Fixed a missing transfer submission.
- Disabled async transfer queue on AMD GCN architecture.
- Reverted a change to the op batch size.
- Fixed checks related to cpy_tensor_async.
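A timeline semaphore, the Vulkan 1.2 primitive named in the fix list, carries a monotonically increasing 64-bit counter: a producer signals by raising it to some value N, and a consumer waits until it reaches at least N, so a single semaphore can order a whole sequence of transfers. The sketch below models those semantics on the CPU with POSIX threads, assuming a transfer thread stands in for the transfer queue. It is an analogy for the idea only, not the Vulkan API and not llama.cpp's implementation; the names timeline_sem, transfer_thread and demo_async_copy are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* CPU-side model of a timeline semaphore: a monotonically increasing
   64-bit counter guarded by a mutex and a condition variable.
   Illustrative analogy only; not Vulkan or llama.cpp code. */
typedef struct {
    uint64_t value;            /* current counter value */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} timeline_sem;

static void timeline_init(timeline_sem *s, uint64_t initial) {
    s->value = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

static void timeline_destroy(timeline_sem *s) {
    pthread_mutex_destroy(&s->lock);
    pthread_cond_destroy(&s->cond);
}

/* Signal: raise the counter to v. A value that would not increase the
   counter is ignored here (in Vulkan it would be invalid usage).
   Every waiter is woken so each can re-check its own target value. */
static void timeline_signal(timeline_sem *s, uint64_t v) {
    pthread_mutex_lock(&s->lock);
    if (v > s->value) {
        s->value = v;
    }
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Wait: block until the counter is at least v. */
static void timeline_wait(timeline_sem *s, uint64_t v) {
    pthread_mutex_lock(&s->lock);
    while (s->value < v) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}

/* Read the current counter value. */
static uint64_t timeline_value(timeline_sem *s) {
    pthread_mutex_lock(&s->lock);
    uint64_t v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}

/* Hypothetical "transfer queue": copies n chunks and signals value i + 1
   after chunk i has landed in the destination buffer. */
typedef struct {
    timeline_sem *sem;
    const int *src;
    int *dst;
    int n;
} transfer_job;

static void *transfer_thread(void *arg) {
    transfer_job *job = arg;
    for (int i = 0; i < job->n; i++) {
        job->dst[i] = job->src[i];                 /* the "async copy" */
        timeline_signal(job->sem, (uint64_t)i + 1);
    }
    return NULL;
}

/* Consumer side: wait for each chunk before reading it and return the
   sum of the copied data. */
static int demo_async_copy(void) {
    const int src[3] = {10, 20, 30};
    int dst[3] = {0, 0, 0};
    timeline_sem sem;
    timeline_init(&sem, 0);

    transfer_job job = {&sem, src, dst, 3};
    pthread_t t;
    pthread_create(&t, NULL, transfer_thread, &job);

    int sum = 0;
    for (int i = 0; i < 3; i++) {
        timeline_wait(&sem, (uint64_t)i + 1);      /* chunk i is ready */
        sum += dst[i];
    }
    pthread_join(t, NULL);
    timeline_destroy(&sem);
    return sum;
}
```

Calling demo_async_copy() returns 60, and each chunk is read only after the signal for its value has been observed. In real Vulkan code the signal side is typically attached to a queue submission through VkTimelineSemaphoreSubmitInfo and the host side can block with vkWaitSemaphores; that mapping is general Vulkan usage, not a description of this release's changes.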