b9275
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 3 symbols
Summary
This release focuses on Metal performance: the concat kernel now batches rows for small widths, and the set kernel's thread usage is fixed. The CPY tests for shape operations were also refactored extensively; this is an internal, test-only change.
Migration Steps
- In the CPY tests, the parameter 'ne' was renamed to 'ne_src', and a new parameter 'ne_dst' was added. When 'ne_dst' is not given, the source shape is used.
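The rename can be pictured with a minimal sketch. The struct below is a hypothetical stand-in, not the actual test class: the names `cpy_case`, `dst_shape`, and `nelements`, and the use of `std::optional` to express "default to the source shape", are assumptions made for illustration (C++17). It also assumes the copy requires source and destination to hold the same number of elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

using shape4 = std::array<int64_t, 4>;

// Total element count of a 4-D shape.
inline int64_t nelements(const shape4 & ne) {
    return ne[0] * ne[1] * ne[2] * ne[3];
}

// Hypothetical stand-in for a CPY test case; not the real llama.cpp struct.
struct cpy_case {
    shape4                ne_src;  // source shape (formerly 'ne')
    std::optional<shape4> ne_dst;  // destination shape; empty means "use the source shape"

    explicit cpy_case(shape4 src, std::optional<shape4> dst = std::nullopt)
        : ne_src(src), ne_dst(dst) {}

    // Resolve the destination shape, falling back to the source shape.
    shape4 dst_shape() const { return ne_dst.value_or(ne_src); }

    // Assumption: a copy between shapes needs matching element counts.
    void check() const { assert(nelements(ne_src) == nelements(dst_shape())); }
};
```

Under these assumptions, a call site that previously passed only 'ne' keeps working by passing the same value as 'ne_src', and only cases that reshape during the copy need to supply 'ne_dst'.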
✨ New Features
- Metal: Optimized concat kernel using row batching for small widths (when ne0 < 256) to improve GPU occupancy.
- Metal: Optimized set kernel threads.
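The idea behind row batching can be sketched on the host side. This is not the Metal kernel or its dispatch code: the helper names `rows_per_group` and `num_groups` and the assumed threadgroup size of 256 threads are hypothetical; only the `ne0 < 256` threshold comes from the note above. The sketch shows why narrow rows benefit: when a row is much shorter than a threadgroup, packing several rows into one group keeps more threads busy.

```cpp
#include <cassert>
#include <cstdint>

// Threshold from the release note: rows narrower than this are batched.
constexpr int64_t kSmallWidth = 256;

// Hypothetical helper: number of rows handled by one threadgroup.
// Assumes a threadgroup of 'max_threads' threads, one thread per element.
constexpr int64_t rows_per_group(int64_t ne0, int64_t max_threads = 256) {
    if (ne0 <= 0 || ne0 >= kSmallWidth) {
        return 1;  // wide rows: one row per threadgroup
    }
    const int64_t n = max_threads / ne0;
    return n < 1 ? 1 : n;
}

// Hypothetical helper: threadgroups needed to cover 'nrows' rows (ceiling division).
inline int64_t num_groups(int64_t nrows, int64_t ne0) {
    assert(nrows >= 0);
    const int64_t batch = rows_per_group(ne0);
    return (nrows + batch - 1) / batch;
}

static_assert(rows_per_group(32) == 8, "eight 32-wide rows fit in 256 threads");
static_assert(rows_per_group(512) == 1, "wide rows are not batched");
```

With these assumed numbers, 100 rows of width 32 need 13 threadgroups instead of 100, each group using all 256 threads rather than 32.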
🐛 Bug Fixes
- Fixed a dangling-reference bug in the CPY tests, where a reference (&) to a temporary std::array was stored.
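For readers unfamiliar with this class of bug, here is a minimal sketch of the general pattern and its usual fix. The exact code in the CPY tests is not shown in these notes, so the struct names `case_ref` and `case_val` and the helper `make_case` are hypothetical. The buggy variant is left as a comment because executing it is undefined behavior.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using shape4 = std::array<int64_t, 4>;

// Buggy pattern (comment only, since using it is undefined behavior):
//
//   struct case_ref {
//       const shape4 & ne;                        // reference member
//       explicit case_ref(const shape4 & ne) : ne(ne) {}
//   };
//   case_ref c(shape4{8, 4, 2, 1});               // temporary is destroyed at the end of this statement
//   auto w = c.ne[0];                             // dangling read
//
// Fixed pattern: the struct owns a copy of the array.
struct case_val {
    shape4 ne;  // stored by value, so it lives as long as the struct
    explicit case_val(const shape4 & ne) : ne(ne) {}
};

// Builds a case from a temporary shape; safe because the shape is copied.
inline case_val make_case() {
    return case_val(shape4{8, 4, 2, 1});
}
```

Copying a four-element array is cheap, so storing it by value is a reasonable default in test fixtures where the lifetime of the argument is not controlled.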