b9275

📅 May 21, 2026 📦 llama-cpp

✨ 2 features 🐛 1 fix 🔧 3 symbols

Summary

This release focuses on Metal performance. The concat kernel now batches rows when the row width is small, and the thread configuration of the set kernel has been optimized. The CPY tests were also refactored so that the source and destination shapes can be specified separately.

Migration Steps

In the CPY tests (test_cpy), the parameter 'ne' was renamed to 'ne_src', and a new parameter 'ne_dst' was added. If 'ne_dst' is not specified, it defaults to the source shape.
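A minimal C++ sketch of the new parameter layout, assuming a test case that stores both shapes and falls back to the source shape when no destination shape is given. The struct test_cpy_case and the alias shape4 are hypothetical illustrations, not the actual test_cpy code. The element-count check mirrors ggml's requirement that a copy's source and destination hold the same number of elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

using shape4 = std::array<int64_t, 4>;

// Hypothetical description of one CPY test case (not the real test_cpy code).
struct test_cpy_case {
    shape4 ne_src; // source tensor shape (previously the single 'ne' parameter)
    shape4 ne_dst; // destination shape; defaults to ne_src when omitted

    explicit test_cpy_case(shape4 src, std::optional<shape4> dst = std::nullopt)
        : ne_src(src), ne_dst(dst.value_or(src)) {
        // ggml's copy op requires source and destination to hold the same
        // number of elements, even when their shapes differ.
        assert(nelements(ne_src) == nelements(ne_dst));
    }

    // Total number of elements described by a 4-D shape.
    static int64_t nelements(const shape4 & ne) {
        return ne[0] * ne[1] * ne[2] * ne[3];
    }
};
```

Existing cases that passed only one shape keep working unchanged, for example test_cpy_case({4, 3, 2, 1}), while a case such as test_cpy_case({4, 3, 2, 1}, shape4{12, 2, 1, 1}) exercises a copy into a differently shaped destination.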

✨ New Features

Metal: Optimized the concat kernel by batching rows when the row width is small (ne0 < 256), improving GPU occupancy.
Metal: Optimized the thread configuration of the set kernel.
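The idea behind row batching can be illustrated with a host-side dispatch calculation. This is a sketch under assumed parameters: the 256-thread budget per threadgroup, the struct dispatch_plan, and the helper plan_rows are hypothetical and do not reproduce the actual Metal kernel's launch code. When rows are narrower than the budget, several rows share one threadgroup so that fewer threads sit idle.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical launch parameters for a row-wise kernel (illustration only,
// not ggml's Metal dispatch code).
struct dispatch_plan {
    int64_t threads_per_row; // threads assigned along ne0 within one row
    int64_t rows_per_group;  // rows processed by a single threadgroup
    int64_t num_groups;      // threadgroups needed to cover all rows
};

// Assumes a budget of max_threads threads per threadgroup (256 by default).
// Narrow rows (ne0 < max_threads) are packed several per threadgroup so that
// most of the budget does useful work; wide rows get one threadgroup each,
// with its threads striding across ne0.
dispatch_plan plan_rows(int64_t ne0, int64_t nrows, int64_t max_threads = 256) {
    const int64_t width = std::max<int64_t>(ne0, 1); // guard against ne0 == 0
    dispatch_plan p{};
    if (width < max_threads) {
        p.threads_per_row = width;
        p.rows_per_group  = max_threads / width; // >= 1 because width < max_threads
    } else {
        p.threads_per_row = max_threads;
        p.rows_per_group  = 1;
    }
    // Ceiling division so a partial batch of rows still gets a threadgroup.
    p.num_groups = (nrows + p.rows_per_group - 1) / p.rows_per_group;
    return p;
}
```

For example, with ne0 = 32 the sketch packs 8 rows into each threadgroup, so all 256 threads do useful work instead of only 32. Rows of 256 elements or wider fall back to one row per threadgroup.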

🐛 Bug Fixes

Fixed a dangling-reference bug in the CPY tests caused by storing a reference (&) to a temporary std::array.
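The pattern behind this bug, shown in a minimal standalone form: a member declared as a reference binds to a temporary std::array that is destroyed once the enclosing statement finishes, so later reads are undefined behavior. The names shape4, make_shape, case_bad, and case_ok are hypothetical, and storing the array by value is one common fix, assumed here rather than taken from the actual patch.

```cpp
#include <array>
#include <cstdint>

using shape4 = std::array<int64_t, 4>;

// Hypothetical helper: returns a shape by value, i.e. a temporary at the call site.
shape4 make_shape(int64_t n0) { return {n0, 1, 1, 1}; }

// Buggy pattern (do not use): the reference member binds to whatever the
// caller passed. After `case_bad b(make_shape(8));` the temporary returned by
// make_shape is destroyed at the end of that statement, so b.ne dangles and
// reading it is undefined behavior.
struct case_bad {
    const shape4 & ne;
    explicit case_bad(const shape4 & ne_in) : ne(ne_in) {}
};

// Fixed pattern: keep an owned copy. std::array<int64_t, 4> is 32 bytes, so
// copying it is cheap.
struct case_ok {
    shape4 ne;
    explicit case_ok(const shape4 & ne_in) : ne(ne_in) {}
};
```

Because std::array is a small value type, copying it removes the lifetime dependency on the caller at negligible cost.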

Affected Symbols

metal:concat_kernel metal:set_kernel test_cpy