b9260

📅 May 21, 2026 · 📦 llama-cpp

✨ 3 features · 🐛 2 fixes · 🔧 6 symbols

Summary

This release focuses on refactoring the OpenCL backend initialization, improving GPU identification, and optimizing kernel loading for argsort and flash_attn operations.

Migration Steps

If relying on specific OpenCL initialization behavior, review the refactored initialization logic.
If you use OpenCL operations that query max workgroups, ensure the argsort kernel is built.

✨ New Features

Refactored OpenCL backend initialization logic.
Improved GPU identification within OpenCL backend.
Cached global memory size in the OpenCL device context.
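
As an illustration of what caching the global memory size in a device context can look like, here is a minimal C++ sketch. It is not the actual llama.cpp code: `DeviceContext`, `query_global_mem`, and `total_memory` are hypothetical names, and the callback stands in for the driver query (in OpenCL that would typically be `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE`).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical device context; not the actual llama.cpp struct.
struct DeviceContext {
    // Stand-in for the driver query (in OpenCL this would typically be
    // clGetDeviceInfo with CL_DEVICE_GLOBAL_MEM_SIZE).
    std::function<std::uint64_t()> query_global_mem;
    std::uint64_t global_mem_size = 0;  // cached value, filled in once

    explicit DeviceContext(std::function<std::uint64_t()> query)
        : query_global_mem(std::move(query)) {
        assert(query_global_mem && "a query callback is required");
        global_mem_size = query_global_mem();  // query once, at init
    }

    // Later callers read the cached field instead of re-querying the driver.
    std::uint64_t total_memory() const { return global_mem_size; }
};
```

The point of the pattern is that repeated memory-size lookups become a plain field read, which matters when the value is consulted on hot paths such as capability checks.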

🐛 Bug Fixes

Ensured argsort kernel is built when querying max workgroups in supports_op.
Implemented lazy loading for flash_attn kernel variants, so each variant is loaded only when needed.
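
Both fixes follow a build-on-first-use pattern, which can be sketched in C++ as below. This is an illustrative sketch under stated assumptions, not the backend's real implementation: `Kernel`, `LazyKernelCache`, `fits_workgroup`, and the variant names used with them are hypothetical, and the builder callback stands in for actual OpenCL program compilation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical compiled-kernel handle; a real backend would hold a cl_kernel.
struct Kernel {
    std::string name;
    int max_workgroup_size;
};

// Builds each kernel variant on first request and caches the result.
class LazyKernelCache {
public:
    using Builder = std::function<Kernel(const std::string &)>;

    explicit LazyKernelCache(Builder build) : build_(std::move(build)) {
        assert(build_ && "a builder callback is required");
    }

    // Returns the cached kernel, compiling it first if it has not been built yet.
    const Kernel & get(const std::string & variant) {
        auto it = cache_.find(variant);
        if (it == cache_.end()) {
            it = cache_.emplace(variant, build_(variant)).first;
        }
        return it->second;
    }

    bool is_built(const std::string & variant) const {
        return cache_.count(variant) != 0;
    }

private:
    Builder build_;
    std::map<std::string, Kernel> cache_;
};

// A capability check in the spirit of supports_op: going through get()
// guarantees the kernel exists before its workgroup limit is read.
bool fits_workgroup(LazyKernelCache & cache, const std::string & variant, int needed) {
    return needed <= cache.get(variant).max_workgroup_size;
}
```

With this shape, a variant that is never requested is never compiled, and any query that depends on a kernel's properties triggers the build itself instead of assuming an earlier step already did it.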

Affected Symbols

opencl backend initialization
opencl GPU identification
opencl dev_ctx
argsort kernel
flash_attn kernel
supports_op