b9260
📦 llama-cpp
✨ 3 features · 🐛 2 fixes · 🔧 6 symbols
Summary
This release focuses on refactoring the OpenCL backend initialization, improving GPU identification, and optimizing kernel loading for argsort and flash_attn operations.
Migration Steps
- If you rely on specific OpenCL initialization behavior, review the refactored initialization logic.
- If you maintain OpenCL backend code that queries max workgroups, ensure the argsort kernel is built before the query; supports_op now does this.
✨ New Features
- Refactored OpenCL backend initialization logic.
- Improved GPU identification within OpenCL backend.
- Cached global memory size in the OpenCL device context.
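The caching item above can be illustrated with a minimal sketch. This is not the actual llama.cpp code: `device_context` and the `query` callback are hypothetical names, and the callback stands in for a driver call such as `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE`. The idea is that the value is queried once at initialization and then read from the context.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical device context, for illustration only.
class device_context {
public:
    // `query` is an assumed stand-in for the driver call that reports
    // the device's global memory size in bytes.
    explicit device_context(const std::function<uint64_t()> & query)
        : global_mem_size_(0) {
        assert(query && "a query callback is required");
        global_mem_size_ = query();  // queried exactly once, at init
    }

    // Later reads use the cached value and never touch the driver.
    uint64_t global_mem_size() const { return global_mem_size_; }

private:
    uint64_t global_mem_size_;
};
```

With this shape, code that repeatedly needs the memory size (for example when checking whether a buffer fits) reads a plain member instead of issuing a driver query each time.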
🐛 Bug Fixes
- Ensured argsort kernel is built when querying max workgroups in supports_op.
- Made flash_attn kernel variants load lazily, only when they are needed.
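Both fixes above follow a build-on-demand pattern, which the sketch below tries to show. It is an assumption-laden illustration rather than the backend's real code: `kernel_handle`, `lazy_kernel_cache`, and the integer variant key are hypothetical, and the `build` callback stands in for compiling an OpenCL program and creating a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical handle for a compiled kernel; real code would hold a cl_kernel.
struct kernel_handle {
    std::string name;
};

// Hypothetical cache: a kernel variant is built on first request only.
class lazy_kernel_cache {
public:
    using build_fn = std::function<kernel_handle(int)>;

    explicit lazy_kernel_cache(build_fn build) : build_(std::move(build)) {
        assert(build_ && "a build callback is required");
    }

    // Returns the kernel for `variant_key`, building it if it is absent.
    const kernel_handle & get(int variant_key) {
        auto it = kernels_.find(variant_key);
        if (it == kernels_.end()) {
            it = kernels_.emplace(variant_key, build_(variant_key)).first;
        }
        return it->second;
    }

    // Number of variants built so far.
    std::size_t loaded_count() const { return kernels_.size(); }

private:
    build_fn build_;
    std::map<int, kernel_handle> kernels_;
};
```

Because `get` always returns a built kernel, any code that needs a property of that kernel (such as a workgroup limit) can call `get` first and never observe an unbuilt kernel. Variants that are never requested are never compiled, which keeps startup cost proportional to what is actually used.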