
b9260

📦 llama-cpp
3 features · 🐛 2 fixes · 🔧 6 symbols

Summary

This release focuses on refactoring the OpenCL backend initialization, improving GPU identification, and optimizing kernel loading for argsort and flash_attn operations.

Migration Steps

  1. If you rely on specific OpenCL initialization behavior, review the refactored initialization logic.
  2. If you use OpenCL operations that query max workgroups, ensure the argsort kernel is built.

✨ New Features

  • Refactored OpenCL backend initialization logic.
  • Improved GPU identification within OpenCL backend.
  • Cached global memory size in the OpenCL device context.
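
The caching item above can be illustrated with a minimal sketch. It assumes a hypothetical `DeviceContext` type and a stand-in query callback; neither name comes from the llama.cpp source. In a real OpenCL backend the query would typically wrap `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE`. These notes do not say whether the backend fills the cache at initialization or on first use; the sketch does it once at construction.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical device context; names are illustrative, not llama.cpp's own.
struct DeviceContext {
    // The query callback stands in for the driver call (assumed to be
    // clGetDeviceInfo(..., CL_DEVICE_GLOBAL_MEM_SIZE, ...) in a real backend).
    explicit DeviceContext(const std::function<std::uint64_t()> & query)
        : global_mem_size_(query()) {}  // query the driver exactly once

    // Later reads return the stored value without touching the driver.
    std::uint64_t global_mem_size() const { return global_mem_size_; }

private:
    std::uint64_t global_mem_size_;
};
```

Callers that previously queried the device on every check would read `ctx.global_mem_size()` instead, so repeated lookups do not go through the driver.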

🐛 Bug Fixes

  • Ensured argsort kernel is built when querying max workgroups in supports_op.
  • Made flash_attn kernel variants load lazily, so each variant is loaded only when it is needed.
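
Both fixes above follow a build-on-demand pattern, sketched below under stated assumptions. `Kernel`, `KernelCache`, and `supports_argsort` are hypothetical names, and comparing a row length against the workgroup limit is an assumed example of a capability check, not the actual `supports_op` logic. In OpenCL a per-kernel workgroup limit is normally read with `clGetKernelWorkGroupInfo`, which needs a built kernel, so the check obtains the kernel through the cache first.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical compiled-kernel handle; a real backend would hold a cl_kernel.
struct Kernel {
    std::string name;
    std::size_t max_workgroup_size;
};

// Hypothetical lazy cache: each kernel variant is built on first request.
class KernelCache {
public:
    using Builder = std::function<Kernel(const std::string &)>;

    explicit KernelCache(Builder builder) : builder_(std::move(builder)) {}

    // Returns the kernel for `variant`, building it only if it is not cached.
    const Kernel & get(const std::string & variant) {
        auto it = kernels_.find(variant);
        if (it == kernels_.end()) {
            it = kernels_.emplace(variant, builder_(variant)).first;
        }
        return it->second;
    }

    bool is_built(const std::string & variant) const {
        return kernels_.count(variant) != 0;
    }

private:
    Builder builder_;
    std::map<std::string, Kernel> kernels_;
};

// Assumed capability check: the kernel is fetched through the cache, so it
// exists before its workgroup limit is read.
bool supports_argsort(KernelCache & cache, std::size_t row_length) {
    const Kernel & k = cache.get("argsort");
    return row_length <= k.max_workgroup_size;
}
```

With this shape, variants that are never requested (for example an unused flash_attn variant) are never built, and a capability check cannot read limits from a kernel that does not exist yet.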

Affected Symbols