b9113
📦 llama-cpp
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol
Summary
This release adds Q4_1 MoE (Mixture of Experts) support to the OpenCL backend on Adreno GPUs, fixes the corresponding supports_op check, and cleans up the OpenCL code by removing unnecessary asserts and code.
✨ New Features
- Added OpenCL support for Q4_1-quantized MoE (Mixture of Experts) weights on Adreno GPUs.
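For readers unfamiliar with Q4_1, here is a minimal sketch of the format as ggml defines it on the CPU side: weights are grouped into blocks of 32, and each block stores an fp16 scale d, an fp16 minimum m, and 32 four-bit codes packed into 16 bytes, so that each weight is reconstructed as q * d + m. The function names below are hypothetical and written only for illustration. This is not the Adreno OpenCL kernel from this release, which may lay out the data differently for the GPU.

```python
import struct

QK4_1 = 32  # number of weights per Q4_1 block


def quantize_q4_1_block(values):
    """Pack 32 floats into one 20-byte Q4_1 block (illustrative helper)."""
    assert len(values) == QK4_1
    lo, hi = min(values), max(values)
    d = (hi - lo) / 15.0           # scale: 4 bits give 16 levels (0..15)
    inv = 1.0 / d if d else 0.0    # avoid division by zero for constant blocks
    header = struct.pack("<ee", d, lo)  # fp16 scale, then fp16 minimum
    qs = bytearray(QK4_1 // 2)
    for j in range(QK4_1 // 2):
        # Element j goes in the low nibble, element j + 16 in the high nibble.
        q0 = min(15, int((values[j] - lo) * inv + 0.5))
        q1 = min(15, int((values[j + QK4_1 // 2] - lo) * inv + 0.5))
        qs[j] = q0 | (q1 << 4)
    return header + bytes(qs)


def dequantize_q4_1_block(block):
    """Recover 32 floats from one Q4_1 block: w = q * d + m."""
    d, m = struct.unpack_from("<ee", block, 0)
    qs = block[4:4 + QK4_1 // 2]
    out = [0.0] * QK4_1
    for j in range(QK4_1 // 2):
        out[j] = (qs[j] & 0x0F) * d + m
        out[j + QK4_1 // 2] = (qs[j] >> 4) * d + m
    return out
```

In an MoE model each expert's weight matrix is stored as a sequence of such blocks, so a backend that supports Q4_1 MoE has to decode this layout for whichever experts the router selects.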
🐛 Bug Fixes
- Fixed the OpenCL supports_op check for Q4_1 MoE to correctly identify supported shapes on Adreno.
- Removed unnecessary asserts and code within the OpenCL implementation.