
b9113

📦 llama-cpp
1 feature · 🐛 2 fixes · 🔧 1 symbol

Summary

This release adds Q4_1 MoE support to the OpenCL backend on Adreno GPUs and cleans up the OpenCL code by removing unnecessary asserts and other unneeded code.

✨ New Features

  • Added OpenCL support for Q4_1-quantized MoE (Mixture of Experts) weights, specifically on Adreno GPUs.
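
For readers unfamiliar with the format, the sketch below shows roughly what Q4_1 means at the block level: 32 values share one scale `d` and one minimum `m`, and each value is stored as a 4-bit quant `q`, reconstructed as `d * q + m`. It is a CPU-side illustration only, not the OpenCL/Adreno kernel added in this release. The names `block_q4_1_sketch` and `dequantize_block_q4_1_sketch` are hypothetical, and `d` and `m` are stored as `float` here for simplicity, whereas ggml stores them as fp16.

```c
#include <stdint.h>

#define QK4_1 32  /* values per Q4_1 block, as in ggml */

/* Simplified stand-in for ggml's block_q4_1 (assumption: float instead of fp16). */
typedef struct {
    float   d;              /* per-block scale */
    float   m;              /* per-block minimum (offset) */
    uint8_t qs[QK4_1 / 2];  /* 32 four-bit quants, two per byte */
} block_q4_1_sketch;

/* Dequantize one block: y[i] = d * q[i] + m.
 * Low nibbles hold elements 0..15, high nibbles hold elements 16..31. */
static void dequantize_block_q4_1_sketch(const block_q4_1_sketch *b, float *y) {
    for (int j = 0; j < QK4_1 / 2; ++j) {
        const int lo = b->qs[j] & 0x0F;  /* element j */
        const int hi = b->qs[j] >> 4;    /* element j + 16 */
        y[j]             = b->d * (float)lo + b->m;
        y[j + QK4_1 / 2] = b->d * (float)hi + b->m;
    }
}
```

In an MoE model each expert has its own weight tensor stored this way, so a backend that supports Q4_1 MoE has to apply this reconstruction to whichever experts the router selects for each token.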

🐛 Bug Fixes

  • Fixed the OpenCL supports_op check for Q4_1 MoE to correctly identify supported shapes on Adreno.
  • Removed unnecessary asserts and code within the OpenCL implementation.

Affected Symbols