b9129
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 1 symbol
Summary
This release introduces an adaptive CPU fallback for ggml-zendnn based on batch size, controllable via an environment variable, and restores the previous fallback behavior when the feature is disabled. Numerous pre-compiled binaries are provided.
Migration Steps
- If you wish to disable the new adaptive fallback behavior in ggml-zendnn, set the environment variable GGML_ZENDNN_ADAPTIVE_FALLBACK to 0.
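As a minimal sketch, the step above amounts to setting the variable in the environment before launching a llama.cpp tool built with the ZenDNN backend (the `llama-cli` invocation below is illustrative; adjust the binary, model path, and flags to your setup):

```shell
# Disable the adaptive CPU fallback in ggml-zendnn, restoring the
# previous (pre-adaptive) fallback behavior for all batch sizes.
export GGML_ZENDNN_ADAPTIVE_FALLBACK=0

# Then run any llama.cpp binary as usual, e.g.:
# ./llama-cli -m model.gguf -p "Hello"
```

Leaving the variable unset (or setting it to any value other than `0`) keeps the new adaptive behavior, which is the default.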
✨ New Features
- ggml-zendnn now features adaptive fallback to the CPU backend for small batch sizes.
- Added runtime environment variable GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (enabled by default).
🐛 Bug Fixes
- Restored original ggml-zendnn fallback logic when adaptive fallback is disabled.