Changes

b9129

📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 1 symbol

Summary

This release introduces adaptive CPU fallback for ggml-zendnn based on batch size, controllable via an environment variable, and restores previous fallback behavior when the feature is disabled. Numerous pre-compiled binaries are provided.

Migration Steps

  1. If you wish to disable the new adaptive fallback behavior in ggml-zendnn, set the environment variable GGML_ZENDNN_ADAPTIVE_FALLBACK to 0.

✨ New Features

  • ggml-zendnn now features adaptive fallback to the CPU backend for small batch sizes.
  • Added runtime environment variable GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (default is enabled).
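The behavior described above can be sketched as follows. This is a minimal illustration, not the actual ggml-zendnn implementation: the helper names and the batch-size threshold `kSmallBatchThreshold` are hypothetical; only the environment variable `GGML_ZENDNN_ADAPTIVE_FALLBACK`, its default-enabled semantics, and the set-to-0-to-disable behavior come from the release notes.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true unless GGML_ZENDNN_ADAPTIVE_FALLBACK is set to "0".
// Per the release notes, adaptive fallback is enabled by default.
static bool zendnn_adaptive_fallback_enabled() {
    const char * env = std::getenv("GGML_ZENDNN_ADAPTIVE_FALLBACK");
    return env == nullptr || std::strcmp(env, "0") != 0;
}

// Hypothetical threshold: the actual cutoff used by ggml-zendnn
// is not stated in the release notes.
static const int kSmallBatchThreshold = 4;

// Decide whether a batch should fall back to the CPU backend.
// When adaptive fallback is disabled, the original fallback logic
// applies instead; it is modeled here simply as "no adaptive fallback".
static bool should_fallback_to_cpu(int batch_size) {
    if (!zendnn_adaptive_fallback_enabled()) {
        return false;
    }
    return batch_size < kSmallBatchThreshold;
}
```

In practice, the migration step above amounts to setting the variable in the environment before launch, e.g. `GGML_ZENDNN_ADAPTIVE_FALLBACK=0`.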

🐛 Bug Fixes

  • Restored original ggml-zendnn fallback logic when adaptive fallback is disabled.

Affected Symbols