b7630

📦 llama-cpp
3 features · 🔧 2 symbols

Summary

This release adds operator fusion for ADD + RMS_NORM operations in the CANN backend, which improves performance by reducing memory access overhead. Pre-built binaries for several operating systems and hardware configurations are also provided.

Migration Steps

  1. To enable the new ADD + RMS_NORM fusion in the CANN backend, set the GGML_CANN_OPERATOR_FUSION environment variable to true.
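As a rough illustration of an opt-in switch like this one, the sketch below reads an environment variable and treats it as enabled only when it holds the literal string "true", the value named in the step above. The helper `env_flag_is_true` is hypothetical and is not taken from llama.cpp; the backend's real parsing may accept other spellings, which this sketch does not attempt to model.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of llama.cpp): returns true only when the
// named environment variable is set to the exact string "true".
// An unset variable, or any other value, counts as disabled.
bool env_flag_is_true(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value != nullptr && std::string(value) == "true";
}
```

A caller would check `env_flag_is_true("GGML_CANN_OPERATOR_FUSION")` once at startup and choose the fused or unfused path accordingly. Because the variable is read from the process environment, it has to be set before the program launches.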

✨ New Features

  • Added operator fusion support for ADD + RMS_NORM operations in the CANN backend to reduce memory access overhead.
  • Implemented ggml_cann_op_add_rms_norm_fused() using ACLNN AddRmsNorm.
  • Added ggml_cann_can_fuse() to check fusion eligibility.
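To make the memory-access argument concrete, here is a minimal CPU reference sketch of what ADD followed by RMS_NORM computes for a single row, written once as two separate operators and once as a fused pass. It is not the CANN kernel or the ACLNN AddRmsNorm implementation. The function names, the single-row layout, and the absence of a learned scale (gamma) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused reference: ADD writes a full intermediate buffer, then RMS_NORM
// reads it back and writes a second buffer.
std::vector<float> add_then_rms_norm(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     float eps) {
    assert(a.size() == b.size() && !a.empty());

    std::vector<float> sum(a.size());              // intermediate tensor
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum[i] = a[i] + b[i];                      // operator 1: ADD
    }

    float sq = 0.0f;
    for (float v : sum) sq += v * v;               // operator 2: RMS_NORM
    const float scale = 1.0f / std::sqrt(sq / static_cast<float>(sum.size()) + eps);

    std::vector<float> out(sum.size());
    for (std::size_t i = 0; i < sum.size(); ++i) out[i] = sum[i] * scale;
    return out;
}

// Fused sketch: the sum and the sum of squares are produced in the same loop,
// and the result is scaled in place, so no separate intermediate buffer exists.
std::vector<float> add_rms_norm_fused(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      float eps) {
    assert(a.size() == b.size() && !a.empty());

    std::vector<float> out(a.size());
    float sq = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float s = a[i] + b[i];
        out[i] = s;                                // keep the sum in the output buffer
        sq += s * s;                               // accumulate while the value is live
    }
    const float scale = 1.0f / std::sqrt(sq / static_cast<float>(out.size()) + eps);
    for (float& v : out) v *= scale;               // normalise in place
    return out;
}
```

Both functions return the same values for the same inputs. The fused version simply avoids allocating, writing, and re-reading the intermediate `sum` buffer, which is the kind of saving the fusion is aimed at on an accelerator.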

🔧 Affected Symbols

  • ggml_cann_op_add_rms_norm_fused
  • ggml_cann_can_fuse