b8603
📦 llama-cpp
🐛 4 fixes · 🔧 11 symbols
Summary
This release focuses on concurrency and graph-cache correctness in the CANN backend: it addresses race conditions when setting tensors from multiple threads and ensures cached graphs are reused only when operation parameters and tensor types actually match.
🐛 Bug Fixes
- Fixed a race condition in multi-threaded set_tensor calls in the CANN backend by introducing a TensorSetTracker that tracks write progress and defers transformation/upload of quantized tensors until all of their data has arrived.
- Fixed the L2_NORM implementation in the CANN backend to correctly honor the eps parameter by adding a Clamp step before the division.
- Fixed ACL graph cache matching in ggml/cann by including GGML_OP_POOL_2D parameters in the comparison.
- Fixed ACL graph cache reuse for operations whose tensor types (f16 vs bf16) or op_params differ, by adding node_type/src_type fields to the cache key and comparing op_params unconditionally for the affected ops (SCALE, UNARY, GLU, ROPE, POOL_2D, L2_NORM, NORM_MUL_ADD, RMS_NORM_MUL_ADD, ADD_RMS_NORM).
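The set_tensor fix above centers on tracking how much of a tensor has been written before triggering any quantized transform/upload. A minimal sketch of that idea, assuming a mutex-guarded progress map (the class name comes from the release notes, but the fields and method here are illustrative, not the actual CANN implementation):

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical sketch: tracks how many bytes of each tensor have been
// written so far, so the transform/upload step for quantized tensors can
// be deferred until the whole tensor is present. Not the real CANN code.
class TensorSetTracker {
public:
    // Record a chunk write; returns true once the tensor is complete
    // and the deferred transform/upload may safely run.
    bool record_write(const void * tensor, size_t offset, size_t size, size_t total) {
        std::lock_guard<std::mutex> lock(mutex_);
        size_t & written = progress_[tensor];
        // assume callers write disjoint, ascending ranges; track the
        // high-water mark of bytes written
        if (offset + size > written) {
            written = offset + size;
        }
        if (written >= total) {
            progress_.erase(tensor);  // complete: drop the entry
            return true;
        }
        return false;
    }

private:
    std::mutex mutex_;
    std::map<const void *, size_t> progress_;  // tensor -> bytes written
};
```

The mutex serializes concurrent writers, which is what removes the race: no thread can observe a partially updated progress count or fire the upload twice.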
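The L2_NORM fix amounts to bounding the norm away from zero with eps before dividing. A reference version of that semantics (plain CPU math; the actual fix applies a Clamp op on-device rather than this loop):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference semantics for L2 normalization with eps: divide by
// max(||x||, eps) so near-zero vectors do not divide by ~0.
// Illustrative only, not the CANN kernel itself.
std::vector<float> l2_normalize(const std::vector<float> & x, float eps) {
    float sumsq = 0.0f;
    for (float v : x) {
        sumsq += v * v;
    }
    const float norm = std::max(std::sqrt(sumsq), eps);  // the clamp step
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] / norm;
    }
    return y;
}
```

Without the clamp, an all-zero input divides by zero and produces NaNs; with it, the output degrades gracefully to zeros.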
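The two graph-cache fixes boil down to widening the comparison key: a cached graph node may only be reused when the op, its raw op_params, and the node/source tensor types all agree. A hedged sketch of such a key comparison (the struct, field names, and enum are assumptions for illustration, not the real ggml/CANN types):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative cache-key match: a cached graph node is reusable only if
// the op, its parameters (e.g. POOL_2D kernel/stride/padding, norm eps),
// the output dtype, and the source dtype all match. Names are assumed.
enum class dtype : uint8_t { f32, f16, bf16 };

struct cached_node {
    int     op;             // operation id, e.g. a GGML_OP_* value
    int32_t op_params[16];  // raw operation parameters
    dtype   node_type;      // output tensor type
    dtype   src_type;       // first source tensor type
};

bool graph_node_matches(const cached_node & a, const cached_node & b) {
    return a.op == b.op
        && a.node_type == b.node_type  // f16 vs bf16 must not alias
        && a.src_type  == b.src_type
        && std::memcmp(a.op_params, b.op_params, sizeof(a.op_params)) == 0;
}
```

The bug being fixed was the opposite behavior: nodes that differed only in op_params or in f16-vs-bf16 types still hit the cache and replayed the wrong captured graph.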