b8603
📦 llama-cpp
🐛 4 fixes · 🔧 11 symbols
Summary
This release focuses on concurrency and graph-cache correctness in the CANN backend: it addresses race conditions when setting tensors from multiple threads and ensures cached graphs are reused only when operation parameters and tensor types actually match.
🐛 Bug Fixes
- Fixed a race condition in multi-threaded set_tensor calls in the CANN backend by introducing a TensorSetTracker that tracks write progress and defers transformation/upload of quantized tensors until all of their data has arrived.
- Fixed the L2_NORM implementation in the CANN backend to correctly honor the eps parameter by adding a Clamp step before the division.
- Fixed ACL graph cache matching in ggml/cann by including GGML_OP_POOL_2D parameters in the comparison.
- Fixed ACL graph cache reuse for operations whose tensor types (f16 vs bf16) or op_params differ, by adding node_type/src_type fields to the cache key and comparing op_params unconditionally for the affected ops (SCALE, UNARY, GLU, ROPE, POOL_2D, L2_NORM, NORM_MUL_ADD, RMS_NORM_MUL_ADD, ADD_RMS_NORM).
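The set_tensor fix above centers on tracking how much of a tensor has been written before triggering any quantized transform/upload. A minimal sketch of that idea, assuming a mutex-guarded progress map (the class name comes from the release notes, but the fields and method here are illustrative, not the actual CANN implementation):

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical sketch: tracks how many bytes of each tensor have been
// written so far, so the transform/upload step for quantized tensors can
// be deferred until the whole tensor is present. Not the real CANN code.
class TensorSetTracker {
public:
    // Record a chunk write; returns true once the tensor is complete
    // and the deferred transform/upload may safely run.
    bool record_write(const void * tensor, size_t offset, size_t size, size_t total) {
        std::lock_guard<std::mutex> lock(mutex_);
        size_t & written = progress_[tensor];
        // assume callers write disjoint, ascending ranges; track the
        // high-water mark of bytes written
        if (offset + size > written) {
            written = offset + size;
        }
        if (written >= total) {
            progress_.erase(tensor);  // complete: drop the entry
            return true;
        }
        return false;
    }

private:
    std::mutex mutex_;
    std::map<const void *, size_t> progress_;  // tensor -> bytes written
};
```

The mutex serializes concurrent writers, which is what removes the race: no thread can observe a partially updated progress count or fire the upload twice.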
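The L2_NORM fix amounts to bounding the norm away from zero with eps before dividing. A reference version of that semantics (plain CPU math; the actual fix applies a Clamp op on-device rather than this loop):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference semantics for L2 normalization with eps: divide by
// max(||x||, eps) so near-zero vectors do not divide by ~0.
// Illustrative only, not the CANN kernel itself.
std::vector<float> l2_normalize(const std::vector<float> & x, float eps) {
    float sumsq = 0.0f;
    for (float v : x) {
        sumsq += v * v;
    }
    const float norm = std::max(std::sqrt(sumsq), eps);  // the clamp step
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] / norm;
    }
    return y;
}
```

Without the clamp, an all-zero input divides by zero and produces NaNs; with it, the output degrades gracefully to zeros.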
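The two graph-cache fixes boil down to widening the comparison key: a cached graph node may only be reused when the op, its raw op_params, and the node/source tensor types all agree. A hedged sketch of such a key comparison (the struct, field names, and enum are assumptions for illustration, not the real ggml/CANN types):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative cache-key match: a cached graph node is reusable only if
// the op, its parameters (e.g. POOL_2D kernel/stride/padding, norm eps),
// the output dtype, and the source dtype all match. Names are assumed.
enum class dtype : uint8_t { f32, f16, bf16 };

struct cached_node {
    int     op;             // operation id, e.g. a GGML_OP_* value
    int32_t op_params[16];  // raw operation parameters
    dtype   node_type;      // output tensor type
    dtype   src_type;       // first source tensor type
};

bool graph_node_matches(const cached_node & a, const cached_node & b) {
    return a.op == b.op
        && a.node_type == b.node_type  // f16 vs bf16 must not alias
        && a.src_type  == b.src_type
        && std::memcmp(a.op_params, b.op_params, sizeof(a.op_params)) == 0;
}
```

The bug being fixed was the opposite behavior: nodes that differed only in op_params or in f16-vs-bf16 types still hit the cache and replayed the wrong captured graph.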