b9575

📦 llama-cpp
2 features · 🐛 2 fixes · 🔧 4 symbols

Summary

This release adds the GGML_OP_COL2IM_1D operation, which supports factorizing 1D transposed convolution on the CPU backend. It also bumps the RPC protocol patch version to account for the new operation and adds extensive tests.

Migration Steps

  1. Bump RPC_PROTO_PATCH_VERSION because GGML_OP_COUNT increased from 96 to 97 due to the addition of GGML_OP_COL2IM_1D.

✨ New Features

  • Added GGML_OP_COL2IM_1D operation to ggml for implementing the overlap-add step of 1D transposed convolution.
  • CPU backend now supports GGML_OP_COL2IM_1D for F32, F16, and BF16 with an F32 accumulator.
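To illustrate the overlap-add step named above, here is a minimal F32-only reference sketch in C. It is not the ggml kernel: the function names, the row-major `cols[t_in][oc][k]` input layout, and the `dst[oc][t_out]` output layout are assumptions chosen for readability, and the real operation may order its dimensions differently. The output length uses the standard transposed-convolution formula (t_in - 1) * s0 + k - 2 * p0.

```c
#include <assert.h>

/* Output length of a 1D transposed convolution (no dilation, no output padding). */
static int col2im_1d_out_len(int t_in, int k, int s0, int p0) {
    return (t_in - 1) * s0 + k - 2 * p0;
}

/*
 * Hypothetical reference overlap-add (scatter form).
 * cols: assumed layout [t_in][oc][k], row-major
 * dst : assumed layout [oc][t_out], row-major, overwritten
 */
static void col2im_1d_ref(const float *cols, float *dst,
                          int t_in, int oc, int k, int s0, int p0) {
    assert(t_in > 0 && oc > 0 && k > 0 && s0 > 0 && p0 >= 0);
    const int t_out = col2im_1d_out_len(t_in, k, s0, p0);
    assert(t_out > 0);

    for (int i = 0; i < oc * t_out; ++i) {
        dst[i] = 0.0f;
    }

    for (int t = 0; t < t_in; ++t) {
        for (int c = 0; c < oc; ++c) {
            for (int j = 0; j < k; ++j) {
                /* Input step t, kernel tap j lands at output position t*s0 + j - p0. */
                const int to = t * s0 + j - p0;
                if (to < 0 || to >= t_out) {
                    continue; /* cropped away by the padding */
                }
                dst[c * t_out + to] += cols[(t * oc + c) * k + j];
            }
        }
    }
}
```

With t_in = 2, oc = 1, k = 3, s0 = 1, p0 = 0 and columns {1, 2, 3} and {10, 20, 30}, the two windows overlap at output positions 1 and 2, so the result is {1, 12, 23, 30}. For F16 or BF16 inputs, the sums would be accumulated in F32 as the release note describes; this sketch skips the conversion.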

🐛 Bug Fixes

  • Hardened GGML_OP_COL2IM_1D by adding validation for s0, oc, p0, and input contiguity at graph build time.
  • Improved load balancing by parallelizing the GGML_OP_COL2IM_1D kernel over the time axis, which fixes single-threaded execution when OC=1.
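One way to picture the time-axis split is the sketch below, which is an illustration under assumptions rather than the actual ggml kernel. Each worker `ith` of `nth` owns a contiguous range of output time steps and gathers the contributions for those steps, so workers never write to the same element and every worker has work even when OC=1. The function names and the `cols[t_in][oc][k]` / `dst[oc][t_out]` layouts are hypothetical.

```c
#include <assert.h>

/* Split n items into nth contiguous chunks; chunk ith covers [*begin, *end). */
static void time_chunk(int n, int ith, int nth, int *begin, int *end) {
    const int per = (n + nth - 1) / nth; /* ceil(n / nth) */
    int b = ith * per;
    if (b > n) b = n;
    int e = b + per;
    if (e > n) e = n;
    *begin = b;
    *end = e;
}

/*
 * Hypothetical per-worker overlap-add (gather form).
 * cols: assumed layout [t_in][oc][k]; dst: assumed layout [oc][t_out].
 * Worker ith only writes output time steps inside its own chunk.
 */
static void col2im_1d_slice(const float *cols, float *dst,
                            int t_in, int oc, int k, int s0, int p0,
                            int ith, int nth) {
    assert(t_in > 0 && oc > 0 && k > 0 && s0 > 0 && p0 >= 0);
    assert(nth > 0 && ith >= 0 && ith < nth);
    const int t_out = (t_in - 1) * s0 + k - 2 * p0;
    assert(t_out > 0);

    int t0, t1;
    time_chunk(t_out, ith, nth, &t0, &t1);

    for (int c = 0; c < oc; ++c) {
        for (int to = t0; to < t1; ++to) {
            const int pos = to + p0;      /* position before padding is cropped */
            const int num = pos - k + 1;
            /* Input steps t with 0 <= pos - t*s0 < k contribute to this output. */
            const int t_lo = num > 0 ? (num + s0 - 1) / s0 : 0;
            int t_hi = pos / s0;
            if (t_hi > t_in - 1) t_hi = t_in - 1;

            float acc = 0.0f;             /* F32 accumulator */
            for (int t = t_lo; t <= t_hi; ++t) {
                const int j = pos - t * s0; /* kernel tap, within [0, k) */
                acc += cols[(t * oc + c) * k + j];
            }
            dst[c * t_out + to] = acc;
        }
    }
}
```

Calling `col2im_1d_slice` once per worker index (here sequentially, in practice from separate threads) fills the whole output. Splitting over output channels instead would leave all but one worker idle when OC=1, which is the situation the fix addresses.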

Affected Symbols