b9575
📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 4 symbols
Summary
This release adds the GGML_OP_COL2IM_1D operation, which implements the overlap-add step of a factorized 1D transposed convolution, with CPU backend support. It also bumps the RPC protocol patch version and adds extensive tests for the new operation.
Migration Steps
- Bump RPC_PROTO_PATCH_VERSION because GGML_OP_COUNT increased from 96 to 97 due to the addition of GGML_OP_COL2IM_1D.
✨ New Features
- Added GGML_OP_COL2IM_1D operation to ggml for implementing the overlap-add step of 1D transposed convolution.
- CPU backend now supports GGML_OP_COL2IM_1D for F32, F16, and BF16 with an F32 accumulator.
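To make the overlap-add step concrete, here is a minimal reference sketch in C. It is not the ggml kernel: the function name `col2im_1d_ref`, the column layout `col[(t*OC + c)*K + k]`, and the output layout `dst[c*T_out + o]` are assumptions chosen for illustration. The output length follows the usual transposed-convolution formula with no dilation or output padding, `T_out = (T_in - 1)*s0 + K - 2*p0`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference routine (not the ggml implementation).
 * col: T_in x OC x K, row-major, col[(t*oc + c)*k + kk]   (assumed layout)
 * dst: OC x T_out, row-major, dst[c*t_out + o]            (assumed layout)
 * Returns T_out on success, or -1 if the parameters are invalid. */
static int col2im_1d_ref(const float *col, float *dst,
                         int t_in, int oc, int k, int s0, int p0) {
    assert(col != NULL && dst != NULL);
    if (t_in <= 0 || oc <= 0 || k <= 0 || s0 <= 0 || p0 < 0) {
        return -1;
    }
    int t_out = (t_in - 1) * s0 + k - 2 * p0;
    if (t_out <= 0) {
        return -1;
    }
    /* Overlapping taps are summed, so start from zero. */
    memset(dst, 0, sizeof(float) * (size_t)oc * (size_t)t_out);
    for (int t = 0; t < t_in; t++) {
        for (int c = 0; c < oc; c++) {
            for (int kk = 0; kk < k; kk++) {
                int o = t * s0 + kk - p0;   /* output time index */
                if (o < 0 || o >= t_out) {
                    continue;               /* cropped away by padding */
                }
                dst[c * t_out + o] += col[(t * oc + c) * k + kk];
            }
        }
    }
    return t_out;
}
```

With `T_in = 3`, `K = 2`, `s0 = 1`, `p0 = 0` and every column entry equal to 1, neighbouring frames overlap by one sample, so the four output samples are 1, 2, 2, 1.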
🐛 Bug Fixes
- Hardened GGML_OP_COL2IM_1D by adding validation for s0, oc, p0, and input contiguity at graph build time.
- Improved load balancing by parallelizing the GGML_OP_COL2IM_1D kernel over the time axis, which fixes single-threaded execution when OC=1.
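The load-balancing fix can be pictured as handing each thread a slice of output time steps instead of a slice of output channels. The sketch below is an assumption about the general approach, not the code from this release; `split_time_axis` and its parameters `ith` (thread index) and `nth` (thread count) are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: computes the half-open range [*begin, *end) of
 * output time steps that thread `ith` out of `nth` should process. */
static void split_time_axis(int t_out, int ith, int nth,
                            int *begin, int *end) {
    assert(nth > 0 && ith >= 0 && ith < nth && t_out >= 0);
    int per = (t_out + nth - 1) / nth;  /* ceil(t_out / nth) */
    int b = ith * per;
    int e = b + per;
    if (b > t_out) b = t_out;           /* trailing threads may get nothing */
    if (e > t_out) e = t_out;
    *begin = b;
    *end = e;
}
```

A channel-wise split gives only one thread any work when OC=1, whereas this time-wise split keeps every thread busy as long as there are at least as many output time steps as threads.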