b8532
📦 llama-cpp
✨ 2 features · 🔧 7 symbols
Summary
This release adds F32 kernel type support for `CONV_TRANSPOSE_2D` on both CUDA and CPU. It also refactors the 2D transposed convolution implementation: the CUDA path now dispatches kernel launches directly instead of going through `conv2d_transpose_params`, and the CPU path is templated on the kernel type so it can handle both F16 and F32 kernels.
Migration Steps
- `conv2d_transpose_params` was removed from the CUDA implementation and replaced by direct kernel launch dispatching. Update any code that references it.
- If you rely on a specific kernel type being instantiated in the `CONV_TRANSPOSE_2D` tests, note that the test structures now take their constructor arguments in a different order and iterate over kernel types dynamically.
✨ New Features
- Added support for F32 kernel type for `CONV_TRANSPOSE_2D` on CUDA and CPU.
- Templated the CPU `CONV_TRANSPOSE_2D` implementation on the kernel type, so it handles both F16 and F32 kernels.
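
To illustrate what templating on the kernel type can look like, here is a minimal, standalone sketch. It is not the llama.cpp code: `half_bits`, `to_float`, and `conv_transpose_2d_ref` are hypothetical names, the memory layouts are assumptions chosen for readability, and padding is assumed to be zero. The idea is that the loop body is written once and only the kernel-element-to-float conversion differs between F16 and F32.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for an IEEE 754 binary16 value (not ggml's own type).
struct half_bits { std::uint16_t bits; };

// F32 kernels need no conversion.
inline float to_float(float v) { return v; }

// Decode binary16 bits: 1 sign bit, 5 exponent bits, 10 mantissa bits.
inline float to_float(half_bits h) {
    const int exp = (h.bits >> 10) & 0x1F;
    const int man = h.bits & 0x3FF;
    float v;
    if (exp == 31) {
        v = man ? std::numeric_limits<float>::quiet_NaN()
                : std::numeric_limits<float>::infinity();
    } else {
        // Normal values carry an implicit leading 1; subnormals (exp == 0) do not.
        v = std::ldexp(static_cast<float>(exp ? (man | 0x400) : man),
                       (exp ? exp : 1) - 25);
    }
    return (h.bits & 0x8000) ? -v : v;
}

// Reference transposed 2D convolution, zero padding, single image.
// Assumed row-major layouts (illustrative only):
//   src:    [c_in][h][w]
//   kernel: [c_in][c_out][kh][kw]
//   dst:    [c_out][h_out][w_out]
// where h_out = (h - 1) * stride + kh and w_out = (w - 1) * stride + kw.
template <typename kernel_t>
std::vector<float> conv_transpose_2d_ref(const std::vector<float>& src,
                                         const std::vector<kernel_t>& kernel,
                                         int c_in, int c_out, int h, int w,
                                         int kh, int kw, int stride) {
    const int h_out = (h - 1) * stride + kh;
    const int w_out = (w - 1) * stride + kw;
    std::vector<float> dst(static_cast<std::size_t>(c_out) * h_out * w_out, 0.0f);

    for (int ci = 0; ci < c_in; ++ci)
    for (int co = 0; co < c_out; ++co)
    for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
        const float v = src[(static_cast<std::size_t>(ci) * h + y) * w + x];
        // Each input element scatters a scaled copy of the kernel into dst.
        for (int ky = 0; ky < kh; ++ky)
        for (int kx = 0; kx < kw; ++kx) {
            const std::size_t k_idx =
                ((static_cast<std::size_t>(ci) * c_out + co) * kh + ky) * kw + kx;
            const std::size_t d_idx =
                (static_cast<std::size_t>(co) * h_out + (y * stride + ky)) * w_out
                + (x * stride + kx);
            dst[d_idx] += v * to_float(kernel[k_idx]);
        }
    }
    return dst;
}
```

Calling `conv_transpose_2d_ref<float>(...)` or `conv_transpose_2d_ref<half_bits>(...)` selects the kernel type at compile time. Accumulating in `float` regardless of the kernel type is a design choice of this sketch, not a statement about how the release implements it.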