Is the `Tensor` type suited to implementing an `im2col` operation? I tried and only succeeded with nested loops, which of course is bad for CUDA. In the end, I want to arrive at an efficient convolution. Would that be possible with the current API surface?