Ideally I would do everything in ffmpeg, but I found that overlay_cuda doesn't support transparent frames (in fact, as far as I can tell, no frame formats with an alpha channel are supported on the GPU at all). So instead I want to decode video with nvdec (cuvid) in PyAV, copy the frames to PyTorch for processing, and then hand them back to PyAV for encoding with nvenc.
However, I can't find any sign that anyone has attempted this, so I'm wondering whether it's actually possible. Is there anyone who might be able to help?
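
To make the PyTorch step concrete, here is a minimal sketch of the kind of alpha compositing I'd like to run between decode and encode. The function name `alpha_over`, the `(H, W, C)` uint8 tensor layout, and the tiny test frames are my own assumptions for illustration. It only covers the processing in the middle and does not touch the PyAV/nvdec/nvenc side, which is the part I'm unsure about.

```python
import torch


def alpha_over(base_rgb: torch.Tensor, overlay_rgba: torch.Tensor) -> torch.Tensor:
    """Composite an RGBA overlay onto an opaque RGB frame ("over" operator).

    Assumed layout (hypothetical): uint8 tensors shaped (H, W, 3) and (H, W, 4),
    both living on the same device.
    """
    base = base_rgb.to(torch.float32)
    over = overlay_rgba[..., :3].to(torch.float32)
    # Keep the last dimension so alpha broadcasts across the colour channels.
    alpha = overlay_rgba[..., 3:4].to(torch.float32) / 255.0
    out = over * alpha + base * (1.0 - alpha)
    return out.round().clamp(0, 255).to(torch.uint8)


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny stand-in frames: a black base and a white, fully opaque overlay.
base = torch.zeros((2, 2, 3), dtype=torch.uint8, device=device)
overlay = torch.full((2, 2, 4), 255, dtype=torch.uint8, device=device)
overlay[0, 0, 3] = 0  # make one overlay pixel fully transparent

result = alpha_over(base, overlay)
```

In the pipeline I have in mind, `base` and `overlay` would come from decoded frames and `result` would go to the encoder, ideally without ever leaving GPU memory.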