I was wondering if the same kind of spatio-temporal compression can be applied to colorization of B/W or grayscale videos, or is there something like this out there? I mean, to preserve color and lighting continuity. FYI: NTIRE 2023 Video Colorization Challenge
Yeah, that should be possible to do. The good thing about video colorization is that acquiring training data is very straightforward - we just take color videos and convert them to B/W, and then train a neural net to reverse it (i.e. go from B/W to color). You probably won't need diffusion for this because it's not a generative task; we could just treat it as a sequence-to-sequence prediction task.
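To illustrate the "training data is free" point, here is a minimal numpy sketch of how a (grayscale input, color target) pair could be built from a color clip - the function name and the Rec. 601 luminance weights are my choices for illustration, not from any specific colorization codebase:

```python
import numpy as np

def make_training_pair(color_frames):
    """Given a color clip of shape (T, H, W, 3) with values in [0, 1],
    return the (grayscale input, color target) pair for supervised
    colorization. Grayscale uses the standard Rec. 601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = color_frames @ weights            # (T, H, W)
    return gray[..., None], color_frames     # keep a channel dim on the input

# Tiny synthetic "video": 4 frames of 8x8 RGB noise
clip = np.random.rand(4, 8, 8, 3)
x, y = make_training_pair(clip)
print(x.shape, y.shape)  # (4, 8, 8, 1) (4, 8, 8, 3)
```

A sequence-to-sequence colorization model would then be trained to map `x` back to `y`, with the temporal dimension letting it keep colors consistent across frames.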
5:24 Shouldn't the "high semantic information" arrow point to the center of the UNet, rather than to the end, where semantic features are once again converted back into detailed information?
Great point! You are right, high semantic information is indeed captured within the bottleneck layers. In the illustration, however, my goal was to show how the deep layers in convnets capture high semantic information, because their receptive field expands to capture global features from the input image. The purpose was to show how the skip connections allow combining the low-level, highly localized details (at the beginning of the UNet) with global-level features from the deep layers. Those deep features are highly semantic too, because they're derived partly from the bottleneck layers. Note that we can't directly combine the bottleneck features with the initial feature maps via skip connections because of the shape mismatch. Instead, we essentially upsample the bottleneck feature maps to the correct spatial size before combining them with the initial features. Hope that made sense.