at 5:58 " ... the variance comes from the network" This is not right. In DDPM, the authors kept the variance constant, and only in later studies did people start making it learnable as well.
@@moeinshariatnia59 In that formulation it is. Later in the video I mentioned that it's not necessary and the model only has to predict the noise, which is sampled from a zero-mean, unit-variance Gaussian.
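To make the point above concrete, here is a minimal sketch of the simple DDPM training objective: the network only predicts the noise ε drawn from N(0, I), and the reverse-process variance is a fixed constant rather than a network output. The `model` argument and the linear beta schedule are illustrative assumptions, not taken from the video.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # fixed (non-learned) linear beta schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product: alpha-bar_t

def ddpm_loss(model, x0):
    """Simple DDPM loss: MSE between true noise and predicted noise.

    `model(x_t, t)` is any network that outputs a tensor shaped like x_t;
    it is a hypothetical placeholder here.
    """
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)  # zero-mean, unit-variance noise the model must predict
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # forward-diffused sample
    return torch.nn.functional.mse_loss(model(x_t, t), eps)
```

Note that no variance is predicted anywhere; that is exactly the "constant variance" simplification from the original DDPM paper that later work relaxed.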
@@soroushmehraban I'll definitely check it out! Do you by any chance have paper suggestions that specifically target improving over TPT? I'm running out of ideas (and time 😢)
@@soroushmehraban paper - "Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models". Basically a test-time adaptation (TTA) solution that uses CLIP as its backbone.
This part of the Swin Transformer paper is the least understood and took me a long time; I finally understood it clearly thanks to this lecture. I would really appreciate it if you could keep finding these tricky points in papers and explaining them simply in the future! Super Thanks!