Hi, thanks for the video. Just one question: around minute 14:00 you said that the student’s timesteps are between 1 and 4, but in the paper the authors state that the final timestep (tau_n) for the student must be 1000 (so equal to the teacher’s). What do you think? Should the student’s timesteps be something like {1, 2, 3, 1000}, or something else?
I think they do that so they can use the same scheduler for both models and keep the SNR consistent. Timestep 1000 represents essentially 100% noise, which is where you always start from. I'm guessing they use roughly uniform steps after that to get a wide range of SNR values: {1, 250, 500, 1000}
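To make the SNR point concrete, here is a rough sketch that prints the SNR at each of the guessed timesteps. It assumes a plain linear beta schedule (1e-4 to 0.02 over 1000 steps, DDPM-style), which may not be what the paper's models actually use; `snr_table` and its parameters are hypothetical names for illustration only.

```python
def snr_table(timesteps, num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return {t: SNR(t)} with SNR(t) = alpha_bar_t / (1 - alpha_bar_t).

    Assumption: linear beta schedule; the real scheduler may differ.
    """
    alpha_bar = []
    running = 1.0
    for i in range(num_train_steps):
        # Linearly interpolate beta between beta_start and beta_end.
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        running *= 1.0 - beta  # cumulative product of (1 - beta)
        alpha_bar.append(running)
    # Timesteps are 1-indexed here, so t maps to alpha_bar[t - 1].
    return {t: alpha_bar[t - 1] / (1.0 - alpha_bar[t - 1]) for t in timesteps}


# The guessed student timesteps from the reply above (not confirmed by the paper).
student_steps = [1, 250, 500, 1000]
for t, snr in snr_table(student_steps).items():
    print(f"t={t:4d}  SNR={snr:.3e}")
```

Under that assumed schedule the SNR drops monotonically from very high at t=1 to nearly zero at t=1000, so a set spread across the range covers very different noise levels, while something like {1, 2, 3, 1000} would bunch three of the four steps at almost no noise.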