Thank you very much, you make even the hardest topics understandable and fun to watch! Could you delve a little deeper into the mathematical steps of marginalization and conditional probability that you talk about between 15:00 and 18:00?
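For reference, these are the standard marginalization and conditioning identities for a jointly Gaussian vector. The notation is assumed here, not taken from the video: f holds the outputs at the training inputs X, f* holds the outputs at the test inputs X*, μ and μ* are the prior means, K = k(X, X), K* = k(X, X*), and K** = k(X*, X*).

```latex
% Joint prior over training outputs f and test outputs f_*
\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix}
\sim \mathcal{N}\!\left(
  \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu}_* \end{bmatrix},
  \begin{bmatrix} K & K_* \\ K_*^{\top} & K_{**} \end{bmatrix}
\right)

% Marginalization: simply drop the blocks you are not interested in
\mathbf{f}_* \sim \mathcal{N}\!\left(\boldsymbol{\mu}_*,\; K_{**}\right)

% Conditioning on the observed values f
\mathbf{f}_* \mid \mathbf{f} \sim \mathcal{N}\!\left(
  \boldsymbol{\mu}_* + K_*^{\top} K^{-1} (\mathbf{f} - \boldsymbol{\mu}),\;
  K_{**} - K_*^{\top} K^{-1} K_*
\right)
```

With a zero prior mean (μ = μ* = 0), the posterior mean reduces to K*ᵀ K⁻¹ f: a weighted combination of the observed values, with weights determined by the kernel.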
If we try to predict the mean for the unknown points in between the data we have, would the mean always follow a straight line (e.g. one straight line at 0:45, two lines between 3 data points at 24:05)?
Definitely not! That’s a great question. I drew the straight lines for simplicity, but if you work out the math, a straight line would imply a mean of 13.75 at x=30, whereas, as we see on the second page, we actually got a mean of 13.9 there. The shape of the mean curve will generally be nonlinear and will depend on the kernel you choose.
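A minimal numerical sketch of this point, assuming a zero prior mean, a squared-exponential (RBF) kernel with σ = 1 and l = 1, noise-free observations, and two made-up data points (not the video's data). The GP posterior mean passes through both points but bends away from the straight line between them.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0, length=1.0):
    """Squared-exponential kernel: sigma^2 * exp(-(a - b)^2 / (2 * length^2))."""
    d = a[:, None] - b[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length) ** 2)

def posterior_mean(x_train, y_train, x_test, sigma=1.0, length=1.0, jitter=1e-9):
    """Zero-prior-mean GP posterior mean K_*^T K^{-1} y (tiny jitter for numerical stability)."""
    K = rbf_kernel(x_train, x_train, sigma, length) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, sigma, length)
    return K_s.T @ np.linalg.solve(K, y_train)

# Hypothetical data: two observed points.
x_train = np.array([0.0, 2.0])
y_train = np.array([1.0, 2.0])
x_test = np.linspace(0.0, 2.0, 5)  # 0, 0.5, 1, 1.5, 2

gp_mean = posterior_mean(x_train, y_train, x_test)
linear = np.interp(x_test, x_train, y_train)  # straight-line interpolation for comparison
print(np.round(gp_mean, 3))
print(np.round(linear, 3))
```

With these numbers the GP mean at x = 1 comes out at about 1.60, not the 1.5 a straight line would give; a different kernel or length scale changes the shape again.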
@ritvikmath Ah, I see. So I can get something like polynomial interpolation of μ'(x) if I pick the right kernel? Thinking about it, a straight line for the mean makes sense if our known data vector is the only thing that matters, but to get something "curvier" it makes sense that the distribution at one point is affected by the points nearby.
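One way to check that intuition, using a polynomial kernel as an assumed choice (not necessarily the kernel from the video): with k(x, x') = (x·x' + 1)^d and d + 1 distinct, noise-free points, the zero-mean GP posterior mean coincides with the classic interpolating polynomial of degree d. The data below is made up.

```python
import numpy as np

def poly_kernel(a, b, degree=2, c=1.0):
    """Polynomial kernel (a*b + c)^degree; each k(x, x_i) is a polynomial of degree <= degree in x."""
    return (a[:, None] * b[None, :] + c) ** degree

# Hypothetical data: 3 points, so a degree-2 kernel can interpolate them exactly.
x_train = np.array([-1.0, 0.5, 2.0])
y_train = np.array([3.0, -1.0, 4.0])
x_test = np.linspace(-1.0, 2.0, 7)

# Zero-prior-mean GP posterior mean: K_*^T K^{-1} y
K = poly_kernel(x_train, x_train)
K_s = poly_kernel(x_train, x_test)
gp_mean = K_s.T @ np.linalg.solve(K, y_train)

# Classic interpolating polynomial through the same points, for comparison.
coeffs = np.polyfit(x_train, y_train, deg=len(x_train) - 1)
poly_interp = np.polyval(coeffs, x_test)
print(np.max(np.abs(gp_mean - poly_interp)))  # numerically negligible
```

The reason is that the posterior mean is a weighted sum of k(x, x_i), each a degree-d polynomial in x, and it must pass through every observed point; with d + 1 points there is exactly one such polynomial. Kernels like the RBF do not restrict the mean to polynomials, which is where the "curvier", locally influenced shapes come from.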
Very good explanation; you made it a lot simpler than my teachers ever could. My undergraduate thesis was on Gaussian processes, so it was pretty nostalgic seeing you dive into this topic. One note I'd like to add: the choice of the prior mean μ is very important, depending on the spacing and number of data points you have, and your model may depend heavily on it. It makes sense to set it to zero to develop the intuition, but when you try to apply it, you see that the model may simply tend to zero if your data points are too far apart, if you make a poor choice of L, or if you don't have enough data. So, to get the orange dashed line in the video, you'd also need to run a regression on the data to obtain your μ prior. The problem is that this adds more uncertainty to the model, since you are assuming that the mean of your distribution follows that linear relation. But as you said, it's great to have a distribution estimate rather than relying only on point-estimate models; this is a great alternative.
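A small sketch of the prior-mean effect described above, under assumptions not taken from the video: an RBF kernel with length scale 1, made-up data with an upward trend, and a least-squares line as the prior mean m(x). The posterior mean is m(x*) + K*ᵀ K⁻¹ (y − m(X)), so far from the data the kernel terms vanish and the prediction falls back to whatever m(x*) is.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0, length=1.0):
    """Squared-exponential kernel."""
    d = a[:, None] - b[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, mean_fn, length=1.0, noise=1e-6):
    """GP posterior mean with a prior mean function: m(x*) + K_*^T K^{-1} (y - m(X))."""
    K = rbf_kernel(x_train, x_train, length=length) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length=length)
    resid = y_train - mean_fn(x_train)
    return mean_fn(x_test) + K_s.T @ np.linalg.solve(K, resid)

def zero_mean(x):
    """Zero prior mean, as used to build intuition."""
    return np.zeros_like(x)

# Hypothetical data with a clear upward trend.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([10.0, 12.1, 13.9, 16.2])
x_far = np.array([10.0])  # far from the data relative to length = 1

# Prior mean taken from an ordinary least-squares line (an assumption, as noted above).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def linear_mean(x):
    return slope * x + intercept

zero_pred = gp_predict(x_train, y_train, x_far, zero_mean)    # collapses toward 0
lin_pred = gp_predict(x_train, y_train, x_far, linear_mean)   # follows the fitted line
print(zero_pred, lin_pred)
```

Near the data both versions interpolate the observations; they only disagree where the kernel has "forgotten" the data, which is exactly where the extra assumption built into the regression mean takes over.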
Thank you for the video, it was nicely explained. There are a lot of simplifications, though. Could you also talk about how best to select sigma and l? Is it all done empirically? Also, do you have an example implementation?
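One common answer to the sigma and l question, not necessarily the approach from the video, is to pick the values that maximize the log marginal likelihood log p(y | X, σ, l) of a zero-mean GP. The sketch below does a coarse grid search on synthetic sine data; the noise level, the grids, and the data are all assumptions for illustration.

```python
import numpy as np

def rbf_kernel(a, b, sigma, length):
    """Squared-exponential kernel with signal scale sigma and length scale l."""
    d = a[:, None] - b[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length) ** 2)

def log_marginal_likelihood(x, y, sigma, length, noise=0.05):
    """log p(y | x) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi), with K = K_rbf + noise^2 I."""
    n = len(x)
    K = rbf_kernel(x, x, sigma, length) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # equals -1/2 log|K|
            - 0.5 * n * np.log(2 * np.pi))

# Synthetic data: a noisy sine wave.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.05 * rng.standard_normal(len(x))

# Coarse grid search over (sigma, l).
sigmas = [0.5, 1.0, 2.0]
lengths = [0.1, 0.5, 1.0, 2.0]
scores = {(s, l): log_marginal_likelihood(x, y, s, l) for s in sigmas for l in lengths}
best_sigma, best_length = max(scores, key=scores.get)
print(best_sigma, best_length)
```

In practice the grid is usually replaced by a gradient-based optimizer; scikit-learn's `GaussianProcessRegressor`, for example, fits kernel hyperparameters by maximizing the log marginal likelihood. Cross-validation on held-out points is another option.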
Man, your channel would blow up spectacularly if you invested the time in learning how to make really nice visuals. The whole poorly hand-drawn example thing is really 2005 and screams laziness and/or amateurism.