The fact that you can find great lectures on basically everything right now is incredible. What amazes me more and more is the feeling that I will never be stuck alone with a problem, because not only has someone solved it before, but somebody also put great effort into making a lecture about it. Amazing channel, thank you so much!
That's beautiful to hear! ❤️ Actually, this was also one of my major motivations to start the channel. The internet is an amazing place with a free and diverse set of educational resources. You find really good ones on more basic topics (like Linear Algebra, 1D Calculus, etc.), but they get rarer the closer you come to the frontier of research. This is of course reasonable, since the demand is simply lower. Yet, I think it is absolutely crucial to also have high-quality videos produced specifically for an online audience (not just recorded lectures - those are of course also great). I believe this has many implications, one of them being that knowledge becomes accessible to literally everyone (who has Internet access and knows English). This is just mind-blowing to me. Personally, I benefited a lot from these available resources. The channel is my way of giving back on the topics I have acquired expertise in and that have not yet been covered to the necessary extent. In a nutshell, I am super happy to read your comment, since that is exactly what I was aiming for with the channel. 😊
@@MachineLearningSimulation Your channel is a gold mine. I'm writing my master's thesis, and this video was everything I needed to understand the paper that is most similar to the method I'm proposing. So to me, being able to watch it is a big deal :) I'll binge-watch all your videos when I have more time. Bayesian inference especially is something I've always wanted to get into.
@@McSwey That's wonderful, I'm super happy I could help! :) Good luck with your thesis, and enjoy the videos :). Please always leave feedback and ask questions.
You're very welcome :) And it's great that you mention Bishop's amazing book. I love it, but at some points I felt that the example-based derivations I do here in the video could be a bit more insightful. Thanks for appreciating this :)
Thanks a lot for your feedback :) I love teaching, and it's amazing to hear that the videos are of great value. There is a lot more to come on probabilistic machine learning.
You're welcome ;) This video covers one of the topics I struggled with for a long time. I found the derivation in Bishop's book quite tough, so it's amazing to hear that the video is of great help.
I have a question here. The mean-field approach tells us how to choose the form of the surrogate posterior, and eventually we get the optimal solution for q. However, in the previous lessons, we assumed that the surrogate posterior q follows a certain distribution and then solved the problem by maximizing the ELBO. But at that time, we didn't know which type of distribution was the most appropriate choice for q. This gives me the impression that, as long as the assumption holds that the surrogate posterior factorizes into smaller independent distributions, maximizing the ELBO over a fixed parametric family is unnecessary, because we can just use the mean-field approach to solve the VI problem without choosing an arbitrary distribution for q. I am not sure if my understanding is correct.
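For what it's worth, here is a minimal sketch of the mean-field idea in action (not from the video; the model, data, and hyperparameter values are my own choices, loosely following the classic Gaussian-with-unknown-mean-and-precision example from Bishop's PRML, Sec. 10.1). The point it illustrates: the factorization q(mu, tau) = q(mu) q(tau) alone determines that the optimal factors are Gaussian and Gamma — we never assume those forms — and the coupled updates are then iterated to convergence:

```python
import numpy as np

# Toy model: x_i ~ N(mu, 1/tau), with conjugate priors
# mu | tau ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0).
# Mean-field assumption q(mu, tau) = q(mu) * q(tau): the optimal
# factors come out Gaussian and Gamma from the derivation itself.

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)  # synthetic data (true mu=2, tau=1)
N, x_bar = len(x), x.mean()

# Prior hyperparameters (assumed values for this sketch)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0  # initial guess for E_q[tau]
for _ in range(100):  # coordinate-ascent (CAVI) iterations
    # Update q(mu) = N(mu_N, 1/lam_N), holding q(tau) fixed
    mu_N = (lam0 * mu0 + N * x_bar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N), holding q(mu) fixed
    a_N = a0 + (N + 1) / 2
    E_sq = np.sum((x - mu_N) ** 2) + N / lam_N        # E_q(mu)[sum_i (x_i - mu)^2]
    E_prior = lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N)  # E_q(mu)[lam0 (mu - mu0)^2]
    b_N = b0 + 0.5 * (E_sq + E_prior)
    E_tau = a_N / b_N  # mean of the Gamma factor

print(f"posterior mean of mu  ~ {mu_N:.3f}")
print(f"posterior mean of tau ~ {E_tau:.3f}")
```

The results should land near the true values mu = 2 and tau = 1. Note, though, that the closed-form updates only come out this cleanly for conjugate models; for non-conjugate models one often still falls back on a fixed parametric family and gradient-based ELBO maximization, so the two approaches from the videos are complementary rather than redundant.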
@@MachineLearningSimulation Yes. Thank you. I plan to go through all your series. They are appealing. I like the fact that they are self-contained compared to many other sources, both books and videos.
While maximizing the functional, why did you say that d[q_0 E_{1,2}[log p]] / d q_0 = E_{1,2}[log p]? (Assume d is a partial/functional derivative here.) Shouldn't E_{1,2}[log p] also depend on q_0(z_0)? Since p depends on z_0, z_1, and z_2, it cannot be considered a constant w.r.t. q_0(z_0), can it?
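A sketch of the resolution, in my own notation following the video's z_0, z_1, z_2 setup (not a quote from the video): writing out the expectation,

E_{1,2}[log p](z_0) = ∫∫ q_1(z_1) q_2(z_2) log p(z_0, z_1, z_2, X=D) dz_1 dz_2,

this is indeed a function of z_0 (since p depends on z_0), but it is not a functional of q_0 — only q_1 and q_2 appear under the integral. The term ∫ q_0(z_0) E_{1,2}[log p](z_0) dz_0 is therefore linear in q_0 with a fixed coefficient function, and its functional derivative w.r.t. q_0 is exactly E_{1,2}[log p](z_0). "Constant" here means constant w.r.t. variations of q_0, not constant in z_0.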
Just one observation: the unconstrained functional optimization gives you an unnormalized function, and you said it is necessary to normalize it afterwards. So what guarantees that this density, once normalized, is still the optimal solution? After normalizing, it is no longer a solution to the functional-derivative equation. Also, the Euler-Lagrange equation gives you a critical point; how does one know whether it is a local minimum/maximum or a saddle point?
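A sketch of the standard way to resolve both points (my own notation; this is the usual Lagrange-multiplier and concavity argument, not something stated in the video): the proper problem for a single factor q_j is constrained,

maximize L[q_j] = ∫ q_j(z_j) E_{i≠j}[log p] dz_j − ∫ q_j(z_j) log q_j(z_j) dz_j, subject to ∫ q_j(z_j) dz_j = 1.

Adding the constraint with a multiplier λ, stationarity gives E_{i≠j}[log p] − log q_j(z_j) − 1 + λ = 0, i.e. q_j(z_j) ∝ exp(E_{i≠j}[log p]), and λ is fixed precisely by the normalization constraint. So the normalized density is the exact stationary point of the constrained problem — normalizing afterwards is not a correction but part of solving it. As for the critical-point type: with the other factors held fixed, the first term is linear in q_j and the negative-entropy term makes L strictly concave in q_j, so this stationary point is the unique maximum over normalized q_j. (Over all factors jointly, the problem is generally non-convex, which is why the coordinate updates can converge to a local optimum.)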
Hi, you're welcome, and thanks for the comment ☺️ Could you please give timestamps for the points in the video you are referring to? That helps me recall what exactly I said in the video. Thanks.
@@MachineLearningSimulation Hi, first of all, incredible video! You explained this topic so much better than the lecturers at my university! I think the "update a function until convergence" part refers to 25:00. I'd also be interested in knowing what you meant here: is the mean-field approach an iterative approach? Also, this is probably a stupid question, but at 24:04, for example, you show the final formula where we take the expectation over all i != j. I assume we would compute this via integration? But if we are able to integrate over our joint probability p(Z, X=D) with respect to all z_i with i != j, why can't we just marginalize the joint probability p(Z, X=D) to get p(X=D) and then use Bayes' rule to directly compute p(Z|X)? I know you probably explained that somewhere in the video, but I don't quite get why that isn't an option.
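A short sketch of the distinction, for anyone with the same question (my own explanation, not from the video): the expectation E_{i≠j}[log p(Z, X=D)] is taken under the factorized surrogate q, not under p — it is ∫ (∏_{i≠j} q_i(z_i)) log p(Z, X=D) dz_{i≠j}, so we never actually integrate p itself. This is tractable because the log turns the joint (a product of local factors in the graphical model) into a sum of local terms, and the expectation of a sum under a factorized q splits into a sum of low-dimensional integrals. Marginalizing p(X=D) = ∫ p(Z, X=D) dZ has no log, so the product of factors does not decompose and we face one coupled high-dimensional integral — exactly the intractable evidence that variational inference is designed to avoid.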
You're very welcome. I haven't read that paper yet; when I first learned about this topic, I struggled with it in Bishop's "Pattern Recognition and Machine Learning". I'm glad the different perspective I present here is helpful :)