
Factor Analysis and Probabilistic PCA 

Mutual Information
71K subscribers
19K views

Published: 21 Aug 2024

Comments: 91
@sasakevin3263 · 1 year ago
The only reason this guy's videos didn't go viral is that only 0.01% of the audience is interested in such complex statistics and formulas. But what he made is really awesome!
@Mutual_Information · 1 year ago
That 0.01% are the cool kids - that's who I'm going for!
@ruizhezang2657 · 1 year ago
@Mutual_Information Awesome job! Best video in this area I've ever watched!
@pifibbi · 2 years ago
Please don't stop making these!
@MikeOxmol_ · 2 years ago
It's criminal that you don't have at least 50k subs. Please don't stop making videos; even though they don't have that many views right now, there are people like me who appreciate them very much. Certain topics can seem very daunting when you read about them, especially in such "dense" books as Bishop's PRML or Murphy's PML. However, if I start digging into a topic by watching your video and only then read the chapter, the ideas seem to connect more easily and I spend less time until it "clicks", if you know what I mean. On another note, if you're looking for ideas for future vids (which I'm sure you already have plenty of), Variational Inference would be a cool topic.
@Mutual_Information · 2 years ago
Thanks, extremely nice of you! Yea, this channel is for people like you/me, who want to understand those intense details in those books. I know I would have loved a channel like this if it had been around when I was learning. I'm glad it's doing that job for you. And yes, VI is coming! Thanks for your support! And please don't hesitate to share the channel with people doing the same as you :)
@Nightfold · 2 years ago
This sheds some light on what I'm doing with PPCA, but I still deeply resent my lack of training in statistics during my degree.
@divine7470 · 2 years ago
Thanks for covering this topic. I learned about and how to use FA and PCA in bootcamps, but the way you dive into the internals makes them so easily digestible.
@alan1507 · 10 months ago
Thanks for the very clear explanation. I was doing my PhD under Chris Bishop when Bishop and Tipping were developing PPCA - good to get a refresher!
@Mutual_Information · 10 months ago
Wow, it's excellent to get your eyes on it - very cool!
@quitscheente95 · 2 years ago
Damn, I spent so much time going through 5 different books to understand PPCA, and here you are, explaining it in an easy, comprehensible, visual manner. Love it. Thank you :)
@Mutual_Information · 2 years ago
Awesome - you are the exact type of viewer I'm trying to help
@mCoding · 2 years ago
Always love to hear your explanations!
@Mutual_Information · 2 years ago
Thanks man - fortunately there’s always a mountain of topics to cover. Plenty to learn/explain :)
@mainakbiswas2584 · 9 months ago
Had been looking for this piece of information for quite a long time. I understood FA by sort of re-discovering it after seeing the sklearn documentation. From that point onward I wanted to know why it related to PCA. This gave me the intuition and the resources to look into. ❤❤❤
@enx1214 · 1 year ago
True old-school techniques, still among the best; I've been using them since 2004. They can save you, as you can build amazing models from nowhere with them.
@Kopakabana001 · 2 years ago
Another great video!
@fenncc3854 · 2 years ago
Great video: really informative, easy to understand, good production quality, and you've also got a great personality for this style of videos.
@Mutual_Information · 2 years ago
Thank you! These comments mean a lot. Happy to have you :)
@user-lx7jn9gy6q · 1 year ago
Underrated channel
@Mutual_Information · 1 year ago
You're not going to believe this... but I agree
@xy9439 · 2 years ago
Very interesting video, as always
@AdrianGao · 11 months ago
Thanks. This is brilliant.
@wazirkahar1909 · 2 years ago
Please please please keep doing this :)
@Mutual_Information · 2 years ago
No plans on slowing down :)
@jonastjepkema · 2 years ago
Amazing! Hope your channel will explode soon!
@Mutual_Information · 2 years ago
Lol thanks - it honestly doesn’t need to for me to keep going. This is a very enjoyable hobby
@Mutual_Information · 2 years ago
But if you want to tell all your friends, I won’t stop you 😉
@juliafila5709 · 3 months ago
Thank you so much for your content!
@Mutual_Information · 3 months ago
And thank you for watching it :)
@EverlastingsSlave · 2 years ago
Man, how good are your videos; I am amazed at the perfection.
@Mutual_Information · 2 years ago
Thank you very much! Glad you like it
@siddharthbisht1287 · 2 years ago
For anyone wondering about the parameter formula: 2D + D(D-1)/2 = D + D + D(D-1)/2, where
D : dimension of the mean vector
D : diagonal of the covariance matrix (the variance of every random variable)
D(D-1)/2 : covariances between any two distinct random variables d_i and d_j
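As a quick worked check of that formula: with D = 3, it gives 2·3 + (3·2)/2 = 6 + 3 = 9 free parameters - 3 mean entries, 3 variances on the diagonal of the covariance matrix, and 3 distinct covariances off it.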
@Mutual_Information · 2 years ago
Thanks for this!
@siddharthbisht1287 · 2 years ago
@Mutual_Information No worries. Keep up the great work.
@jakubm3036 · 2 years ago
Great video, understandable explanations, and a cool format!
@Anzar2011 · 29 days ago
Awesome work!
@Blahcub · 11 months ago
This was a super helpful video, thank you so much. I love this material and find it super fun.
@Mutual_Information · 11 months ago
Excellent - this one is a doozy so it's nice to hear when it lands
@Blahcub · 11 months ago
@Mutual_Information There's a level of background information that takes a while to process, and even though you say it slowly, there may be some extra detail warranted. I had to pause a lot, think, and rewind to fully grasp the details.
@Mutual_Information · 11 months ago
@Blahcub That's good to know... you are my ideal viewer :) thank you for your persistence
@Blahcub · 11 months ago
@Mutual_Information In simpler terms, can't we just say PPCA is just PCA, but we model a distribution over the latent space and sample from that distribution?
@Mutual_Information · 11 months ago
@Blahcub In all cases, we are considering a distribution over the latent space. PPCA is distinct in that we assume a constant noise term across dimensions, and that gives it some closed-form results.
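A minimal numpy sketch of that distinction, with all dimensions and parameter values made up for illustration: both models generate x = Wz + mu + eps with z ~ N(0, I); FA lets eps have an arbitrary diagonal covariance Psi, while PPCA restricts it to sigma^2 * I.

    import numpy as np

    rng = np.random.default_rng(0)
    D, L, N = 5, 2, 10_000                      # observed dim, latent dim, samples

    W = rng.normal(size=(D, L))                 # factor loading matrix
    mu = rng.normal(size=D)                     # mean vector
    Psi_fa = np.diag(rng.uniform(0.1, 1.0, D))  # FA: a separate noise variance per dimension
    Psi_ppca = 0.5 * np.eye(D)                  # PPCA: one shared noise variance sigma^2

    def sample(W, mu, Psi, n):
        # x = W z + mu + eps, with z ~ N(0, I) and eps ~ N(0, Psi)
        Z = rng.normal(size=(n, W.shape[1]))
        eps = rng.multivariate_normal(np.zeros(len(mu)), Psi, size=n)
        return Z @ W.T + mu + eps

    X_fa, X_ppca = sample(W, mu, Psi_fa, N), sample(W, mu, Psi_ppca, N)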
@user-kn4wt · 2 years ago
These videos are awesome!
@MrStphch · 2 years ago
Really, really nice videos!! Love your way of explaining.
@michaelcatchen84 · 1 year ago
Around 10:35 you skip over the posterior inference of p(z_i | x_i, W, mu, psi) and the fact that it is also a normal distribution, because the normal is a conjugate prior for itself. Would love to see this covered in a separate video.
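For what it's worth, that posterior has a well-known closed form (see e.g. Bishop's PRML, ch. 12): p(z|x) = N(m, Sigma) with Sigma = (I + W^T Psi^{-1} W)^{-1} and m = Sigma W^T Psi^{-1} (x - mu). A small numpy sketch, with all parameter values illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    D, L = 4, 2
    W = rng.normal(size=(D, L))                 # factor loadings
    mu = np.zeros(D)
    Psi = np.diag(rng.uniform(0.2, 1.0, D))     # diagonal noise covariance
    x = rng.normal(size=D)                      # one observation

    Psi_inv = np.linalg.inv(Psi)
    Sigma = np.linalg.inv(np.eye(L) + W.T @ Psi_inv @ W)  # posterior covariance of z
    m = Sigma @ W.T @ Psi_inv @ (x - mu)                  # posterior mean of z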
@saeidhoseinipour3101 · 2 years ago
Another nice video. Thanks 🙏 Please cover data science topics such as clustering and classification, or applications like text mining, recommender systems, image processing and so on, from both a statistics and a linear algebra perspective.
@Mutual_Information · 2 years ago
Thank you! And those are all in the queue :)
@muesk3 · 2 years ago
Quite funny, the difference in how FA is explained in statistics vs machine learning. :)
@gordongoodwin6279 · 2 years ago
I was literally thinking this. I honestly thought he was clueless for the first minute, then realized it's just a really interesting and different way to look at factor analysis than what it was originally intended to do (and the way it's taught in most statistics and psychometrics texts). Great video
@taotaotan5671 · 1 year ago
Hi DJ, awesome content as always!! I find I can follow your notation much better than textbook notation. At 8:12, I believe the matrix W is shared across all individuals, while z is specific to each sample. It makes intuitive sense to call the matrix W the common factors, and call z the loadings. However, the textbook (Penn State Stat505 12/12.1) seems to call W (in their notation L) the factor loadings, while calling z (in their notation f) the common factors. I am a little confused and would appreciate it if you could take a look. Thank you again for the tutorial!
@Mutual_Information · 1 year ago
Hey Taotao! I just checked this against Barber's book. It appears Stat505 is correct - W is called the "factor loading". I actually recall being confused by this too (which is why I had to double-check just now).. and all I can say is.. yeah, the naming is confusing. For me, I avoid the naming in general by just thinking of z as the latent variable and W as parameters. I agree, this "factor loading" name is shit.
@taotaotan5671 · 1 year ago
@Mutual_Information Thanks so much DJ! That clarifies it.
@matej6418 · 11 months ago
Elite content. Imho, after the introduction I would love to see mainly the content; dunno if staying on screen makes the delivery better? What's the objective here?
@Mutual_Information · 11 months ago
Appreciate the feedback. It's effectively a cheaper way of keeping the video lively without having to create animations, which take a long time. If I'm not on screen and I leave the level of animation the same, it's a lot of audio over still text, which I've separately heard makes people 'zone out'. This is also an older video. I really don't like how I did things back then. In the future, I'd like to mature into a more dynamic style.
@mrcaljoe1 · 1 year ago
1:18 when you have to use spit instead
@prodbyryshy · 5 months ago
Amazing video. I feel like I understand each individual step, but I'm sort of missing the big picture
@siddharthbisht1287 · 2 years ago
I have a couple of questions: 1. What do you mean by "averaging out"? 2. What difference does switching the covariance matrix from Psi to the full covariance matrix WW* + Psi make? Great video though!!
@Mutual_Information · 2 years ago
Hey Siddharth, nice to hear from you. For "averaging out", that was a bit of a hand wave to avoid mentioning the integration p(x) = ∫ p(x|z)p(z) dz.. the way I think about it: it's the distribution over x you'd get if you reran the data generation process infinitely, ignored the z's, and asked what distribution over x that would create. For your second question, Psi is a diagonal matrix. So WW* + Psi isn't diagonal, but Psi is.
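A quick numeric way to see that "averaging out" (a sketch with made-up parameters): generate many x's, discard the z's, and check that the sample covariance approaches WW^T + Psi.

    import numpy as np

    rng = np.random.default_rng(2)
    D, L, N = 4, 2, 200_000
    W = rng.normal(size=(D, L))
    psi_diag = rng.uniform(0.2, 1.0, D)         # diagonal of Psi

    Z = rng.normal(size=(N, L))                 # z ~ N(0, I)
    eps = rng.normal(size=(N, D)) * np.sqrt(psi_diag)
    X = Z @ W.T + eps                           # mu = 0 for simplicity

    print(np.round(np.cov(X.T), 2))                  # empirical covariance of x
    print(np.round(W @ W.T + np.diag(psi_diag), 2))  # theoretical W W^T + Psi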
@siddharthbisht1287 · 2 years ago
@Mutual_Information I wanted to understand the difference the change in covariance makes - why are we changing the covariance matrix?
@Mutual_Information · 2 years ago
Hmm, if I understand the question, it's because one way involves a lot fewer parameters. If you say your covariance matrix is WW* + Psi, then that covariance matrix is determined by D*L + D parameters. If it's just the regular covariance matrix of a typical multivariate normal, then the number of parameters in the covariance is D + D*(D-1)/2.
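Plugging in numbers makes the gap vivid (an illustrative count, with D and L chosen arbitrarily):

    D, L = 100, 5                           # observed and latent dimensions
    full_cov = D + D * (D - 1) // 2         # D variances + D(D-1)/2 covariances
    factor_cov = D * L + D                  # entries of W + diagonal of Psi
    print(full_cov, factor_cov)             # 5050 vs 600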
@siddharthbisht1287 · 2 years ago
@Mutual_Information Oh okay. So then L
@njitnom · 2 years ago
I love you bro
@Mutual_Information · 2 years ago
I love you too dude
@akhileshchander5307 · 2 years ago
I came to this channel from your comment on another channel; I checked one or two minutes of a video and found this channel interesting. My request: please make videos on these "mathematical notations" you're using because, in my personal experience, there are many who don't understand these symbols, e.g.: what is the meaning of {x_i^T}_{i=1}^N? Thanks
@Mutual_Information · 2 years ago
Hey Akhilesh, I see what you're saying. I'm thinking of creating a notation guide - something like a 1-pager linked in the description which would go over the exact notation. To answer your question, {x_i^T : i = 1, ..., N} just refers to the set of row vectors (x_i is assumed to be a column vector, so x_i^T is a row vector). The {...} is just set notation. It's just a way of saying: this is the set of N row vectors, and we'll indicate each one using the index i. So x_3^T is the third of the N row vectors.
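A tiny numpy rendering of that reply (array values are made up): the data matrix stacks the N row vectors x_i^T.

    import numpy as np

    X = np.arange(12).reshape(3, 4)     # N = 3 samples, D = 4 features
    x3_T = X[2]                         # the third of the N row vectors, x_3^T
    x3 = X[2].reshape(-1, 1)            # x_3 itself, as a D x 1 column vector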
@timseguine2 · 1 year ago
One question that came to mind: if you are trying to do factor analysis using an iterative method, are the PPCA ML estimates a good initial value?
@Mutual_Information · 1 year ago
Possibly.. but if you're going to accept the costs of computing those initial estimates, you might as well just do the FA routine? I don't think it would be worth it
@horacemanfred354 · 2 years ago
Great video. Could you cover the use of Energy Functions in ML?
@Mutual_Information · 2 years ago
Maybe one day, but no concrete plans. Fortunately, there's an excellent YouTuber who covers energy models really well: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-y6WNrHskm2E.html These would probably be a primary source if I were to cover the topic
@horacemanfred354 · 2 years ago
@Mutual_Information Thanks. So funny - in the video Alfredo Canziani says it has taken him years to understand energy functions. It appears it is about the manifold of the cost function, and I understand it better now.
@tommclean9208 · 2 years ago
Is there any code that supplements your videos? I always find I learn more easily by looking at and playing around with code :)
@Mutual_Information · 2 years ago
Not in this one unfortunately. For this case, I'd check out the use case for FA from sklearn : scikit-learn.org/stable/modules/decomposition.html#fa If you look one level deep into their .fit() method, you'll see the SVD algorithm I reference in the vid. I have plans for more code examples in future vids
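A minimal usage sketch along the lines of that pointer, using scikit-learn's FactorAnalysis (the data here is random placeholder data):

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 10))          # 500 samples, 10 observed dimensions

    fa = FactorAnalysis(n_components=3)
    Z = fa.fit_transform(X)                 # latent scores: posterior means of z
    print(fa.components_.shape)             # (3, 10): the W matrix, one factor per row
    print(fa.noise_variance_.shape)         # (10,): the diagonal of Psi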
@taotaotan5671 · 2 years ago
Might restricted maximum likelihood, a technique that is often used in mixed-effects models, also apply in factor analysis?
@Mutual_Information · 2 years ago
I don't know much about restricted max likelihood, but from what I've (just) read, it appears flexible enough to accommodate the FA assumption. Anytime you're estimating variance/covariance, you could use a low-rank approximation.
@Blahcub · 11 months ago
Isn't it a problem that factor analysis, PCA, and any dimensionality reduction done here assume a linear relationship?
@Mutual_Information · 11 months ago
Yea, definitely. The nonlinear versions of this rely on the manifold hypothesis, which is akin to saying the assumptions of FA hold, but only *locally* and after nonlinear transformations.. and that essentially changes everything. None of the analytic results you see here hold, and we have to resort to other things, like autoencoders.
@DylanD-v9g · 1 year ago
Is the log likelihood a concave function of \psi and w?
@Mutual_Information · 1 year ago
If you fix w, then the function is a concave func of psi.. and if you fix psi.. yes I bet it's also a concave function (because it's like doing linear regression). I'm fairly sure of this but not 100%.
@DylanD-v9g · 1 year ago
@Mutual_Information Ok thanks. I was asking because I wanted to know whether the EM algorithm is guaranteed to converge to the MLE of the factor model. As the EM algorithm is guaranteed to increase the log likelihood at each step, I would assume that if the likelihood is concave then we should converge to the MLE. But from reading around, it seems that getting the MLE for the factor model using EM is not guaranteed. Btw your videos are great!
@DylanD-v9g · 1 year ago
I guess an important point is that if a function is concave in each of its variables keeping the rest fixed, the function is not guaranteed to be jointly concave. So using what you said, we don't know if the log likelihood is a concave function of \psi and w.
@InquilineKea · 2 years ago
Is this like Fourier decomposition?
@Mutual_Information · 2 years ago
Eh, I'd say not especially. It's only similar insofar as things are decomposed as the sum of scaled vectors/functions. Fourier series is specific to the complex plane and sin/cos; I don't see much of that showing up. Maybe there is a connection, since the normal distribution concerns circles.. but I don't see it.
@abdjahdoiahdoai · 2 years ago
I think you're thinking of Fourier decomposition as decomposition in a Fourier basis, in that sense. Maybe an SVD is what you are looking for
@gwho · 1 year ago
When discussing personality theory, the Big 5 (aka OCEAN) is superior to the MBTI (Myers-Briggs Type Indicator) because the Big 5 uses factor analysis, whereas the MBTI presupposes its 4 dimensions. Then, when comparing the MBTI to astrology, just laugh astrology out of the room
@EverlastingsSlave · 2 years ago
You are doing such good work, therefore I invite you to read the Quran so that you are saved in the afterlife. Stay blessed