I seriously hope he teaches a lot more machine learning and those lectures get published here. He is the only teacher I found who actually dives into the math behind machine learning.
He is certainly at a level that makes you understand. Best teacher. No showing off, no hand waving. A genuine teacher whose goal is to teach. Thank you, thank you.
This professor is amazing. I'm Italian, so it's more difficult for me to follow a lesson in English than in my own language. Even so, it was much, much easier to understand PCA here than in any other PCA lesson or paper in Italian. And not only that, he gave a more rigorous explanation too! Outstanding, really...
I agree. Watch 21:00. The student repeats that the answer is the eigenvalue and the eigenvector, and the instructor says, "OK, this is correct, but why?" In most videos I have seen on YouTube, people who pretend to be experts do not know (or do not say) the logic behind their claims.
This is what I've been looking for! Every other explanation out there just raises more questions. "Get the covariance." What for? "Do the decomposition." Why? "Use the eigenvectors." Huh?? Thank you for answering every question I had!
Thank you professor. This lecture explains exactly what I was looking for - why principal components are the eigenvectors of the sample covariance matrix.
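In case it helps anyone, here is a small numpy sketch (my own, not from the lecture) that checks this numerically: the principal directions from the SVD of the centered data match the eigenvectors of the sample covariance matrix, up to sign. The covariance matrix I sample from is just made up.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=1000)   # rows = observations

Xc = X - X.mean(axis=0)                  # center the data
S = np.cov(Xc, rowvar=False)             # sample covariance matrix

eigvals, E = np.linalg.eigh(S)           # eigenvectors of S
E = E[:, np.argsort(eigvals)[::-1]]      # sort columns by decreasing eigenvalue

U, D, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T                                 # principal directions from the SVD

# the two bases agree up to sign flips, so |V^T E| should be the identity
print(np.allclose(np.abs(V.T @ E), np.eye(3), atol=1e-6))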
Thank you so much! I tried to learn the same topic from other videos and it was impossible to understand; this one is so clear, ordered, and intuitively explained. Awesome lecturer!
Wow. This IS what I'm looking for!! Thank you SO much! BTW, the explanation for 20:21 is simple if you already have some experience with linear algebra. Just decompose the matrix S into EΛ(E^-1); then ut S u becomes a weighted sum of the eigenvalues, where the weights (the squared coordinates of the unit vector u in the eigenvector basis) sum to 1. (Assume the data are already standardized, which is crucial.) To maximize the sum you put all the weight on the eigenvector with the largest eigenvalue, so the maximum is the largest eigenvalue.
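A small numeric illustration of that argument, in case it helps (my own numbers; S below is just a random symmetric positive semi-definite matrix standing in for a covariance matrix):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
S = A @ A.T                              # symmetric PSD stand-in for a covariance matrix

eigvals, E = np.linalg.eigh(S)           # S = E diag(eigvals) E^T, columns of E orthonormal

u = rng.normal(size=4)
u /= np.linalg.norm(u)                   # unit vector

weights = (E.T @ u) ** 2                 # squared coordinates of u in the eigenbasis
print(np.isclose(weights.sum(), 1.0))            # the weights sum to 1
print(np.isclose(u @ S @ u, weights @ eigvals))  # ut S u = sum_i weight_i * lambda_i
print(u @ S @ u <= eigvals.max() + 1e-12)        # so it is bounded by the largest eigenvalue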
Good lecture. PCA tries to find the direction in the space, namely a vector, that maximises the variance of the projected points or observations on that vector. Once the above method finds the 1st principal component, the second component is the direction orthogonal to the first that in turn maximises the variance of the projections.
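For what it's worth, a quick numpy check of that second point on made-up data (the covariance matrix below is just an example):

import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 1.0, 0.2],
                             [1.0, 2.0, 0.5],
                             [0.2, 0.5, 1.0]], size=5000)
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

eigvals, E = np.linalg.eigh(S)
E = E[:, np.argsort(eigvals)[::-1]]      # columns sorted by decreasing eigenvalue
pc1, pc2 = E[:, 0], E[:, 1]

print(np.isclose(pc1 @ pc2, 0.0))        # the second component is orthogonal to the first

# among unit directions orthogonal to pc1, pc2 gives the largest projected variance
for _ in range(5):
    u = rng.normal(size=3)
    u -= (u @ pc1) * pc1                 # remove the pc1 component
    u /= np.linalg.norm(u)
    print(np.var(Xc @ pc2) >= np.var(Xc @ u))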
Dear Prof, at 28:46 you say that the tangent of f and the tangent of g are parallel to each other. Possibly you meant to say that the gradients, i.e. the normals, of f and g are parallel to each other. Anyway, it effectively means the same thing. Excellent video!
At time 1:03:26, [U, D, V] = svd(X). Question: should we do svd(X - E(X)), since X contains pixel values in [0, 255] and the data points in X are not centered at E(X)?
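For anyone following along, here is the condition as I understand it (my own summary, not a quote from the lecture): with f(u) = ut S u and the constraint g(u) = ut u - 1 = 0, the gradients are grad f = 2 S u and grad g = 2 u. Requiring them to be parallel, grad f = λ grad g, gives S u = λ u, i.e. u must be an eigenvector of S and λ the corresponding eigenvalue.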
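In case it helps, a tiny numpy sketch of what I mean (assuming rows of X are the observations, which may not match the lecture's convention, and using random pixel values as fake images):

import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(0, 256, size=(100, 64)).astype(float)   # fake "images": 100 samples, 64 pixels

Xc = X - X.mean(axis=0)                  # subtract the mean image, i.e. X - E(X)
U, D, Vt = np.linalg.svd(Xc, full_matrices=False)

# with centering, the right singular vectors are eigenvectors of the sample covariance
S = np.cov(Xc, rowvar=False)
eigvals, E = np.linalg.eigh(S)
E = E[:, np.argsort(eigvals)[::-1]]
print(np.allclose(np.abs(Vt[0]), np.abs(E[:, 0]), atol=1e-6))

# without centering, the first singular vector mostly tracks the mean image instead
U0, D0, V0t = np.linalg.svd(X, full_matrices=False)
m = X.mean(axis=0)
print(abs(V0t[0] @ (m / np.linalg.norm(m))))             # close to 1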
Great lecture! Tip for the camera person: there's no need to zoom in on the PowerPoint. The slides were perfectly readable even when they took up 50% of the video area, and it is much better to see the lecturer and the slide at the same time. Personally, it makes me feel more engaged with the lecture than just seeing a full-screen slide and hearing the lecturer's voice.
Sir, I have one question @1:01:50. Suppose I had only one face image, with each pixel's distribution independent of the others but with mean equal to the original face value at that pixel. I think the first, second, and subsequent PCs would then be dominated by noise; would we still be able to see the face?
Is it important to show the 95% confidence ellipse in PCA? Why is that so? If my data does not produce one, what should I do? Can I use the PCA score plot without the 95% confidence ellipse?
S is just notation: S is the covariance matrix of the original matrix X, and u1 is a constant (we can say). For a scalar, the constant in a variance gets squared, but in the vector case you write u1 and its transpose on either side. The final expression is u1_transpose S u1.
1:14 Be careful: this example is not that appropriate. Note that PCA is basically a rotation of the axes, and hence it usually does not work well on data with a "donut" (or Swiss roll) structure. A better way is to use either kernel PCA or MVU (maximum variance unfolding).
I think this is not about PCA but about the fact that distributions in higher dimensions can be projected to lower dimensions in a way that preserves, as much as possible, a one-to-one correspondence between the higher-dimensional points and their lower-dimensional counterparts.
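A small example of that point, assuming scikit-learn is available (gamma=10 is just a hand-picked value, and the separation I describe is what I expect, not a guarantee):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# two concentric rings, i.e. "donut"-like structure (y=1 is the inner ring)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

lin = PCA(n_components=1).fit_transform(X)                               # pure rotation
kpc = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)

for name, scores in [("linear PCA", lin), ("kernel PCA", kpc)]:
    inner = scores[y == 1, 0]
    outer = scores[y == 0, 0]
    # linear PCA leaves both rings centred near 0 on the first component;
    # the RBF kernel PCA typically pushes the two rings to clearly different values
    print(name, "inner mean:", round(inner.mean(), 3), "outer mean:", round(outer.mean(), 3))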
t = transpose, ^2 = square. This function is quadratic because of u and ut. A quadratic function in one variable has the form ax^2 + bx + c; a quadratic function in two variables has the form ax^2 + bxy + cy^2 + dx + ey + g.
Let's consider an example:
1- Suppose the vector u = [x1, x2]t, so ut = [x1 x2], and the matrix S = [ 1/2 -1 ; -1/2 1 ].
2- ut S gives us the following vector: ut S = [ 1/2*x1 - 1/2*x2 , -x1 + x2 ].
3- ut S u gives the following function, which is a scalar once the vector u is known: ut S u = 1/2*x1^2 + x2^2 - 3/2*x1*x2.
So ut S u is quadratic.
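A quick numeric check of the example above (just numpy; the test point x1 = 2, x2 = 3 is arbitrary):

import numpy as np

# S from the example above
S = np.array([[ 0.5, -1.0],
              [-0.5,  1.0]])

x1, x2 = 2.0, 3.0
u = np.array([x1, x2])

quadratic_form = u @ S @ u                        # ut S u
polynomial = 0.5*x1**2 + x2**2 - 1.5*x1*x2        # 1/2*x1^2 + x2^2 - 3/2*x1*x2

print(quadratic_form, polynomial)                 # both print 2.0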
t = transpose, ^2 = squared. In order to demonstrate that Var(ut X) = ut S u, I will use:
- the König form of the variance: Var(X) = E(X^2) - E^2(X)
- and this form of the covariance matrix: COV(X) = E(X Xt) - E(X) [E(X)]t
So let's start. We use the König form to define the variance:
1- Var(ut X) = E((ut X)^2) - E^2(ut X)
We know that (ut X)^2 = (ut X) [(ut X)]t, so the first quantity becomes E((ut X)^2) = E( (ut X) [(ut X)]t ), and the second quantity becomes E^2(ut X) = E(ut X) [E(ut X)]t. And we get:
2- Var(ut X) = E( (ut X) [(ut X)]t ) - E(ut X) [E(ut X)]t
We know that [ut]t = u and [(ut X)]t = (Xt u) (notice that the transpose reverses the multiplication order), so the first quantity becomes E( (ut X) (Xt u) ). And we get:
3- Var(ut X) = E( (ut X) (Xt u) ) - E(ut X) [E(ut X)]t
We know that the expectation of a vector (or matrix) of constants is that same vector (or matrix), while the expectation of a vector (or matrix) of random variables stays an expectation; in other words E(u) = u and E(ut) = ut. So the first quantity becomes E( ut X Xt u ) = ut E(X Xt) u, and the second quantity becomes E(ut X) [E(ut X)]t = ut E(X) [ut E(X)]t = ut E(X) [E(X)]t u. And we get:
4- Var(ut X) = ut E(X Xt) u - ut E(X) [E(X)]t u
Let's factor out ut on the left:
5- Var(ut X) = ut [ E(X Xt) u - E(X) [E(X)]t u ]
And factor out u on the right:
6- Var(ut X) = ut [ E(X Xt) - E(X) [E(X)]t ] u
We know that COV(X) = E(X Xt) - E(X) [E(X)]t, so:
7- Var(ut X) = ut COV(X) u
Here S = COV(X), and finally we have:
8- Var(ut X) = ut S u
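And a quick Monte Carlo sanity check of the identity, for anyone who prefers seeing numbers (the mean, covariance, and u below are made up):

import numpy as np

rng = np.random.default_rng(0)

mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.4],
                [0.0, 0.4, 1.5]])
X = rng.multivariate_normal(mean, cov, size=200_000)   # rows are samples of the random vector X

u = np.array([0.2, -0.7, 1.0])                         # arbitrary fixed vector

lhs = np.var(X @ u)            # sample Var(ut X)
S = np.cov(X, rowvar=False)    # sample covariance matrix S
rhs = u @ S @ u                # ut S u

print(lhs, rhs)                # the two numbers agree to several decimal places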