There's a timeless beauty to the cadence, passion, and clarity of your lecture series. Glad I could find it again. Different goals now, but same appreciation.
Thanks a lot for your videos. I am a French student in mathematics, and your way of explaining things is so different from here, but it all makes more sense. Looking forward to the next videos.
Great work, very good lecture, but I wish there were an order or numbering of the lectures so I can revisit the previous ones if I don't understand a certain concept.
The formula I have seen uses the complex conjugate transpose instead of the plain transpose. For real values the two are equivalent, but I would make a note of this.
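In case a concrete check helps, here's a toy NumPy example of my own (not from the video): the conjugate transpose and the plain transpose agree for real matrices and differ once entries are complex.

```python
import numpy as np

# Toy matrices of my own, just to illustrate the point.
A_real = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
# For real entries, conjugation does nothing: Aᴴ == Aᵀ.
print(np.array_equal(A_real.conj().T, A_real.T))   # True

A_cplx = np.array([[1 + 2j, 0],
                   [0, 3 - 1j]])
# For complex entries the two differ, since Aᴴ conjugates each entry.
print(np.array_equal(A_cplx.conj().T, A_cplx.T))   # False
```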
Yeah, me too at first. On the left of the board there's something previously covered about "Quadratic Form Minimization", and that operation parallels differentiation in calculus. Note that to find the x that minimizes rᵀr, you take the derivative with respect to x and solve for the x that makes that derivative equal to zero. It looks like he compares xᵀAx to ax² and illustrates how their derivatives become 2Ax and 2ax respectively. So he goes from rᵀr = 2(½xᵀAᵀAx − xᵀAᵀb) + bᵀb, to 0 = 2(AᵀAx − Aᵀb), to AᵀAx = Aᵀb, to x = (AᵀA)⁻¹Aᵀb.
@MichaelStangeland Thanks, this was the missing step in the video. Would you agree that we differentiate w.r.t. xᵀ? Maybe it truly doesn't matter, since one can always rearrange. And we use the fact that AᵀA is symmetric when we work with differentiation this way.
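Whether you call it differentiating w.r.t. x or xᵀ only changes whether the gradient comes out as a column or a row; the symmetry of AᵀA is what collapses the quadratic term's derivative to 2AᵀAx. Here's a quick numerical check with made-up data (my own sketch, not from the lecture):

```python
import numpy as np

# Verify numerically that the gradient of f(x) = ½xᵀAᵀAx − xᵀAᵀb is
# AᵀAx − Aᵀb; setting it to zero gives the normal equations AᵀAx = Aᵀb.
# (In general ∇(½xᵀMx) = ½(M + Mᵀ)x, which equals Mx here because
# M = AᵀA is symmetric.)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

f = lambda v: 0.5 * v @ A.T @ A @ v - v @ A.T @ b

grad_analytic = A.T @ A @ x - A.T @ b

# Central finite differences along each coordinate direction.
eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(3)])
print(np.allclose(grad_analytic, grad_numeric))   # True

x_star = np.linalg.solve(A.T @ A, A.T @ b)        # x = (AᵀA)⁻¹Aᵀb
print(np.allclose(A.T @ (A @ x_star - b), 0))     # True: gradient vanishes there
```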
What if AᵀA is not positive definite (which would mean rᵀr has no minimum value, because we could keep choosing x to make rᵀr smaller), and what if AᵀA is not invertible? Please help me.
So AᵀA is always positive semi-definite. If A is invertible, then AᵀA is positive definite and we can find a global minimum of rᵀr. If A is singular, AᵀA will not be invertible, and the equation resulting from the derivative, AᵀAx = Aᵀb, can have multiple x that satisfy it. As a result, rᵀr will have multiple local optima. Those are my thoughts. Am I correct? Thank you very much for the quick response.
You are correct. More specifically, there will be one minimum value, but it will be attained on a whole subspace of locations. Sort of like the function f(x, y) = x².
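To make that "one minimum value on a whole subspace" picture concrete, here's a small sketch with toy numbers of my own (not from the video): a rank-deficient A gives infinitely many least-squares solutions, all with the same residual norm.

```python
import numpy as np

# Rank-1 matrix: AᵀA is only positive semi-definite, so the normal
# equations AᵀAx = Aᵀb have infinitely many solutions.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([1.0, 0.0])

# pinv picks the minimum-norm least-squares solution; adding anything
# in the null space of A (here, multiples of (1, -1)) does not change
# the residual, so every such x attains the same minimum of rᵀr.
x0 = np.linalg.pinv(A) @ b
null_dir = np.array([1.0, -1.0])
for t in [0.0, 1.0, -3.0]:
    x = x0 + t * null_dir
    print(np.linalg.norm(A @ x - b))   # same value each time
```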
I understand your example of f(x, y) = x². The minimum subspace in this example is the line x = 0 (y arbitrary). If we randomly choose a starting point and apply gradient descent, will we always reach the minimum subspace? Does every optimization problem that can be expressed in quadratic form have just one minimum value? (I think yes, but I still want verification.) For other cases that are not quadratic forms, the function can have multiple local optima, and the function values at those local optima differ. If I use gradient descent with a randomly chosen starting point, it will lead to different local optima (with luck, we reach the global optimum). So the starting position matters. Is there any way to be sure of reaching the global optimum instead of getting stuck in a local optimum? My idea is to try multiple starting positions and keep the x with the smallest f(x). But this also depends on luck and doesn't solve the problem completely. Can you come up with a good solution for this problem?
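Not the original replier, but my understanding, plus a sketch with made-up numbers (an illustration, not a definitive answer): for the least-squares objective ½‖Ax − b‖², which is convex, gradient descent with a small enough step size always settles onto the minimum subspace; it just lands on different points of that subspace depending on where you start. For general non-convex functions there is no such guarantee, and multi-start is a common heuristic rather than a cure.

```python
import numpy as np

# Gradient descent on f(x) = ½‖Ax − b‖² with a singular A (my own toy
# data). The gradient of this objective is Aᵀ(Ax − b).
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([1.0, 0.0])

def descend(x, steps=5000, lr=0.05):
    for _ in range(steps):
        x = x - lr * (A.T @ (A @ x - b))
    return x

rng = np.random.default_rng(1)
for _ in range(3):
    x = descend(rng.standard_normal(2))
    # Different minimizers, but identical (minimal) objective value:
    print(x, np.linalg.norm(A @ x - b))
```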
Oh, I think I get it now. Since xᵀAᵀb is a 1×1 matrix, and since the transpose of a 1×1 matrix must equal itself, you can keep the big transpose or ignore it; it doesn't matter in this case!
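A one-line check of that scalar-transpose point, with made-up data of my own:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(4)
# xᵀAᵀb is a scalar, so it equals its own transpose bᵀAx.
print(np.isclose(x @ A.T @ b, b @ A @ x))   # True
```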
@matthewjames7513 AᵀA is invertible since it is positive definite; therefore, all of its eigenvalues are greater than zero (and real, since it's symmetric). Note that AᵀA is symmetric; then consider the squared magnitude of Ax, expand it as a dot product, and take x to be an eigenvector of AᵀA.
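Spelling that argument out numerically with a random matrix (my own assumption of a full-column-rank A, not the lecture's example): for an eigenvector v of AᵀA with eigenvalue λ, ‖Av‖² = vᵀAᵀAv = λ‖v‖², so λ ≥ 0, and λ > 0 whenever the columns of A are independent (Av ≠ 0 for v ≠ 0).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))   # full column rank (with probability 1)

# eigh handles the symmetric matrix AᵀA and returns real eigenvalues.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
print(eigvals)                    # all strictly positive

v = eigvecs[:, 0]
# ‖Av‖² equals λ‖v‖² for an eigenpair (λ, v) of AᵀA.
print(np.isclose(np.linalg.norm(A @ v)**2,
                 eigvals[0] * np.linalg.norm(v)**2))   # True
```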
Slow until the audio cut out, after which the interesting part takes place. This could have been really good, but the audio problem was not handled well. Ending on "What a beautiful equation" for 20 seconds immediately after the solution was some frustrating editing. Keep it up; that's meant as constructive criticism.