In some notation, please note that the gradient part is also written (x - a)^T times the gradient; that version just takes the transpose of the gradient term, and since both products are scalars, both are the same.
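A quick numeric check of that scalar-transpose point (a sketch; x, a, and the gradient vector here are made-up placeholder values):

```python
import numpy as np

# Sketch: a 1x1 product equals its own transpose, so
# grad^T (x - a) and (x - a)^T grad give the same scalar.
# x, a, and grad are made-up placeholder values.
rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
a = rng.standard_normal(n)
grad = rng.standard_normal(n)  # stands in for the gradient at a

print(np.isclose(grad @ (x - a), (x - a) @ grad))  # True
```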
It's derived from: x^T H (-a) + (-a)^T H x, where the products are matrix multiplications. These two terms are the same: x^T is 1 x n, H is n x n, and a is n x 1, so each product is a 1 x 1 matrix (i.e. a scalar). Using the scalar-transpose device, a scalar equals its own transpose, so (x^T H (-a))^T = (-a)^T H^T x, which equals (-a)^T H x because the Hessian H is symmetric. So we can rewrite the first term as (-a)^T H x and do the addition: (-a)^T H x + (-a)^T H x = 2(-a)^T H x = -2a^T H x.
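And a sketch verifying the whole identity numerically, assuming a symmetric H as a Hessian would be (all values made up):

```python
import numpy as np

# Sketch: check that x^T H (-a) + (-a)^T H x = -2 a^T H x
# when H is symmetric (as a Hessian is). Values are made up.
rng = np.random.default_rng(1)
n = 3
x = rng.standard_normal(n)
a = rng.standard_normal(n)
M = rng.standard_normal((n, n))
H = M + M.T  # symmetric, like a Hessian

lhs = x @ H @ (-a) + (-a) @ H @ x
rhs = -2 * a @ H @ x
print(np.isclose(lhs, rhs))  # True
```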
I did not understand where the Newton-Raphson method comes into the picture. We approximated the function using a Taylor series and found the minimum. Don't we stop after that?
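For what it's worth, one way to see it (a sketch, not from the video): the quadratic Taylor model is only exact when f itself is quadratic, so minimizing the model once gives an improved point rather than the true minimum, and repeating that step is exactly the Newton-Raphson iteration. The example function below is illustrative:

```python
# Sketch: minimizing the quadratic Taylor model once only gets you
# partway when f is not quadratic, so the step is repeated -- that
# repetition is the Newton-Raphson iteration. Example f is made up.
def f_prime(x):        # f(x) = x - log(x), minimum at x = 1
    return 1.0 - 1.0 / x

def f_double_prime(x):
    return 1.0 / x ** 2

x = 0.5
for i in range(6):
    x = x - f_prime(x) / f_double_prime(x)  # minimize the local quadratic model
    print(i, x)  # approaches 1.0: one step is not enough, so we iterate
```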
Do you need to calculate the inverse of the Hessian? And would the quadratic curve you'd get from going in the direction of the gradient lead to the minimum of the approximating surface?
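On the inverse question, a common practice (a general numerical point, not something specific to this video) is to solve the linear system H p = -g for the Newton step rather than form the inverse explicitly; the H and g below are made-up values:

```python
import numpy as np

# Sketch: the Newton step p solves H p = -g; solving the system is
# cheaper and more stable than computing inv(H). H and g are made up.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite Hessian
g = np.array([1.0, 2.0])     # gradient at the current point

p_newton = np.linalg.solve(H, -g)  # Newton direction, no explicit inverse
p_grad = -g                        # plain steepest-descent direction
print(p_newton, p_grad)            # the two directions generally differ
```

Note that unless H is a multiple of the identity, the two directions differ, which bears on the second question: moving along the gradient does not in general reach the minimum of the approximating quadratic.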
I think the step size is missing, usually the gamma in standard notation, which is chosen to ensure that the Wolfe conditions are satisfied at each step.
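A minimal sketch of one common way to choose such a gamma, backtracking until the sufficient-decrease (first Wolfe / Armijo) condition holds; the function and constants here are illustrative, not from the video:

```python
import numpy as np

# Sketch: backtracking line search enforcing the sufficient-decrease
# (first Wolfe / Armijo) condition. f and the constants are illustrative.
def f(x):
    return x[0] ** 2 + 3 * x[1] ** 2

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

def backtracking_step(x, p, c1=1e-4, shrink=0.5):
    gamma = 1.0
    fx, g = f(x), grad_f(x)
    # shrink gamma until f decreases enough along the direction p
    while f(x + gamma * p) > fx + c1 * gamma * g @ p:
        gamma *= shrink
    return gamma

x = np.array([2.0, 1.0])
p = -grad_f(x)                  # a descent direction
print(backtracking_step(x, p))  # a gamma satisfying sufficient decrease
```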
I'm having a hard time understanding what c would be in actual applications of this method, such as for a given f of two variables x1 and x2. Oh well.
What's wrong with you? What examples do you need? This is a brilliant intuition for, and explanation of, the math. Go and read other optimization books/tutorials and you will realize how easy this video makes it to understand the concepts. Please stop making comments if you can't say something meaningful.