One point missing (or not stressed enough) is that the higher p is, the more important the contribution of large errors (i.e. points far from the values being evaluated). Conversely, the lower p is, the higher the contribution of the small errors. So a large p will favour estimates with small maximal errors, whereas a small p will favour estimates that stay close to the function overall while allowing large spikes in places. This is particularly visible when comparing L_1 and L_inf. L_inf will be high for a perfect match except for a single point far from the function to estimate, while L_1 will be essentially 0. Conversely, L_inf will be small (= epsilon) for an estimate e(x) = f(x) + epsilon, whereas L_1 would be epsilon*(b-a), where [a,b] is the integration domain (so a large error).
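A quick numerical sketch of this contrast (a hypothetical f and two hypothetical estimates, not taken from the video): one estimate matches f perfectly except for a single spike, the other is off by a small constant epsilon everywhere.

```python
import numpy as np

a, b = 0.0, 2 * np.pi
x = np.linspace(a, b, 1001)
f = np.sin(x)
dx = x[1] - x[0]

e_spike = f.copy()
e_spike[500] += 10.0       # perfect match except one point far off
e_offset = f + 0.1         # e(x) = f(x) + epsilon, epsilon = 0.1

l1 = lambda err: np.sum(np.abs(err)) * dx   # discretised integral of |err|
linf = lambda err: np.max(np.abs(err))      # maximal error

print(l1(e_spike - f), linf(e_spike - f))    # tiny L_1, huge L_inf
print(l1(e_offset - f), linf(e_offset - f))  # L_1 ≈ epsilon*(b-a), L_inf = epsilon
```

The spike dominates L_inf while barely registering in L_1, and the constant offset does the opposite, exactly as described above.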
*Outstanding video!* Informative, with both mathematical and behavioral explanations alongside graphical examples, while staying succinct and to the point without overdoing or underdoing the level of detail.
Heading towards my first exam block of mechanical engineering, and this is really making things clearer. If I weren't living off a scholarship I would ask if I could donate somewhere; I guess a subscription and likes will have to do for now.
p < 1 can still yield interesting results, such as when plotting the unit "circle," though it technically doesn't give a "norm." Of course, once you've started abandoning axioms like that, there's nothing stopping you from using p < 0 either. (p = 0 actually does give a valid _distance,_ but not a valid _norm,_ and only if 0^0 = 0, in which case it's the Hamming Distance.)
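A small sketch of that p = 0 case (illustrative names, not from the video): with the convention 0^0 = 0, each term |x_i - y_i|^0 contributes 0 when the coordinates agree and 1 when they differ, so the sum is exactly the Hamming distance.

```python
def dist0(x, y):
    # With the convention 0^0 = 0: |xi - yi|^0 is 0 when equal, 1 otherwise,
    # so this just counts the coordinates where x and y differ.
    return sum(0 if xi == yi else 1 for xi, yi in zip(x, y))

print(dist0([1, 2, 3, 4], [1, 0, 3, 7]))  # 2 coordinates differ
```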
How are you so underrated? Your videos are clear, with thorough explanations and nice visuals!! Probably the maths on your channel is too niche for the general public haha.
I think matrix norms are a special case of the more general vector norms. In my opinion, that inequality is something that happens to be true for matrices but isn't a requirement (i.e. something could be a norm without it being true; it just so happens that it holds for all the norms we care about :-) ). I could of course be wrong about this. The reason this inequality is specific to matrix norms, and not normed linear spaces in general, is that the ability to multiply two vectors together is not always given in a normed linear space (you can add them together and/or multiply by constants). I hope that makes sense!
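Assuming the inequality in question is submultiplicativity, ||AB|| <= ||A||·||B||, here is a quick random spot-check with the spectral (L2-induced) matrix norm, for which the inequality does hold:

```python
import numpy as np

# Random matrices; the spectral norm is the largest singular value.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

norm = lambda M: np.linalg.norm(M, 2)  # L2-induced (spectral) matrix norm

# Submultiplicativity: ||AB|| <= ||A|| * ||B||
print(norm(A @ B), norm(A) * norm(B))
```

A numeric check like this obviously isn't a proof, but it holds for every induced (operator) norm by construction.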
Another question: why are functions included in Lp spaces? A member of an Lp space is a set of numbers (x, y, z, …) that satisfies some axioms, but a function isn't a single set of numbers; it is many such sets, since each point of a function has a set of numbers (x, y, z, …) describing it.
Hi! Great question! I like to think of the Lp norm for functions as a kind of "limiting case" of vectors tending to infinity, which actually also relates to your earlier question. Imagine you have a set of n data points of a function over an interval, e.g. (1,1), (2,2), (3,3), and you increase the number of points: (1,1), (1.1,1.1), (1.2,1.2), ..., (3,3), not widening the interval beyond 1-3 in the x coordinate, just adding more points between 1 and 3. Essentially, the limit as we reach an infinite number of data points is what a function is (at least that's how I like to think of it), at which point the sum becomes an integral. Well, kind of; it works, and that's the most important thing, I guess!

Now, regarding functions and Lp spaces, I think the crucial thing is the function being bounded on the interval; otherwise lots of functions would have a norm of infinity, which isn't useful. It might be a useful exercise to take a continuous function over an interval with a given Lp norm (integral version) and ask whether it satisfies the criteria of a norm (the 4 criteria at 0:58).

To give some context, I would say Fourier series are the most widely used application of norms involving functions over a given interval. Theoretically, a Fourier series finds the trig series (q, let's say) which minimises the L2 norm of the function you want to approximate (f, let's say) minus q, i.e. the minimum of sqrt(integral from a to b of |f(x)-q(x)|^2). Hope this was helpful!
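That "limiting case" picture can be sketched numerically (my own illustration, with f(x) = x on [1, 3]): if the discrete sum is weighted by the spacing Δx, the discrete L2 norm converges to the integral version, sqrt(∫ x^2 dx) = sqrt(26/3).

```python
import numpy as np

def discrete_l2(n):
    # Δx-weighted discrete L2 norm of f(x) = x sampled at n points on [1, 3]
    x = np.linspace(1, 3, n)
    dx = x[1] - x[0]
    return np.sqrt(np.sum(np.abs(x) ** 2) * dx)

exact = np.sqrt(26 / 3)        # integral L2 norm of f(x) = x on [1, 3]
for n in (10, 100, 10000):
    print(n, discrete_l2(n))   # approaches the exact value as n grows
```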
Does anyone ever use L_p norms for anything other than p=1, 2, or infinity? For example, the L_3 norm is well-defined, but does it ever arise naturally?
The L_0 "norm" is sometimes used in machine learning (in some particular places) to favour models that have many zero components. It's simply the number of nonzero components of your vector.
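In code it's a one-liner (hypothetical weight vector, e.g. the coefficients of a sparse linear model):

```python
import numpy as np

# L_0 "norm" as used in sparsity contexts: the count of nonzero entries.
w = np.array([0.0, 1.5, 0.0, -0.2, 0.0])
l0 = np.count_nonzero(w)
print(l0)  # 2
```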
How do you know which Lp norm you would want to use when comparing 2 functions? Would the desired Lp norm change for functions of 3 variables? How does the L3 norm differ in 3D vs 2D? Thank you!
Won't the L2 norm be a circle instead of a square at each point? I.e. similar to making a 3D shape by rotating each point of g(x) about f(x) as center and then taking the volume of that shape. Of course, we then apply a square root after this.
Why is the L2 norm so ubiquitous in applied mathematics, even when there's no direct spatial aspect to the setup, for example in statistics and coding theory? I understand it's nicer to work with than L1 (especially near zero, which is an important point), but is there a good reason other than ease of use?
Essentially, it means that there are no shortcuts; you can't get from one point to another in less distance by deviating to a third point rather than going directly. It's a holdover from defining a metric space.
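The "no shortcuts" idea is easy to spot-check numerically (my own illustration, with the Euclidean norm): the direct distance from x to y is never more than the detour through a third point z.

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 5))   # three random points in R^5

direct = np.linalg.norm(x - y)                           # x -> y directly
via_z = np.linalg.norm(x - z) + np.linalg.norm(z - y)    # x -> z -> y

print(direct, via_z)  # direct <= via_z (triangle inequality)
```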
I think Gilbert Strang discussed this in one of his lectures on matrix methods for data science. He argued that logically this should be the number of nonzero components, e.g. ||(2,5,0)||_0 = 2. Unfortunately, it fails absolute homogeneity (scaling a vector doesn't change its nonzero count, so ||2v||_0 = ||v||_0), so it isn't technically a norm! Similarly, any Lp "norm" with 0 < p < 1 fails to be a norm, there because the triangle inequality breaks down.
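Both failures are easy to exhibit with tiny examples (my own, not from Strang's lecture):

```python
import numpy as np

def lp(v, p):
    # The usual Lp formula, applied even where it isn't a norm (0 < p < 1)
    return np.sum(np.abs(v) ** p) ** (1 / p)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# 0 < p < 1: triangle inequality fails, ||x+y|| > ||x|| + ||y||
p = 0.5
print(lp(x + y, p), lp(x, p) + lp(y, p))  # 4.0 vs 2.0

# L0 (nonzero count): homogeneity fails, ||2x||_0 != 2*||x||_0
l0 = lambda v: np.count_nonzero(v)
print(l0(2 * x), 2 * l0(x))  # 1 vs 2
```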
It doesn't have to be; we are just exploring what shape the points with |x| = 1 make. This is a diamond under the L1 norm, a circle under the L2 norm, and a square under the L-infinity norm. Try showing that if we have |x| = R for some R > 0, we get the same shape, just scaled by R.
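A numeric nudge towards that exercise (my own sketch, using the L2 case): homogeneity gives ||Rx|| = R||x||, so scaling every point of the unit circle by R lands exactly on the level set ||x|| = R.

```python
import numpy as np

R = 3.0
theta = np.linspace(0, 2 * np.pi, 100)
pts = np.stack([np.cos(theta), np.sin(theta)])   # points with ||x||_2 = 1

# Norms of the scaled points: all exactly R, i.e. the unit circle scaled by R.
scaled = np.linalg.norm(R * pts, axis=0)
print(scaled.min(), scaled.max())  # both ≈ 3.0
```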