
Fitting a Line using Least Squares  

Virtually Passed
35K subscribers
62K views

Published: Sep 7, 2024

Comments: 208
@virtually_passed 2 years ago
Typo at 11:39: it should be ||[A]X - b||^2. Don't worry, this doesn't affect anything in the video :)
@klafbang 2 years ago
Nice video, but it's a shame you don't give more intuition for the choice of the least squares vs. the other distance measures and the fact that this is just a projection onto a linear function space - that realisation is what really made linear regression click for me and made it possible to trivially generalise it to other functions.
@virtually_passed 2 years ago
Glad you liked the video and thanks for the feedback!
@red_rassmueller1716 2 years ago
Still, the beginning was just an introduction. He doesn't have to pick it up again if he wants to talk about the square method.
@merseyless 1 year ago
@@red_rassmueller1716 Then why mention them in the first place? You can't blame us for being curious about an unresolved comparison.
@asthmen 1 year ago
I agree with this comment - I've always wondered why we don't ever use the other two measures, and this would have been a good opportunity to answer the question. Could you maybe point to any other resources that do?
@navegaming8198 3 months ago
Not using sum notation on the first proof is making it so much easier for me to understand. Brilliant!
@miguelcerna7406 2 years ago
Excellent video. Love the proof behind the parabola and the global min that the squared residuals must eventually attain. Bravo sir.
@virtually_passed 2 years ago
Thanks!
@leif1075 1 year ago
@@virtually_passed And why not just set a to zero all the time? Isn't that easier? Otherwise I don't see how to tell if your line starts close to the origin or not.
@johnchessant3012 2 years ago
Great video. Here's a cool fact: The first row of the matrix equation at 14:27 says that the sum of the residuals must be zero, which (after a bit of algebra) proves that the least-squares line must map the average of x to the average of y.
@virtually_passed 2 years ago
Very cool fact! Thanks for sharing! I'd never heard of this before so I decided to prove it for myself:
r1 + r2 + ... + rn = 0
(a + b*x1 - y1) + (a + b*x2 - y2) + ... + (a + b*xn - yn) = 0
n*a + b*(x1 + x2 + ... + xn) - (y1 + y2 + ... + yn) = 0
Divide both sides by n:
a + b*x_avg - y_avg = 0
y_avg = a + b*x_avg
Therefore the point P = (x_avg, y_avg) will lie on the line y = a + b*x. Very neat!
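A quick numerical check of that fact, sketched in Python with NumPy (the data values here are made up purely for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.1, 3.9, 8.2, 13.8])

# Least-squares fit of y = a + b*x using the design matrix [1, x]
A = np.column_stack([np.ones_like(x), x])
a, b = np.linalg.lstsq(A, y, rcond=None)[0]

# The fitted line passes through the point of averages (x_avg, y_avg)
print(np.isclose(a + b * x.mean(), y.mean()))  # True
```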
@shophaune2298 2 years ago
@@virtually_passed ...at which point we could consider that point P to be a 'new' origin, and use coordinates relative to it to find the best fit of the data passing through that point - the simpler 1-dimensional case explored earlier in the video.
@morgan0 2 years ago
Yeah, partway through the video I stopped to remake it in Desmos to see if the horizontal component could be used in some way, because I was curious (though I didn't get anywhere with it), and I first offset them all by the x and y averages and did the 1D case.
@VitinhoFC 2 years ago
This is neat indeed!
@mymo_in_Bb 1 year ago
The method of least squares was (along with everything else going on at the time) the point when I stopped understanding my linear algebra course at uni. And now I understand it. Thanks a lot!
@virtually_passed 1 year ago
Glad it helped!
@koendos3 2 years ago
Wow, I've been binge-watching the SoME2 videos, and I've been impressed with everyone's effort. This video especially is so sick!
@virtually_passed 2 years ago
Thanks so much :)
@itamar.j.rachailovich 2 years ago
I watched it a few days after you uploaded it, but I was in bed almost asleep. Today I watched it again. It's amazing, and you are an excellent teacher!!! Keep going!!!
@virtually_passed 1 year ago
Thanks!
@JKTCGMV13 9 months ago
Within seconds of the video playing, I got a better intuitive explanation of the least squares method than I've ever had.
@eriktempelman2097 2 years ago
Absolutely wonderful!!! Combines linear algebra with calculus. This video is a GREAT "commercial" for both topics.
@virtually_passed 2 years ago
Thanks :)
@eriktempelman2097 2 years ago
You're welcome! Really, after all those years (I'm from 1969) this is the first time I see how both can go hand in hand.
@giordano7703 2 years ago
Very simple, yet effective, explanation; I come out of this video happy knowing I learned something new which I would have never tackled by myself. Great work!
@virtually_passed 2 years ago
Thanks for the kind words!
@grinreaperoftrolls7528 2 years ago
Hold up, I’ve DONE THIS before. But this is a much better explanation. Thank you.
@fire17102 2 years ago
Holy $#17 this is like a dream come true, I can't believe you made this interactive! I literally just commented I want interactiveness built into #some2 videos! I haven't even gotten to the video yet... Mad respect guys you're awesome
@virtually_passed 2 years ago
Thanks for the kind words! We intended to make it more interactive, but we ran out of time. Originally we wanted it to be a "choose your own adventure" thing where you could choose the type of proof, and choose whether you wanted to see a proof for 1 unknown (easy version) or 2 unknowns (harder version). Interactivity is still a dream of mine :)
@matveyshishov 2 years ago
Thanks for the visuals! When I was learning OLS, I remember that my primary questions were a) why is the sum a good choice, and what other options are there? and b) why squares and not absolute values? I see that you just jump over these two questions, but in my experience, for somebody who is trying to understand the method (as opposed to memorizing it), these are the central questions which unlock the understanding. So you may want to add some exposition on that in the future; I'm sure many students will appreciate it.
@virtually_passed 2 years ago
Hi, thanks for your kind words and feedback! I'm actually in the process of making more videos now, so this is really good advice :) thanks! As a short answer to your questions:
1) One of the massive advantages of Ordinary Least Squares (OLS) is that it guarantees convexity (i.e. the parabola has only one global optimum). Convexity is a big deal in the field of optimization. Some other fitting methods don't have this feature, meaning it's possible to get stuck in local optima, in which case you won't get the best fit.
2) It's super computationally fast to compute.
There are downsides to this method, though, which I haven't talked about. One is that it's highly sensitive to large outliers (since it squares the error). But this is partially resolved by adding a regularization term (basically adding a 1-norm and a 2-norm together in the objective). I'll elaborate more in a future video :)
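To see the outlier sensitivity mentioned above in action, here is a minimal sketch (Python with NumPy; the synthetic data and the size of the outlier are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, x.size)
y_out = y.copy()
y_out[-1] += 30.0  # a single large outlier

A = np.column_stack([np.ones_like(x), x])
fit_clean = np.linalg.lstsq(A, y, rcond=None)[0]
fit_dirty = np.linalg.lstsq(A, y_out, rcond=None)[0]
print(fit_clean)  # close to [1, 2]
print(fit_dirty)  # intercept and slope dragged by the squared error
```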
@matveyshishov 2 years ago
@@virtually_passed Thank you very much!
@virtually_passed 2 years ago
@@matveyshishov you're welcome!
@andrewzhang5345 2 years ago
@@virtually_passed Regarding computing, it's a bit misleading to claim you don't need iterations to find your parameters. Given a small dataset, you can fit the most complex model with the slowest optimization method quickly. Indeed, for least squares, solving the normal equations is trivial when the dataset is small, but difficult with a larger dataset, and one resorts to iterative methods to solve least squares.
@virtually_passed 2 years ago
@@andrewzhang5345 I agree, thanks for the comment. I've edited my response.
@BharmaArc 2 years ago
Great video as always! Great visuals that really give insight into the problem; I also appreciate how you color-code things and show every step of the computation. A tiny correction: at 11:40 it should be norm *squared*.
@virtually_passed 2 years ago
Thanks for the comment! I really appreciate the kind words. You're absolutely right! At 11:40 it should be error = ||AX - b||^2. Thanks for pointing that out :) Fortunately it doesn't affect the rest of the video though :)
@leif1075 1 year ago
@@virtually_passed Wait, just because the equation at 7:50 has a bunch of squared terms doesn't tell you it's a parabola, so why did you say that??
@leif1075 1 year ago
@@virtually_passed Oh, and also: the linear b terms contain x*y, and if some of those are negative, then even if this is a parabola it might not always point up, since the negative linear b terms might be greater than the positive b-squared terms. See what I mean??
@leif1075 1 year ago
@@virtually_passed Hope you can respond when you can. Thanks very much.
@virtually_passed 1 year ago
@@leif1075 Sorry for the late reply! Notice that the error has the form of a parabola:
e = k1*b^2 - 2*k2*b + k3
where the constants k1, k2, and k3 are given by:
k1 = x1^2 + x2^2 + ...
k2 = x1*y1 + x2*y2 + ...
k3 = y1^2 + y2^2 + ...
Also note that k1 is always >= 0, because any real number squared is non-negative. It honestly doesn't matter what the values of k2 and k3 are, since the convexity of a parabola is always determined by the coefficient of the squared term. I've created a Desmos link for you here to see for yourself why this is true: www.desmos.com/calculator/waagmohtua
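A short numerical check of that expansion for the one-unknown fit y = b*x (Python with NumPy; the data is arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 4.1, 5.8])

k1 = np.sum(x**2)
k2 = np.sum(x * y)
k3 = np.sum(y**2)

# e(b) = sum_i (b*x_i - y_i)^2 should equal k1*b^2 - 2*k2*b + k3
for b in (-1.0, 0.5, 2.0):
    e_direct = np.sum((b * x - y) ** 2)
    e_parabola = k1 * b**2 - 2 * k2 * b + k3
    print(np.isclose(e_direct, e_parabola))  # True for every b
```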
@kayleighlehrman9566 2 years ago
I can't remember who originally said it but one of my favourite quotes about proofs is that "you shouldn't set out to prove something that isn't already almost intuitive."
@tex1297 2 years ago
I wish we had all these materials back in school 30 years ago... Nice work!
@virtually_passed 2 years ago
Thanks!
@robertkelly649 1 year ago
This is an absolutely beautiful explanation of least squares and where it came from. The combination of the visual and the conceptual was really wonderful. Wish I had this in college; it would have spared me a lot of pain. 😄
@virtually_passed 1 year ago
Glad you enjoyed it!
@allenadriansolis8032 2 years ago
Great explanation and visualization. Well done.
@virtually_passed 2 years ago
Thanks!
@JohnBoen 2 years ago
Least squares... I had always thought of it as a square-root sort of thing. I do statistics. I write queries and do data analysis as a job - for 25 years. I have a bit of a clue... But the changing sizes of the squares as the line moved made me go "Ohhhhhhhhhhhh!". It just suddenly became intuitive. Great way to explain it - you are getting a comment 30 seconds in. Nice work :)
@virtually_passed 2 years ago
Glad you liked it :)
@johnathanmonsen6567 2 years ago
I understood JUST enough linear algebra to understand how clever that is. I started to phase out on the multivariate part (that's where I started flagging in college), but dang, that was a really cool reveal that the 'Jacobian' was just A transpose.
@virtually_passed 2 years ago
Glad you managed to follow it! Linear algebra is very powerful!
@TroyaE117 1 year ago
Good video! I never had to use the multivariable approach, but now I know.
@web2wl00p 2 years ago
Very, very nice! I have been teaching LSQ optimization to undergrads for years; now I will just point them to your video 🙂 Best of luck for #SoME2
@virtually_passed 2 years ago
Thanks for the kind words!
@TheDGomezzi 2 years ago
Some recreational mathematics is learning cool stuff you didn’t already know, and some recreational mathematics is re-learning stuff you knew but with a better feel and intuition behind it. I think a lot of people overlook that second one, and this video shows how it can be really cool! Would love to see a video where you go over the three methods you suggested and their pros and cons, that would be super cool.
@virtually_passed 2 years ago
Hey thanks for the comment and kind words. A lot of people have requested a summary video like that :) it's on the list :)
@MannISNOR 2 years ago
Great job - This is absolutely fantastic! You are doing us all a favor.
@gustavom8726 1 year ago
This is awesome!! It perfectly represents the SoME2 spirit, but with a very original way to explain and present. Thank you so much!
@virtually_passed 1 year ago
Thanks!
2 years ago
Broh... I love you. This was beautiful!!! So helpful for understanding the Vandermonde matrix...
@virtually_passed 2 years ago
Thanks!
@yusufkor5900 2 years ago
Whoa! I'm illuminated! Thanks.
@virtually_passed 2 years ago
You're welcome :)
@arddenouter4553 1 year ago
Maybe this was mentioned already, but I think what you demonstrate is the Reduced Major Axis method, where the error can be in two variables. The least-squares method assumes an input parameter without error (say the x axis) and an output parameter with error (say the y axis). The least-squares method minimizes (in the case of error in y) the vertical distance between the line and the actual points. At least that is how I understood it while using it some time ago.
@ShankarSivarajan 2 years ago
3:01 It _does_ seem subjective when you put it like that. Which is why it's important to point out that the Least Squares method is equivalent to Maximum Likelihood Estimation for normal data, which makes it objectively superior.
@HesderOleh 2 years ago
I don't think it is objective to assume that the MLE is the best estimator. There are plenty of circumstances where you actually want something else.
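For reference, the standard derivation behind the MLE equivalence mentioned above, assuming i.i.d. Gaussian noise y_i = a + b*x_i + eps_i with eps_i ~ N(0, sigma^2):

```latex
\log L(a,b)
  = \sum_{i=1}^{n} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{(y_i - a - b x_i)^2}{2\sigma^2} \right) \right]
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - a - b x_i)^2
```

so maximizing the likelihood over (a, b) is exactly minimizing the sum of squared residuals; with non-Gaussian noise, the equivalence (and the argument for squares) goes away.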
@nehachopra2954 2 years ago
Thank you so much for making this topic so, so, so interesting! Hope to see much more.
@virtually_passed 2 years ago
Hey, thanks for the kind words! I've made another video on least squares here, and I intend to make a few more: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wJ35udCsHVc.html
@pedrodaccache4026 2 years ago
Wow, I can't understand how this channel only has 13k subs. Awesome video!
@virtually_passed 2 years ago
Thanks!
@aamer5091 1 year ago
Words don't appropriately express gratitude, but thanks.
@agspda 2 years ago
Love your content. Luckily I just got recommended this video. I got lost a bit at the end with the multivariable calculus, but I understood the reasoning, and that is a lot. Thanks!!
@virtually_passed 2 years ago
Thanks for the comment. As long as you get the big picture, that's what matters most. The rest are all details :)
@mackansven3656 6 months ago
This was great, all of it, amazing job.
@virtually_passed 6 months ago
Thanks 😊
@MaxPicAxe 2 years ago
Wow that was such a well-made video
@virtually_passed 2 years ago
Thanks!
@Bunnokazooie 1 year ago
Awesome visualization
@virtually_passed 1 year ago
Thanks!
@hpp6116 2 years ago
Fantastic presentation!
@virtually_passed 2 years ago
Thanks :)
@AlexeyMatushevsky 2 years ago
Great video, thank you so much for your explanation!
@virtually_passed 2 years ago
Really glad you liked it :)
@HesderOleh 2 years ago
Nice video. It took far too long for me to understand this, because throughout high school and then uni I didn't have the words to articulate my question of why squares instead of the L1 metric, or I would be brushed off with a silly answer like "it is just the best way". A similar question that went unanswered for a long time is why e is the number raised to i*theta for polar coordinates in the complex plane; it was often dismissed with the fact that sin and cos are connected to e, but not why or how. When I did have a good professor who explained it well, I was so happy. I wondered if there were ever times we would want to use higher norms or Lp-spaces, because some of those are easily solved as well, but they told me that it would give undue weight to outliers. I was satisfied with that answer at the time, but now I wonder if there are any applications where you do want the focus to be on outliers, where those data points are actually an important part of telling the story of what the data means.
@nadavperry2267 1 year ago
This is really awesome! Although, as a math major, I would've liked to see an expansion of the formula for n dimensions (I would assume it uses r_i^n and the Jacobian and shouldn't be very hard to generalize, although I may be wrong).
@braineaterzombie3981 1 year ago
Excellent video. Make more videos on statistics.
@TheGoldenFluzzleBuff 2 years ago
Wow. You more or less just summarized concisely what I spent weeks learning in 4000 level econometrics courses. Could you do one for multivariable (multidimensional) values?
@virtually_passed 2 years ago
Thanks for the comment. What do you mean by multidimensional values? Do you mean to teach multivariable calculus? Or teach LS with multiple unknowns? :)
@EuphoricPentagram 2 years ago
I'm loving this. I was never really good at math in school (only making it to algebra 1/2 and geometry), and I'm already halfway through (wanted to pause it so I don't miss any), and it's amazing: I've been able to understand everything very well (some time programming probably helped). You have made it so accessible, and I love how you take a moment to pause and explain what the key points are (like that there's one global minimum) and that we should notice them to remember for later. It's very helpful in keeping track of everything. If I'm ever teaching something, I'm definitely stealing that idea. 10/10, will Like and Subscribe.
Edit: just finished it, with the matrices, and it was still very understandable (even if I don't fully understand it). I was able to grasp enough to see and understand the power of this. And when you coded it, that also helped a lot, because it brought it into a language I knew instead of one I'm still learning. Still 10/10, would recommend.
@virtually_passed 2 years ago
Hey thanks so much for the kind words. I spent a lot of effort trying to make the video as accessible as possible so I'm glad it worked for you!
@movax20h 2 years ago
Minimizing the perpendicular distance (squared) is also sometimes used, especially when you have uncertainty in both x and y. It is, however, far more complex computationally. Most fitting packages do not support it, but it is possible, and I used it in the past (including estimating the error of the parameter estimation).
@virtually_passed 2 years ago
Agreed :)
@nivcohen5371 2 years ago
Amazing video! Very enlightening
@virtually_passed 2 years ago
Glad you liked it!
@vijay1968jadhav 2 years ago
Wonderful video. Need more videos, sir!
@virtually_passed 2 years ago
Thanks :)
@algorithminc.8850 2 years ago
Nice useful channel. Great stuff ... thanks. Cheers.
@virtually_passed 2 years ago
Thanks!
@jakobr_ 2 years ago
Wow, it’s surprising how compact the expression ended up being! Very nice video. I wonder about one of the other approaches you showed at the beginning, namely the “minimize perpendicular distance” method. That one appeals to me because it doesn’t seem to care about the rotation of our coordinate axes. If we were to turn that into a sort of “least circles” fit, would the resulting expression be anywhere near as neat or useful?
@virtually_passed 2 years ago
Hey, thanks for your comment. The method you're referring to is formally called Orthogonal Distance Regression. If you want all the details I'd recommend reading the book Numerical Optimization by Stephen Wright. In short, this method is superior in many ways but is generally more computationally expensive because the "Jacobian" matrix shown at 14:45 is no longer a constant in the general case, and so the minimization requires iterations. Hope that makes sense :)
@jakobr_ 2 years ago
@@virtually_passed Thanks for the detailed answer! It made a lot of sense
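For anyone curious to try the perpendicular-distance idea, its simplest special case, a straight line minimizing squared perpendicular distances, can be computed by centering the points and taking the leading singular vector. A sketch in Python with NumPy (made-up data; this is total least squares for a line, not the general ODR algorithm from the book mentioned above):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.2, 1.1, 1.8, 3.2, 3.9])

# The best perpendicular-fit line passes through the centroid
pts = np.column_stack([x, y])
center = pts.mean(axis=0)

# The leading right singular vector is the direction that minimizes
# the sum of squared perpendicular distances
_, _, vt = np.linalg.svd(pts - center)
direction = vt[0]

slope = direction[1] / direction[0]
intercept = center[1] - slope * center[0]
print(slope, intercept)
```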
@MurshidIslam 2 years ago
Excellent video. Can you do another video explaining the pros and cons of the other methods (i.e., the vertical distance and the perpendicular distance methods) compared to the least squares method?
@virtually_passed 2 years ago
Hi thanks for the comment. Quite a few others have requested a video like that. It's on the list :)
@gregorygargioni 2 years ago
The sad part of this amazing applied math video is that it ends!!!
@virtually_passed 2 years ago
Thanks! I have a follow-up proof video about least squares if you're interested ☺️
@pierrebegin9253 2 years ago
Least squares fit is highly sensitive to outlier points, and therefore the fit is distorted by bad points. A more robust estimate can be obtained by minimizing the median instead of the squared error, which is biased. Try it!
@virtually_passed 2 years ago
Yes! Which is why sometimes the objective function is the sum of the 2-norm and the 1-norm, to make a more robust fit :)
@gauthierruberti8065 2 years ago
I really like this video
@virtually_passed 2 years ago
Thanks 🙏
@hcbotos 1 year ago
Very nice video!
@Duiker36 2 years ago
I was really hoping you'd follow through on that promise to explain why Least Squares is better than the other two approaches.
@virtually_passed 2 years ago
I intend to. Meanwhile, I've written quite a bit on this in other people's comments. :)
@loganreina2290 11 months ago
The 2-norm at 11:50 or so should be squared. Very nice presentation!
@virtually_passed 11 months ago
Thanks! You're right! I've made a post about this.
@teaformulamaths 2 years ago
Very elegant video, great concept to choose! Very 3b1b. Is there another standard to aspire to? 🤔
@virtually_passed 2 years ago
Thanks for the kind words. 3b1b is a hero of mine :)
@nikolaimikuszeit3204 1 year ago
Very nice visual approach, but as a physicist I am missing the motivation for "y errors only" vs "x and y errors". In other words, one could rotate the squares and go back to the ODR that is hinted at in the beginning and still get a least-squares method. (BTW, unlucky choices: vector X and vector b.) A video about ODR and/or SVD would be nice.
@lcfrod 2 years ago
Excellent. Thank you.
@virtually_passed 2 years ago
:)
@asterixx6878 2 years ago
This is so much easier and more elegant to derive using linear algebra alone. There is no need to use Multivariable Calculus.
@virtually_passed 2 years ago
I agree it's beautiful and elegant to derive it using linear algebra alone! I actually just made a video doing exactly that :) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wJ35udCsHVc.html
@bitroix_ 2 years ago
Love it!
@virtually_passed 2 years ago
:)
@gaganaut06 2 years ago
Awesome, thanks! Can you do one with nonlinear curve fitting as well?
@virtually_passed 2 years ago
I actually intend to do just that! First I want to make a video on another proof of linear least squares using the column space of A. Then, if I have time, I'll do one on orthogonal fitting using nonlinear least squares.
@gaganaut06 2 years ago
@@virtually_passed awesome, waiting.....
@maatiger3009 1 year ago
you are incredible ❤❤❤❤❤
@AhmedHan 2 years ago
Great video. Many thanks for the visualization of the problem. If I remember correctly, if we increase the length of the x vector, we could fit polynomials as well. Can you confirm this?
@virtually_passed 2 years ago
Correct. For example, you could try to fit data to the function
y = a + b*x + c*x^2 + d*x^3
In this case the vector X would be X = [a, b, c, d], and the A matrix will have more columns as well.
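A sketch of that cubic case (Python with NumPy; the data here is synthetic, generated from known coefficients just to show the fit recovers them):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 30)
y = 1 - 2 * x + 0.5 * x**2 + 3 * x**3 + rng.normal(0, 0.2, x.size)

# Columns of A: 1, x, x^2, x^3 -- one column per unknown in X = [a, b, c, d]
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
a, b, c, d = np.linalg.lstsq(A, y, rcond=None)[0]
print(a, b, c, d)  # close to 1, -2, 0.5, 3
```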
@guardianangel1337 1 year ago
I'll just buckle up and do the regression by hand. I guessed the value for b correctly. I don't need scary algorithms and maths.
@virtually_passed 1 year ago
Nothing wrong with eyeballing it for simple cases :) Most programs have this built in under the hood, so you likely don't need to worry about the theory anyway :)
@rajasvlog7729 2 years ago
Nice video
@virtually_passed 2 years ago
Thanks!
@egoworks5611 2 years ago
Great video
@virtually_passed 2 years ago
Thanks :)
@kendakgifbancuher2047 2 years ago
Virtually Based. Thanks for the video, subscribed
@virtually_passed 2 years ago
:)
@martinsanchez-hw4fi 2 years ago
Hi! Awesome video! Which tools do you use to create the interactive exercises?
@virtually_passed 2 years ago
I collaborated with someone who did most of the heavy lifting regarding the simulation. We used P5.js to make all the simulations. A link to his GitHub and his website is in the description :)
@avyakthaachar2.718 1 year ago
Amazing ❤
@pyroMaximilian 1 year ago
You forgot to explain why we chose squares over linear distances.
@livedandletdie 1 year ago
I mean, it's just one more step than the perpendicular lines. After all, if you have a distance, there's no need to square it; sure, squaring keeps all distances positive, but so does the absolute value in 2D...
@_earlyworm 1 year ago
This is not for beginners, but for anyone who got a B in statistics, this is better than 3b1b.
@mrinfinity5557 2 years ago
Okay, but why would the single-line methods not work? Especially the vertical-line one, which would seem to do the same thing but without the squaring?
@virtually_passed 2 years ago
That's a great question! The short answer is that it does work! The 'vertical lines' method is actually used in some applications! If you go through the math, the objective function we try to minimize there is the "1-norm" of the residual vector, because we try to minimize the sum of the absolute values of all of the residuals. In fact, sometimes the least squares method is used in conjunction with the 1-norm method in an attempt to make the fit more robust to outliers. If you want to see more, click on this amazing video by Steve Brunton: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-GaXfqoLR_yI.html&ab_channel=SteveBrunton
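Since the 1-norm fit has no closed form, one common way to approximate it is iteratively reweighted least squares. A rough sketch (Python with NumPy; the data, the iteration count, and the small eps safeguard against zero residuals are all illustrative assumptions):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.2, 1.9, 3.1, 4.2, 20.0])  # last point is an outlier

A = np.column_stack([np.ones_like(x), x])
w = np.ones_like(y)
eps = 1e-8  # guards against dividing by a zero residual

# Weighted least squares with weights 1/|r| approximates the 1-norm fit
for _ in range(50):
    sw = np.sqrt(w)
    params = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    r = A @ params - y
    w = 1.0 / np.maximum(np.abs(r), eps)

print(params)  # much less affected by the outlier than plain least squares
```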
@mattgsm 1 month ago
Why can't you "divide" A transpose from both sides of that final equation?
@virtually_passed 1 month ago
Good question. When dealing with matrices we can't divide anymore; we need to multiply both sides by the inverse matrix, and this operation is only defined for square matrices. A^T isn't square in general (there could be more rows than columns or vice versa). However, in the very unlikely case that A happens to be square (i.e. there are just as many unique data points as unknowns), then you can invert A^T and the pseudoinverse will collapse into the regular inverse of A. Hope that makes sense.
@ryanchowdhary965 2 years ago
Everyone is working hard eh.
@luckabuse 2 years ago
How about minimizing the areas setwise, discounting the intersections of the graphed squares? It would discount dense parts and should make a better fit.
@virtually_passed 2 years ago
What an interesting idea! I don't know of any methods that do that. A consequence of this method is that a bunch of clumped points would have a similar weighting as a single point. That could be quite useful, actually! Interesting idea.
@jeffcarey3045 1 year ago
Error function: *forms a parabola* Me: :o
@UshijPatel 3 months ago
Can this idea be extended to fit a polynomial of any degree?
@virtually_passed 3 months ago
Yes! The A matrix just gets larger :)
@adolf_08 2 years ago
Excellent video!
@luanmartins8068 2 years ago
Do you have any recommendations for material that connects this topic with QR factorization?
@virtually_passed 2 years ago
Hi great question! I'm sure there are many resources online, but I use Chapter 10 of the book "Numerical Optimization" by Stephen J Wright. Good luck!
@luanmartins8068 1 year ago
@@virtually_passed Thanks! Also, very good video. I shared it with my university colleagues. I really found it very well done.
@virtually_passed 1 year ago
@@luanmartins8068 thanks!
@PhilipSmolen 2 years ago
Nice! What are the gray circles that appear in the background for about one frame at a time?
@virtually_passed 2 years ago
I use Microsoft OneNote to handwrite all the mathematical equations. Sadly, whenever I press too hard on my touchscreen with my hand, OneNote displays that annoying graphic. I tried to get rid of most of them, but sadly I couldn't get rid of them all :(
@PhilipSmolen 2 years ago
@@virtually_passed Ah. I thought it was an easter egg or a subliminal message. Good luck in the contest.
@virtually_passed 2 years ago
@@PhilipSmolen thanks!
@jarikosonen4079 2 years ago
What about method of least circles...
@virtually_passed 2 years ago
Cool idea! The answer will actually be the same. Here's why: instead of minimizing
r1^2 + r2^2 + ... + rn^2
you will be minimizing
(π/4)*r1^2 + (π/4)*r2^2 + ... + (π/4)*rn^2
(this is because a circle's area is π*D^2/4)
= (π/4) * (r1^2 + r2^2 + ... + rn^2)
Notice this is just a scaled version of the same minimization problem from before, so the parabola will just be a bit less steep but will have the same optimum.
@agustinmartinez8980 1 year ago
Could this be done with circles, with the points making circles, tangent to the line of best fit?
@virtually_passed 1 year ago
Yes it can! More generally, you can use it to fit ellipses. You just need to do a clever transformation. Hint: let error = x^2+y^2
@benjaminmiller3620 2 years ago
Does this naturally extend to higher-dimensional points? How would one find the best-fitting line to a 3D point cloud?
@virtually_passed 2 years ago
That's a great question! This method can indeed be extended to 3D data. Let's say you have n data points:
(x1, y1, z1), (x2, y2, z2), ..., (xn, yn, zn)
and let's say you wanted to fit the plane z = a + b*x + c*y to these data points. Here the unknowns are X = [a, b, c]. Just like in the 2D case you can construct a residual vector, but in this case the residuals would be the error between the z coordinate on the plane and the z coordinate of the data, i.e.
ri = a + b*xi + c*yi - zi
And so the A matrix will look like this:
A = [ 1 x1 y1
      1 x2 y2
      ...
      1 xn yn ]
and the b vector will look like this:
b = [z1, z2, ..., zn]
Then you can use the same formula to find the vector X = pinv(A)*b. Hope that helps :)
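A sketch of that plane fit (Python with NumPy; the points are invented for illustration):

```python
import numpy as np

x = np.array([0.0, 1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 2.0])
z = np.array([1.1, 2.9, 0.2, 2.1, 3.0])

# Fit z = a + b*x + c*y: one row of A per data point
A = np.column_stack([np.ones_like(x), x, y])
a, b, c = np.linalg.pinv(A) @ z
print(a, b, c)
```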
@benjaminmiller3620 2 years ago
@@virtually_passed The plane? So you'd have to subsequently project the points onto the resulting plane and do a 2D "least squares" to get the line? There's no shortcut? Because that's what I was doing already, just the other way around: project to the XY & XZ planes, least squares, combine into a 3D line.
@virtually_passed 2 years ago
@@benjaminmiller3620 Hey mate, sorry I think I must have explained it poorly before. At no point is it needed to project the data to the XY and XZ planes. It's going to be hard to explain this without an image. Can you send an email to me at virtuallypassed@gmail.com and I'll reply with some images which will make that clearer :) In that email can you please provide me more details about the problem too? What is the exact form of the equation of the '3D line' you want to fit the data to? Is it actually a line? Or a surface?
@benjaminmiller3620 2 years ago
@@virtually_passed A line. *r* = *r_0* + _t_ * *v* (I prefer the vector equation.) Not sure where you got "surface" from.
@virtually_passed 2 years ago
@@benjaminmiller3620 Hey Benjamin, I just replied to your email. I suggest using PCA. Details in the email :)
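For completeness, here is a sketch of the PCA approach suggested above, in Python with NumPy (made-up point cloud): the line r = r0 + t*v passes through the centroid, with v the leading principal direction.

```python
import numpy as np

pts = np.array([[0.1, 0.0, 0.2],
                [1.0, 1.1, 0.9],
                [2.1, 1.9, 2.0],
                [3.0, 3.2, 2.8]])

r0 = pts.mean(axis=0)            # a point on the line: the centroid
_, _, vt = np.linalg.svd(pts - r0)
v = vt[0]                        # direction of largest variance
print(r0, v)
```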
@Relkond 2 years ago
Some random advice: don't tell us that you're manipulating us by telling us that it's a parabola. Instead, just suggest its shape resembles a parabola/hyperbola and get us thinking: 'Huh, that's interesting. Is it a parabola? Is it a hyperbola?' That has us thinking about its shape, and looking for what might be defining its shape -> that engages us in the lesson more than just monologuing at us, and won't anger some of us anywhere near as much as a bold statement of 'I'm manipulating you for your own good'.
@virtually_passed 2 years ago
Ooo thanks for the pedagogy advice!
@Relkond 2 years ago
@@virtually_passed FWIW, The Action Lab recently did a video that involved putting superconductor into an induction heater. At face value, he appeared puzzled by the outcome, however, if you consider the whole video, he probably expected that outcome before he ever started filming -> it’s an example of engaging the audience by presenting them with something unexpected+unexplained. He’s doing much what you did vis-a-vis the parabola being true, but he put the focus on the subject without calling out that he was selectively feeding information to the audience. Good luck with your future ventures.
@octavylon9008 2 years ago
In my textbook and on some other websites the gradient is given by this formula:
b = S_{xy}/S_{xx} = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
That is not the same as the formula here (sum(xy)/sum(x^2)). Why?
@virtually_passed 2 years ago
Hi, thanks for the question. The formula you are referring to finds the value of 'b' that fits the line y = a + b*x. The formula that I derived at 8:27 finds the value of 'b' that fits the line y = b*x. This is why the formula is different. However, later on in my video (16:00) I derive an even more general formula for fitting any polynomial with any number of unknowns (not just lines!). If you were to use that formula for the special case of a line y = a + b*x, you'd get the same answer as the one you provided.
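A quick numerical illustration of that difference (Python with NumPy; the data is arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
n = x.size

# Slope of y = b*x (the formula derived at 8:27)
b_origin = np.sum(x * y) / np.sum(x**2)

# Slope of y = a + b*x (the textbook formula)
b_full = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (
    n * np.sum(x**2) - np.sum(x) ** 2
)
print(b_origin, b_full)  # different models give different slopes
```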
@MattBell 1 year ago
How you not gonna name your collaborator at the start?
@Zwerggoldhamster 2 years ago
What I don't understand: doesn't the line depend on the orientation of the coordinate system? I don't know if it does, but I would expect so, and, graphically, that bugs me. I know it makes sense to square the errors (parallel to the y-axis) when dealing with a data set from a measurement. But when I draw points on the floor and ask you what the best line through those points is, it shouldn't depend on the coordinate system.
@satyampanchal-1016 2 years ago
I guess once you are IN a coordinate system, the corresponding x, y data will give you unique values of a and b. Changing the coordinate system will change the x, y and also the corresponding a and b... so different coordinate systems will give you different a and b, making the line fit every time. It is in this sense that you will always get a fit, independent of what coordinates you choose. But you will have to CHOOSE first in order to proceed.
@virtually_passed 2 years ago
Hey, that's a really interesting question! If I understand you correctly, you're claiming that if you have another axis x', y' that's rotated 10 degrees clockwise from the traditional axes x, y, then the fitted curve will be slightly different. Is that correct? I haven't done the math on it, but I strongly suspect you're right. But consider trying to fit the data with a parabola y = a + b*x + c*x^2 instead. In this case, the parameters the LS fitting would need to find are (a, b and c). However, in the rotated coordinate system, if you tried to fit the parabola y' = a' + b'*x' + c'*x'^2, then you'll find there are no values of (a', b' and c') that could ever make these two parabolas look the same! And that's because a rotated parabola has an entirely different equation in the original coordinate system. So when you think about it this way, it seems quite reasonable, in my subjective opinion, that a different coordinate system can produce slightly different fits. In which case, you would need to define your coordinate system first, and then perform the fit :) Hope that helps :D
@Zwerggoldhamster 2 years ago
@@virtually_passed Haven't done the math either, but that's just what I suspected. Maybe squaring the perpendicular distances to the line and minimizing that sum would give you the same line always, independent of where the coordinate system is.
@MekazaBitrusty 1 year ago
Yep, I got nothing. Absolutely no idea why you use the area of a square rather than just the length of the line. Then when you started using matrices, I was lost.
@idjles 2 years ago
You could have completed the square instead of calculating de/db. You would have found b without calculus.
@virtually_passed 2 years ago
You're absolutely right!
@idjles 2 years ago
@@virtually_passed and if you replaced all the sums with Sum x^2, Sum xy and sum Y^2 then you could have done two things - solved everything without matrices, and also shown how incredibly efficient this algorithm is because you can incrementally add and remove points from those sums.
@virtually_passed 2 years ago
@@idjles Indeed the example I showed with 2 unknowns (a and b) can be solved without matrices. However, the method I used to solve it can be applied to a polynomial with 'n' parameters! Deriving a solution for 'n' unknowns without matrices will be very very hard and messy :)
@kristyandesouza5980 1 year ago
Well, I think I don't have "basic high school calculus".
@MCLooyverse 2 years ago
You have `invert (transpose A * A) * transpose A`... shouldn't that simplify to `invert A`? The inverse of a product is the product of the inverses, but in the opposite order, so the `invert (transpose A)` would cancel with `transpose A` by associativity.
@virtually_passed 2 years ago
That's a great question! If I understand your question correctly, you are saying the following, right?
X = inv(A^T A) A^T b
  = inv(A) inv(A^T) A^T b
  = inv(A) I b
  = inv(A) b
This can only be true if A is a square matrix! The rule inv(AB) = inv(B) inv(A) only applies if A and B are square matrices; the traditional inverse is only defined for a square matrix. Hope that helps! :)
@MCLooyverse 2 years ago
@@virtually_passed Ah! I was thinking about that, but I forgot that A^T * A would be square (and possibly invertible), even if A isn't.
@virtually_passed 2 years ago
@@MCLooyverse correct :)
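A tiny numerical check of this point (Python with NumPy, example matrix invented): for a tall, non-square A, inv(A) is undefined, but the pseudoinverse exists and matches the normal-equation solution.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])  # 3x2 -- tall, so inv(A) does not exist
b = np.array([0.9, 2.1, 2.9])

x_normal = np.linalg.inv(A.T @ A) @ A.T @ b
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x_normal, x_pinv))  # True
# np.linalg.inv(A) would raise LinAlgError, since A is not square
```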
@theastuteangler 2 years ago
How do we know that the curve is a straight line, and that the function of the data is linear? It seems like it takes on a logarithmic appearance. Few equations in the real world are linear. This seems like it could be an example of the problem of "lying with statistics".
@virtually_passed 2 years ago
Hi, that's a really great question! The form of the equation that you want to fit the data to has to come from some external information about the system you're analyzing. Typically, engineers or physicists have a model of the thing they're trying to analyse. For example, if this data was force vs distance for a spring, then the model will probably be linear, or a cubic. If it was population vs time, then you'd use an exponential. You might be tempted to avoid this problem by trying to fit a curve with many, many unknown parameters (perhaps by fitting a polynomial of degree 100 or something), but this is a bad idea because then you will just be overfitting. If you genuinely know nothing about the data you're measuring, and so you have no model (e.g. you're studying a part of the human brain or something), then there are other things you can do, but that goes beyond least squares.
@theastuteangler 2 years ago
@@virtually_passed Awesome, thank you for the detailed and prompt reply! Perhaps my question could be material for your next video? I just found your channel with this video; excited to binge.
@jeffreyblack666 2 years ago
Saying it can be "easily and efficiently implemented in software" is quite misleading when the evidence is an example of a function call. A single function call can be incredibly complex and inefficient. All that demonstrates is that it can be easily implemented.
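For context, the closed-form solve itself is only a few lines; whether it is efficient at scale is the separate question raised above. A sketch (Python with NumPy, synthetic data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 1000)
y = 3 * x + 2 + rng.normal(0, 0.1, x.size)

A = np.column_stack([np.ones_like(x), x])

# Explicit normal equations: X = (A^T A)^{-1} A^T b
x_manual = np.linalg.solve(A.T @ A, A.T @ y)

# Library routine doing the same job via a more stable factorization
x_lib = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(x_manual, x_lib))  # True
```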
@farpurple 2 years ago
Until you brought in the matrices it was understandable; then I tried to continue, but I got lost. I need to learn more math...
@virtually_passed 2 years ago
Thanks for the comment! Yeah, linear algebra can be quite tough. As long as you understood the first part though (solving for 1 unknown), that's the most important thing! The other half of the video is a way to solve for 'n' unknowns and it's basically the same idea :)
@Kenya_Berry 2 years ago
How did I get here from watching animators
@virtually_passed 2 years ago
\_o_0_/
@Nathouuuutheone 2 years ago
Why did you not show the vertical and perpendicular options before spending multiple minutes essentially repeating that the square option was the best? Also, why are the squares drawn the way they are and not some other way? Why use the vertical as a basis and not a horizontal or a perpendicular? I'm almost halfway through the video and I feel like I'm getting dragged through the problem and its "best" solution instead of being told about the approach to the problem. I feel like I'm not being allowed to see the steps that get us to the answer; I'm just sitting through long praise of the good answer. Honestly, why are we proving that squares yield parabolas? There is no intuitive reason why we're talking about parabolas by that point. And that's multiple minutes spent listening to maths I had no clue why I was listening to. And the rest of the video is more maths that felt more like being told how to write an algorithm than why to use that algorithm.
@virtually_passed 2 years ago
Hey, thanks so much for your comment. I really appreciate the feedback. I think I'll create another video that will describe the differences in these fitting methods in more detail. In short, there are pros and cons for each of the proposed fitting methods you've proposed. Ultimately, the 'best' method depends on the type of problem you have. However, the point of this video was to explain what the ordinary least squares method is, and to provide just a bit of motivation as to why it's so widely used. It's widely used because it's 1) very computationally efficient 2) simple to implement in software and 3) results in a convex optimization problem (the parabola only has one minimum). I hope that helps explain things :)
@jodyhensley9796 2 years ago
promosm
@virtually_passed 2 years ago
What does that mean? :}
@ABaumstumpf 2 years ago
Nah, I would say the problem is NOT well-defined, since for that you MUST define what the problem and the data actually are. Least-squares regression is one metric that will give you a linear fit. Is it a good fit? Maybe, if your data set is a simple F(x)=y and x is precise. But if you've got 2D data (both X and Y have errors), then the 2nd method (orthogonal regression) would offer a more useful result. It is not hard to find a "good fit" for some data with a particular method, but it is hard to use the CORRECT fit for the data. Given a list of, say, 20 points it is easy to get a least-squares linear fit; it is also easy to get a 19th-order polynomial fit. But it might very well be that the data actually comes from a 3rd-order phenomenon.
@virtually_passed 2 years ago
Hey, thanks so much for your comment. I agree least squares is not the only method that can be used to fit data, and I also agree there are several downsides to least squares: 1) it's very sensitive to large outliers (since it squares the error), so a 1-norm regularisation term is sometimes added to make the fit more robust, and 2) it's easy to overfit with least squares (as you mentioned, fitting a 19th-order polynomial to 20 data points), etc. The point of this video was not to provide a rigorous comparison of all of the possible fitting methods - there are many more, including nonlinear least squares! The main intention was to show the derivation behind linear least squares and why it's so often used and so computationally efficient. I fully agree with you that there are other fitting methods which are better suited to specific data types :) I hope that makes sense :)