Support Vector Machines Part 2: The Polynomial Kernel (Part 2 of 3)

1.2M subscribers

336K views

Support Vector Machines use kernel functions to do all the hard work, and this StatQuest dives deep into one of the most popular: the Polynomial Kernel. We talk about the parameter values and how the kernel relates to high-dimensional coordinates, via the dot product, and to high-dimensional relationships.
NOTE: This StatQuest assumes you already know about...
Support Vector Machines: • Support Vector Machine...
Cross Validation: • Machine Learning Funda...
ALSO NOTE: This StatQuest is based on...
1) The description of Kernel Functions, and associated concepts on pages 352 to 353 of the Introduction to Statistical Learning in R: faculty.marshal...
2) The Polynomial Kernel is also based on the Kernel used by scikit-learn: scikit-learn.o...
For a complete index of all the StatQuest videos, check out:
statquest.org/...
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumr...
Paperback - www.amazon.com...
Kindle eBook - www.amazon.com...
Patreon: / statquest
...or...
RU-vid Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshi...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer....
...or just donating to StatQuest!
www.paypal.me/...
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#statquest #SVM #kernel
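As a minimal sketch of the kernel described above, the polynomial kernel for two one-dimensional observations can be written as (a × b + r)^d. The function name and the example values a = 9 and b = 14 are assumptions for illustration (chosen so that a × b = 126, the product mentioned in the comments); with r = 1/2 and d = 2 they reproduce the 16,002.25 discussed throughout the comments.

```python
def polynomial_kernel(a: float, b: float, r: float = 0.5, d: int = 2) -> float:
    """Polynomial kernel (a * b + r)^d for two one-dimensional observations."""
    return (a * b + r) ** d


# Assumed example values: two dosages whose product is 126.
relationship = polynomial_kernel(9, 14)
print(relationship)  # (126 + 0.5)^2 = 16002.25
```

Here r is the polynomial's coefficient and d is its degree; both are normally chosen by cross validation rather than fixed by hand.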

Published: Aug 26, 2024

Comments: 426

@statquest 2 years ago

Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

@davidonwuteaka2642 1 year ago

How do I get it from Nigeria? I'd love to.

@statquest 1 year ago

@@davidonwuteaka2642 Unfortunately I don't have distribution of physical (printed) copies in Nigeria, but you can get the PDF.

@davidonwuteaka2642 1 year ago

Yes, I have been trying to but the site kept rejecting my card. Thanks for your reply.

@statquest 1 year ago

@@davidonwuteaka2642 Bummer! I'm sorry to hear that.

@stanlukash33 3 years ago

I will make it easy for you guys: 3:38 - BAM 4:49 - DOUBLE BAM 5:54 - TRIPLE BAM

@statquest 3 years ago

Just the hits! BAM! :)

@MrPikkabo 3 years ago

Thanks I know statistics now

@RHONSON100 2 years ago

Your videos should be a mandatory tutorial for Data Science/ML courses in all universities. Students throughout the world would benefit from watching the best ML videos. Hats off to you, great Josh Starmer.

@statquest 2 years ago

Wow, thanks!

@rameshmitawa2246 2 years ago

Not mandatory, but my prof recommends this channel after every slide/lecture.

@statquest 2 years ago

@@rameshmitawa2246 That's awesome!

@tinacole1450 1 year ago

I believe because most instructors don't teach it. They simply give information ....Josh actually explains difficult concepts in a simple way.

@madhuvarun2790 3 years ago

Dude, You are amazing. The best tutorial on SVM. I have searched the entire Internet to understand but couldn't. Please continue to make videos.

@statquest 3 years ago

Thanks, will do!

@atharvapatilsnoozy 1 year ago

Best machine learning playlist I have encountered on RU-vid. The animations and your funny way of teaching make it easy to understand the concepts. The amount of work you put into creating these videos deserves great appreciation. I would definitely recommend these videos to anyone who is reading this comment.

@statquest 1 year ago

Glad you like them!

@jacobwalker6891 2 months ago

I have read and looked at most recommended books and videos on kernels and whilst somewhat familiar with the math, never truly understood the principles. Statquest actually makes complex topics simple, arguably one of the best if not the best teacher on youtube and definitely the best stat explanations. Thanks Josh much appreciated 👍

@statquest 2 months ago

Thank you very much! :)

@marcoharfe9812 4 years ago

I want to thank you so much for all your videos. I was lost in a forest of vectors matrices and greek letters when I heard about these topics in lecture and I did not understand a thing. As I was practising for the exam, I discovered your videos and now I do actually understand what is happening. Really love the practical, example driven approach!

@statquest 4 years ago

Awesome!!!! Good luck with your exam and let me know how it goes. :)

@itsfabiolous 11 months ago

Bro you're just a blessing. Never stop with the dry humor. Lots of love for you!

@statquest 11 months ago

Thank you! Will do!

@priyangkumarpatel9317 4 years ago

This is one of the best explanations of support vector machines... If anyone is interested in why dot products are integral to the idea of SVM, please refer to Professor Winston's MIT lecture on SVM... It is another great explanation of SVM...

@statquest 4 years ago

Thanks! :)

@amalboussere9270 4 years ago

Thank you a lot, you are such a big help in this harsh student world. God bless you.

@statquest 4 years ago

I'm glad you like my videos! :)

@palashchandrakar1112 4 years ago

@@statquest we don't just like them, we love your videos XOXO

@leif1075 3 years ago

@@statquest This doesn't show where on earth you derive that formula from. WHY do you multiply a times b and then add r? Why not multiply all three or add all three? See what I mean? I don't see how anyone could figure it out; there's not enough info here to derive it.

@606Add 4 years ago

Your videos are simply amazing! And the level of abstraction is right at the sweet spot! Thank you for the extremely thoughtful and precise illustrations!

@statquest 4 years ago

Thank you very much! :)

@jonathannoll3386 4 years ago

My man. I'm so happy I have my presentation about SVMs after your uploads... Keep up the great work!

@statquest 4 years ago

Awesome! :)

@kwok9298 2 years ago

I really appreciate the way it is explained. Please keep up the good work!

@statquest 2 years ago

Thank you!

@deashehu2591 4 years ago

I have grown to love your little songs. They sound like Phoebe's songs!!! I have a little question: what do you use for visualization?

@statquest 4 years ago

Thanks! I draw all the pictures in Keynote.

@gargidwivedi7700 4 years ago

That's exactly what my sister and I agreed on just before we saw your comment! haha.

@statquest 4 years ago

@Leila Mohammadzadeh Google "svm lagrange dual" and you will see how SVM uses the dot products to find the classifier.

@MrZidane1128 3 years ago

First of all, thanks for your explanation. After plugging two data points, a and b, into the polynomial kernel function, we get the value 16,002.25, and then you said we get the higher-dimensional relationship. Could you elaborate on what "relationship" you were referring to based on the value 16,002.25? Sorry, I was not quite sure about that.

@statquest 3 years ago

In some sense the "relationships" are similar to transforming the data to the higher dimension and calculating the distances between data points.

@vedgupta1686 2 years ago

@@statquest But the value 16002.25 alone is a 1-D data point. How do you suppose that helps us classify? Am I missing something?

@statquest 2 years ago

@@vedgupta1686 Think of that number as a loss value that is used as input for an iterative optimization algorithm like gradient descent.

@HeduAI 2 years ago

I thought the whole point of using the kernel trick was to save on the computation cost. If we are using an iterative algorithm anyway, how is that better than transforming the data?

@statquest 2 years ago

@@HeduAI Either way you would still have to use an iterative procedure. So that computation is fixed.

@hayskapoy 4 years ago

Would love to see more math after seeing the big picture behind these algorithms 😄

@ahming123 4 years ago

What do you mean by high dimension relationship??

@huhuboss8274 4 years ago

like the distance but in higher dimensions

@Actanonverba01 4 years ago

A synonym for 'high dimension' is many features or variables. For 'relationship', think connection(s). So if we have a high-dimensional relationship, we have a set of many variables that are connected by some idea or mathematical formula. Does that help?

@BrandonSLockey 4 years ago

watch first video (Part I)

@leif1075 3 years ago

@@Actanonverba01 that's what I thought, but that is irrelevant here because we only have one variable with two possible categories of values. But of course we can add more connections and variables, which I think is what you are alluding to.

@clapdrix72 2 years ago

@@leif1075 It's not actually what he means and it's not irrelevant. High dimensional space means we take our original input feature space (in this case just X1) and transform it into a higher dimensional space by "making up" new dimensions that are functions of our original dimensions (X1) so that the data is linearly separable in that new space. The pairwise relationships (aka similarity) are the distances between the observations projected into that higher dimensional space (usually referred to as latent space). So it doesn't matter how many features you have in your original dataset nor how many outcome classes you have - those are irrelevant to the SVM algorithm mechanics, they only change the scale.

@chenghuang4724 1 year ago

Sir, this is the best video for explaining the Kernel!

@statquest 1 year ago

Glad you think so!

@flaviodefalcao 4 years ago

It is awesome and satisfying to be able to build an intuition with these videos and then read a textbook and understand everything. THANKS

@statquest 4 years ago

Awesome! I'm glad the videos are helpful! :)

@flaviodefalcao 4 years ago

@@statquest BAM!!!

@tymothylim6550 3 years ago

Thank you for this video! It was very helpful in terms of understanding the details of how the kernel function leads to certain equations that need to be solved to obtain the relevant Support Vector Classifier!

@statquest 3 years ago

Bam! :)

@rrrprogram8667 4 years ago

After a lonnnnggg waitttt..... MEGAA MEGAAA MEGAAAA BAMMM is back

@statquest 4 years ago

Ha! Thank you! :)

@johnjung-studywithme 1 year ago

This is how concepts should be introduced to students.. makes so much more sense

@statquest 1 year ago

Thank you! :)

@billykristianto3818 7 months ago

Thank you very much, the explanation is easier to understand compared to my class!

@statquest 7 months ago

Glad it helped!

@trashantrathore4995 2 years ago

Earlier I had an intuition for all the algorithms, but it was incomplete and I could not explain it to others. The concepts are getting clearer now. Thanks StatQuest team and Josh Starmer, I will contribute as soon as I get a job in the DS field.

@statquest 2 years ago

bam! :)

@mahfuzurrahmansazal3974 4 days ago

Came here for the SVM, stayed for BAM!!! Double BAM!!!!

@statquest 4 days ago

That's awesome! You made me laugh.

@evelillac9718 3 years ago

You literally saved my homework with your videos

@statquest 3 years ago

Bam!

@edmondkeogh4057 3 years ago

the beep boop thing was hilarious

@statquest 3 years ago

@thawinhart-rawung463 1 year ago

Good job Josh

@statquest 1 year ago

@sinarb2884 3 years ago

I could be wrong, but I think there is a slight mistake in this video. The kernel function should be of the form (ab-1/2)^2. This is because the support vector classifier is essentially thresholding based on whether x>y or not. Let me know please if I am wrong. And, thanks for your cool videos.

@statquest 3 years ago

Most people define it the way I defined it in the video, (ab + r)^d. For more details, see: en.wikipedia.org/wiki/Polynomial_kernel and Page 352 of the Introduction to Statistical Learning in R.

@technojos 3 years ago

Thanks, Josh Starmer. I am fascinated by your videos. Please make a video about how 16002.25 is used, bam? Moreover, I think you could make a video playlist about how machine learning algorithms are coded, double bam. Keep going man, we love you, triple bam!!!

@statquest 3 years ago

Great suggestions!

@kevinarmbruster2724 3 years ago

@@statquest How is the relationship of 16,002.25 to be interpreted? I understood that if we transfer everything to the higher dimension we can solve it, but I did not understand the part about relationships between the points and how they help.

@statquest 3 years ago

@@kevinarmbruster2724 We plug the relationships into an algorithm that is similar to gradient descent and it can use them to find the optimal classifier. However, the details are pretty complex and would require another video.

@nightawaitsusall9607 4 years ago

You my friend are a champion. Yes.

@statquest 4 years ago

Thank you! :)

@benardmwanjeya8371 4 years ago

God bless you Josh STARmer

@statquest 4 years ago

Thank you very much! :)

@harithagayathri7185 4 years ago

Great explanation 👍 Thanks a ton Josh!! But I'm a bit confused about how to calculate the appropriate 'r' coefficient for the equation. I understand that the 'd' value is determined using Cross Validation.

@statquest 4 years ago

'r' is also determined by cross validation, but I am under the impression that it doesn't have as much impact as 'd'. It basically scales things by a constant, rather than adding extra dimensions.

@thememace 3 years ago

@@statquest What's the point of setting r anyway since it later gets completely ignored?🤔

@statquest 3 years ago

@@thememace I'm not sure

@rohanpatel702 2 years ago

@@thememace it doesn't get completely ignored. When r=1/2, the math works out such that the x-axis doesn't get scaled at all. But when r=1, the x-axis gets scaled by sqrt(2). Even though the third element of the vectors combined by dot product is a constant (and thus ignored), the choice of r still affects how the dot product evaluates because of how it changes the first element of each vector.
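The point about r in this thread can be checked by expanding the kernel for d = 2. This is a worked sketch using the video's one-dimensional a and b; the general-r line is an added generalization that assumes r ≥ 0 so the square root is real.

```latex
\[
\left(ab + \tfrac{1}{2}\right)^2 = ab + a^2 b^2 + \tfrac{1}{4}
  = \left(a,\; a^2,\; \tfrac{1}{2}\right) \cdot \left(b,\; b^2,\; \tfrac{1}{2}\right)
\]
\[
(ab + 1)^2 = 2ab + a^2 b^2 + 1
  = \left(\sqrt{2}\,a,\; a^2,\; 1\right) \cdot \left(\sqrt{2}\,b,\; b^2,\; 1\right)
\]
\[
(ab + r)^2 = 2r\,ab + a^2 b^2 + r^2
  = \left(\sqrt{2r}\,a,\; a^2,\; r\right) \cdot \left(\sqrt{2r}\,b,\; b^2,\; r\right),
  \qquad r \ge 0
\]
```

The third coordinate is the same constant for every observation, which is why it can be ignored, while the first coordinate shows how r rescales the original axis by a factor of √(2r).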

@muhtasirimran 2 years ago

Mr. Starmer almost unconsciously changing machine Learning's future 😀

@statquest 2 years ago

@dok3820 2 years ago

Thank you Josh. Just..thank you

@statquest 2 years ago

@manasadevadas8685 3 years ago

First of all, thank you so much for explaining with such amazing illustrations. One doubt: how can we actually use the relationships between points to find the support vector classifier?

@statquest 3 years ago

Unfortunately that's a difficult question to answer and I'd have to dedicate a whole video to it. However, the simple answer is that it uses a method like Gradient Descent to find the optimal values.

@manasadevadas8685 3 years ago

@@statquest Thanks for the response! Hopefully later you'd dedicate a whole video to it :)

@NathanPhippsONeill 4 years ago

Amazing vid! Thanks for helping me prepare for my Machine Learning exam 😁

@statquest 4 years ago

Good luck and let me know how it goes. :)

@NathanPhippsONeill 4 years ago

@@statquest It went well for a difficult exam. BUT I had a lot to write about thanks to this channel. Appreciate it ❤️

@statquest 4 years ago

@@NathanPhippsONeill Hooray!!! That's awesome and congratulations. :)

@aryamahima3 2 years ago

@5:09, u said that we need to calculate dot product between each pair of point. How do we use this dot product further? could u please clear to me, u r the only person on whole internet who can clear this. :D

@statquest 2 years ago

We use it as input to an iterative optimization algorithm similar to gradient descent. For details on gradient descent, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sDv4f4s2SB8.html

@aryamahima3 2 years ago

@@statquest thank u so much ☺️

@manaspatil4316 3 years ago

God bless you !!!

@statquest 3 years ago

@axa3547 3 years ago

Machine learning algorithms!!! Is it just me, or do others also have to learn these again and again to fill the gaps in their knowledge?

@statquest 3 years ago

bam!

@yulinliu850 4 years ago

Awesome! Josh is back.

@statquest 4 years ago

@tinacole1450 1 year ago

Does anyone laugh at how silly yet genius Josh is? Loved the robot.. I rewound to do the robot.

@statquest 1 year ago

You are my favorite! Thank you so much! I'm glad you enjoy the silly sounds.

@eric752 2 years ago

One suggestion: if all the topics were listed in a logical order at the beginning, it would be even better. Big thanks for the videos, really appreciate it 🙏

@statquest 2 years ago

Thanks!

@eric752 2 years ago

@@statquest thank you

@temesgenaberaasfaw5076 4 years ago

best tutorial for SVM , YOU DID IT THANKS

@statquest 4 years ago

Thank you! :)

@commentor93 2 years ago

I've understood more than I ever expected to understand in this topic all thanks to your videos. But now I've stumbled a bit: How do you solve a constant like the one in 5:50? Or what does solving mean in that context now that it isn't a formula? Could you please expand on that?

@statquest 2 years ago

Think of it as a loss value, and it is something we try to optimize with an iterative algorithm that is similar to Gradient Descent: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sDv4f4s2SB8.html

@preeethan 4 years ago

Amazing explanation :) We find the High Dimensional Relationship between 2 points to be 16002.25. Practically, what do we do with this value? How do we find the Support Vector Classifier with this value?

@statquest 4 years ago

It's quite complicated - way too complicated to be described in a comment.

@preeethan 4 years ago

StatQuest with Josh Starmer Okay. I love all your videos, especially your intro songs! Great work, keep it going Josh :)

@sanjivgautam9063 4 years ago

I want this answer too!

@balasubramanian5232 4 years ago

@@statquest I want answers for the question. It'll be helpful if you could share links to resources on this

@statquest 4 years ago

@@balasubramanian5232 Google "svm lagrange dual" and you will have lots and lots of resources.
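For readers who follow that search suggestion, the standard textbook form of the soft-margin SVM dual shows where values like 16,002.25 enter. This is a sketch in conventional notation, not something taken from the video: the symbols α_i (one multiplier per observation), y_i ∈ {−1, +1} (class labels), C (the misclassification penalty) and b_0 (the intercept) are assumptions, and K is the polynomial kernel.

```latex
\[
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
\]
\[
K(x_i, x_j) = (x_i \cdot x_j + r)^d,
\qquad
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b_0 \right)
\]
```

The observations appear only through the pairwise kernel values K(x_i, x_j), so the optimizer never needs the high-dimensional coordinates themselves, and a new sample x is classified by the sign of f(x).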

@tuongminhquoc 4 years ago

First comment! I have turned on notification for your videos. I love all of your videos!

@statquest 4 years ago

Awesome! Thank you! :)

@harshitsati 3 years ago

Thank you angel

@statquest 3 years ago

bam! :)

@alternativepotato 3 years ago

i love u my man you really are a life saver. Just because of that i am gonna buy a tshirt

@statquest 3 years ago

BAM! Thank you very much! :)

@muhammadavimajidkaaffah7715 4 years ago

SVM for multiclass please, I like your video so much.

@shahbazsiddiqi74 4 years ago

waited too long... Thanks a ton

@vincent-paulvincentelli2627 3 years ago

Great video ! It would be very nice to have such an intuitive one for kernel PCA :)

@statquest 3 years ago

I'll keep that in mind.

@TaylorSparks 2 years ago

bam. love it homie. keep it up

@statquest 2 years ago

Thank you!

@sornamuhilan.s.p 4 years ago

John Starmer, you are a genius sir!!

@statquest 4 years ago

Thank you! :)

@iisc2022 1 year ago

thank you

@statquest 1 year ago

Welcome!

@jhfoleiss 4 years ago

Great explanation, thanks! One question: what happens when a and b are vectors? I understand that in this quest you wanted to give a simple example (with a single feature) to make things clear. If the answer to this question is in another quest, I'll gladly wait for it :)

@statquest 4 years ago

If 'a' and 'b' are vectors (because you have measured more than one thing per observation), then you just multiply a^T b, where a^T = a transpose.

@primeprover 4 years ago

@@statquest Doesn't that assume all the features have the same impact on the outcome? I would have thought that some form of weighting in the sums in the dot product of a and b would be necessary.

@statquest 4 years ago

@@primeprover That's a good point. Like PCA, SVMs are sensitive to scale, so the first thing you would do is normalize all of the variables you've measured.

@primeprover 4 years ago

@@statquest Surely more than just normalization is needed? If you provide two normalized variables to a linear regression model they will each get their own coefficient. One could be 1 and the other 0.1. As far as I can see we seem to be giving all features a coefficient of 1 in the models you described? I would have thought that all but one of the additional features (the other would be 1) would need an extra model parameter to scale it in relation to the others.

@statquest 4 years ago

@@primeprover I think conceptualizing SVMs in terms of linear or logistic models can be a little misleading. The choice of the parameters for the kernels, unlike in linear or logistic regression, does not represent a relationship between the data and the classification. All the SVM is doing is applying relatively arbitrary transformations to the data to increase the dimensionality in a way that might be helpful for separation.
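To illustrate the vector case from this thread, here is a minimal sketch in which the product a × b is replaced by the dot product a^T b. The function name and the two-feature example values are hypothetical, and the features are assumed to be already normalized as suggested above.

```python
from typing import Sequence


def polynomial_kernel(a: Sequence[float], b: Sequence[float],
                      r: float = 0.5, d: int = 2) -> float:
    """Polynomial kernel (a^T b + r)^d for observations with several features."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of features")
    dot_product = sum(ai * bi for ai, bi in zip(a, b))
    return (dot_product + r) ** d


# Hypothetical observations with two measurements each.
a = [1.0, 2.0]
b = [3.0, 4.0]
print(polynomial_kernel(a, b))  # (1*3 + 2*4 + 0.5)^2 = 132.25
```

Every feature enters the dot product with the same weight, which matches the observation in the thread that the kernel parameters are not per-feature coefficients.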

@harshitamangal8861 4 years ago

Hi Josh, the explanation is amazing. I had a question: you said that the equation (a*b + r)^d is used for finding the relationship between two points. How is this relationship then used to find the Support Vector Classifier?

@statquest 4 years ago

Unfortunately the details of how it is used would require a whole video and I can't cram it into a comment. However, making the video is on the to-do list.

@zheyuanzhou3165 4 years ago

Super clear tut. Thank you very much! But as a non-native English speaker, I am a little confused: what is BAM trying to express?

@statquest 4 years ago

ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-i4iUvjsGCMc.html

@zheyuanzhou3165 4 years ago

@@statquest A tut for BAM! cool lol

@L.-.. 4 years ago

After we find the dot product, how do we use that value to decide whether the new sample belongs to the positive class or the negative class? Please clarify, Josh.

@statquest 4 years ago

It's a little too much to put into a comment. The purpose of the video was only to give insight into how the kernel works, not derive the math.

@dimitrismarkopoulos3964 2 years ago

First of all, congratulations! Your videos are super explanatory! One question: does the equation of the polynomial kernel always have the same form?

@statquest 2 years ago

As far as I know. However, the variables might have different names.

@tsunningwah3471 1 month ago

amazing

@statquest 1 month ago

Thanks!

@marijatosic217 4 years ago

Thank you for the video! And now, what does this number 16002.25 tell us? :D How will we know what the right dosage is?

@statquest 4 years ago

That's just an example of the kind of values that are used by the kernel trick to determine the optimal placement of the support vector classifier.

@MrWincenzo 4 years ago

Since the kernel requires calculating the dot product for each pair of points, suppose we have 10 points. When we do it for each point with respect to the others and itself, we should obtain 10 different dot products for each single point. Which one of those 10 dot products becomes the new "y" dimension of the point?

@statquest 4 years ago

None of them end up being the new "y" dimension. The kernel trick works without having to make that transformation. We use the transformation to give an intuition of how the process works, but the kernel trick itself bypasses the transformation. This is the "kernel trick", and I mention it in the first video in the series on SVMs: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-efR1C6CvhmE.html

@MrWincenzo 4 years ago

@@statquest Yes, I misunderstood before; now I've got it: plugging the values into the polynomial expression is equivalent to calculating the dot product in higher dimensions. And since the SVM only depends on those dot products among points, we have just "improved" the classification by mimicking the dot product in higher dimensions, even infinitely many as with RBF. Still, thank you for all your efforts and your gentle replies to our questions. Regards.
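The equivalence described in this exchange can be checked numerically. The sketch below assumes r = 1/2 and d = 2, for which the explicit high-dimensional coordinates are (x, x², 1/2); the helper names and the values a = 9, b = 14 are illustrative assumptions.

```python
import math


def kernel(a: float, b: float, r: float = 0.5, d: int = 2) -> float:
    """Polynomial kernel computed directly from the original 1-D values."""
    return (a * b + r) ** d


def feature_map(x: float) -> tuple:
    """Explicit coordinates for r = 1/2, d = 2: (x, x^2, 1/2)."""
    return (x, x ** 2, 0.5)


def dot(u, v) -> float:
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


a, b = 9.0, 14.0
via_kernel = kernel(a, b)                       # no transformation needed
via_map = dot(feature_map(a), feature_map(b))   # transform first, then dot product
print(via_kernel, via_map)
```

Both routes give the same number, but the kernel route never builds the higher-dimensional coordinates, which is the saving the kernel trick provides.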

@rajdeepkumarnath8944 2 years ago

I once knew a kernal, whose name was Fred, But thats not the path we are gonna tread. (thats a better song Josh :D )

@statquest 2 years ago

bam!!!

@beshosamir8978 2 years ago

Quick question: why is it useful to calculate the relationship between every pair of points, regardless of the dimension, and how can it be useful for calculating the decision boundary?

@statquest 2 years ago

SVMs are optimized using an iterative algorithm that is similar to Gradient Descent, and the relationship values are essentially the "loss" values and help move the SVC to the correct spot.

@beshosamir8978 2 years ago

@@statquest So how do I know which dimension is the best one, based on the relationships between every pair of points?

@statquest 2 years ago

@@beshosamir8978 www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/svm.pdf

@abrahamjacob7360 4 years ago

Josh, this is a great video. One question on the Polynomial Kernel derivation. The original problem was to find a classification point, that is, the drug dosage limits that cure or don't cure the disease. When we increased the value of d to 2, you mentioned it introduced a second dimension. I understood how squaring the value helped to find a better margin classifier line, but ideally there is no meaning to the y-axis here, right? The case still remains the same: we are just finding whether the drug dosage had a positive or negative impact. We could still use the y-axis to determine its efficacy, but if we increase the value to 3, what would the z-axis represent here? Sorry if the question was confusing.

@statquest 4 years ago

The new dimensions don't mean anything at all - they are just extra dimensions that allow us to curve and bend the data so that we can separate it. The more dimensions, the more we can curve and bend the data.

@DeepakSingh-fo2wm 4 years ago

I am still not clear on what happens after finding a relationship in the higher dimension, like in the video: what happens after finding 16002.25? Can you please add a short video on this if possible?

@statquest 4 years ago

It would be a long video, but it's on the to-do list.

@ronitganguly3318 2 years ago

The high dimensional relationship you calculated at the end is a number which tells what exactly? How does it help to pseudo transform into higher dimensions?

@statquest 2 years ago

Are you familiar with Gradient Descent? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sDv4f4s2SB8.html SVMs use a different algorithm, but the idea is similar, and you can think of the numbers, like 16002.25 as values that the algorithm is trying to optimize.

@marcelocoip7275 2 years ago

Visually thinking about the last set of data: if you can draw a line to separate the data once each observation is squared on the y-axis, then you can draw that line independently of the scale/ratio of the x-axis. So what I see is that the only thing adding "solving/math value" is increasing the order of the x-axis terms to fit a hyperplane (the d value). What does r contribute to arriving at a better solution?

@statquest 2 years ago

I don't think it adds much.

@donaldmahaya2689 4 years ago

I'm always left with the illusion that I understood what you just said.

@statquest 4 years ago

@donaldmahaya2689 4 years ago

@@statquest Re-watched it and I did get it after all. BAM!

@berknoyan7594 4 years ago

Hi Josh, thanks for the video. You are helping me a lot. I have just one question. What do you mean by "high dimensional relationship"? It can be achieved by any 2 numbers whose product is 126, and there are infinitely many of those. It's just a dot product of two 3-dimensional data points. Cross Validation uses the misclassification rate to select the best r and d, as far as I know. Does CV use these numbers in any calculation?

@statquest 4 years ago

Cross Validation does not use these high-dimensional relationships. Instead, the algorithm that finds optimal fits, given constraints (like the number of misclassifications you will allow) uses them. Although the dot product seems like it would be too simple to use, it has a geometric interpretation related to how close the points are to each other. For more details, check out the Wikipedia article: en.wikipedia.org/wiki/Dot_product

@XoXkS 4 years ago

Another great thing, besides the astonishingly easy explanations, is the way you talk. You talk so slowly that I can watch the easy parts easily at 1.5 speed and the hard parts at normal speed. Most people, when they talk slowly, do so by making long pauses in between words, so watching at a higher speed sounds very unnatural. You sound just fine at normal and 1.5 speed!

@statquest 4 years ago

bam!

@p-niddy 2 years ago

What does the "relationship" between two points actually signify? Based on this video, it looks like a number without much meaning that you can map onto the graph.

@statquest 2 years ago

It has no use for us. However, the algorithm that finds the optimal support vector classifier can use those values to do its job.

@annusrivastava4425 4 years ago

To find the value of r and d, can we use GridSearchCV as well?

@statquest 4 years ago

Yes. GridSearchCV is just a way to do CV.
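As a rough sketch of that idea with scikit-learn, the grid below searches over the polynomial degree and coefficient. In `SVC`, `degree` corresponds to d and `coef0` to r, and `gamma` is fixed at 1 here so the kernel has the form (a × b + r)^d. The dosage data, the grid values, and the choice of 3 folds are all made up for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical 1-D dosage data: low and high doses do not cure, middle doses do.
dosage = np.array([1, 2, 3, 4, 5, 6,
                   13, 14, 15, 16, 17, 18,
                   25, 26, 27, 28, 29, 30], dtype=float)
cured = np.array([0] * 6 + [1] * 6 + [0] * 6)
X = dosage.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Scale the feature first, then fit an SVM with a polynomial kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="poly", gamma=1.0))

# Candidate values for d (degree) and r (coef0); assumed, not from the video.
param_grid = {
    "svc__degree": [1, 2, 3],
    "svc__coef0": [0.0, 0.5, 1.0],
}

search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, cured)
print(search.best_params_)
```

Each combination of degree and coefficient is scored by cross validation, and `best_params_` reports the combination with the highest mean accuracy on the held-out folds.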

@harishh.s4701 2 years ago

Hi, thanks a lot for your content. It is very easy to understand and I appreciate your way of explaining things. I had one doubt. Can you please explain how cross validation helps to determine the optimal degree of the polynomial kernel used in SVMs?

@statquest 2 years ago

I do that in this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-8A7L0GsBiLQ.html

@davydfridman3001 1 year ago

Does anyone have a link to a good article that explains all the math behind the kernels?

@geo1997jack 3 years ago

I did not understand what that 16000 value means or how it helps us. Could you please clarify? Everything else was crystal clear :)

@statquest 3 years ago

It's used as a measure of the relationship between two points. Once we calculate the relationships between all of the points, they are used in a method similar to Gradient Descent to find the optimal classifier.

@stoicism-101 2 years ago

Dear Sir, Kernels are basically used for finding the relationship between two points using the formulae. How do we further find the Support vector classifier?

@statquest 2 years ago

The SVC is found using an iterative process that is a lot like Gradient Descent, and the output from the kernels is like the "loss" values.

@leonugraha 4 years ago

Thank you for the SVM follow-up video. By the way, do you maintain a GitHub account?

@statquest 4 years ago

I should...

@chinzzz388 4 years ago

When we calculate relationships between 2 data points, do we calculate relationships between all the points w.r.t all the other points? Ex: if we have 4 data points (1,2,3,4) do we calculate relationship between (1,2) and (3,4) OR do we calculate relationship between (1,2),(1,3),(1,4),(2,3)...etc

@statquest 4 years ago

We calculate all of the relationships.
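"All of the relationships" can be pictured as a symmetric matrix with one kernel value per pair of observations. The sketch below builds that matrix for four hypothetical dosages with r = 1/2 and d = 2; the function names and data are assumptions for illustration.

```python
from itertools import combinations_with_replacement


def polynomial_kernel(a: float, b: float, r: float = 0.5, d: int = 2) -> float:
    """Polynomial kernel (a * b + r)^d for two one-dimensional observations."""
    return (a * b + r) ** d


def kernel_matrix(points, r: float = 0.5, d: int = 2):
    """Kernel value for every pair of points, including each point with itself."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i, j in combinations_with_replacement(range(n), 2):
        value = polynomial_kernel(points[i], points[j], r, d)
        K[i][j] = value
        K[j][i] = value  # the kernel is symmetric, so fill both halves
    return K


# Hypothetical dosages for four observations.
dosages = [1.0, 2.0, 3.0, 4.0]
K = kernel_matrix(dosages)
for row in K:
    print(row)
```

With four observations there are ten distinct entries: the six pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) plus each point paired with itself.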

@hassanjb83 4 years ago

At 6:33 you mention that we need to determine the value of both r and d through cross validation. If we have one-dimensional data, then shouldn't d be 2 only?

@statquest 4 years ago

Why do you say that?

@hemersontacon3168 4 years ago

I think you got too attached to the example. Imagine the same example but with the two colors all mixed up. Then I think that d = 2 would not be enough to split things up!

@ccuny1 4 years ago

@@hemersontacon3168 That's an insightful comment that actually opened my eyes. Thank you.

@hemersontacon3168 4 years ago

@@ccuny1 Glad to know and glad to help ^^

@61_shivangbhardwaj46 3 years ago

Thnx sir great explanation :-)

@statquest 3 years ago

Thank you! :)

@muhammadiqbalbazmi9275 4 years ago

Sir, will you please give us a link to the presentation that you use in these videos?

@ayoubmarah4063 4 years ago

Great content as usual, BIG THANKS to you. I hope you are having a nice day. I have some questions, if you don't mind. I got confused by the problem of imbalanced classes: when the classes are imbalanced, we do either upsampling or downsampling so that we have balanced data. 1) Is the accuracy score always wrong with imbalanced data? What about f1_score then? 2) How do we decide which sampling method is good? Should we run them both? I do my best to search for solutions, but there are so many opinions and I'm lost. I saw your video last week, but when I got my hands dirty with projects I ran into new problems that are complicated. Thank you again for your help.

@statquest 4 years ago

I'm glad you like the video. For details about how to use SVMs with unbalanced data, see this discussion: stats.stackexchange.com/questions/94295/svm-for-unbalanced-data

@wong4359 1 year ago

I wish there were 10 like buttons so that I could click all of them! I would make a "bibibubibu" sound while clicking like.

@statquest 1 year ago

BAM! :)

@raktimnaskar2333 8 months ago

Can anyone explain to me how the dot products of the feature vectors can find the separating hyperplane?

@statquest 8 months ago

First, think of a dot product as a type of measure of similarity (the larger the absolute value, the more similar) and that similarity can be a proxy for closeness. Then those measures are plugged into an iterative algorithm, somewhat like gradient descent (see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sDv4f4s2SB8.html ), to find the optimal classifier.
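
To make the "iterative algorithm" idea concrete, here is a minimal sketch of a kernel perceptron. This is a deliberately simpler stand-in, not the optimizer that SVM implementations actually use, and it does not maximize the margin. What it shares with an SVM is the key point: training and prediction only ever touch the data through kernel values between pairs of observations. The data, labels and parameter values are made up for illustration.

```python
def polynomial_kernel(a, b, r=0.5, d=2):
    return (a * b + r) ** d

def score(x, xs, ys, alphas, kernel):
    """Weighted sum of kernel values between x and every training point."""
    return sum(a * y * kernel(xi, x) for a, xi, y in zip(alphas, xs, ys))

def train_kernel_perceptron(xs, ys, kernel, epochs=1000):
    """Return one weight per training point; labels must be +1 or -1."""
    alphas = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            # A misclassified point gets more say in future predictions.
            if y * score(x, xs, ys, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas

def predict(x, xs, ys, alphas, kernel):
    return 1 if score(x, xs, ys, alphas, kernel) > 0 else -1

# Hypothetical 1-D data: middle values are +1, extreme values are -1,
# so no single threshold on the original axis separates the classes.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [-1, -1, 1, 1, 1, -1, -1]

alphas = train_kernel_perceptron(xs, ys, polynomial_kernel)
print([predict(x, xs, ys, alphas, polynomial_kernel) for x in xs])
```

The high-dimensional coordinates never appear in the code, yet the resulting classifier is a straight line in that space.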

@sabbirakhand7120 4 years ago

After getting the values of r and d by cross validation, we get the value 16002.25. But how do we use this value to determine the high-dimensional relationship? This video was really helpful for understanding the topic, despite my being from a different background. Thanks.

@statquest 4 years ago

The actual method for finding the classifier would require a whole video on its own. It's like gradient descent, but with a few important differences.

@jaideepkukkadapu2600 3 years ago

@@statquest Please make a video on that, it would be very helpful for us.
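
For reference, the 16002.25 mentioned in this thread can be reproduced with a few lines of arithmetic. The sketch assumes the two observations are 9 and 14 with r = 1/2 and d = 2; those inputs are not stated in this thread, but they are consistent with the quoted result. The kernel value equals the dot product of the high-dimensional coordinates, without those coordinates ever being needed.

```python
a, b = 9.0, 14.0   # assumed observations (not given in this thread)
r, d = 0.5, 2      # assumed kernel parameters

# Kernel: computed directly from the original one-dimensional values.
kernel_value = (a * b + r) ** d

# Same number via explicit coordinates; for r = 1/2 and d = 2
# these are (x, x**2, 1/2).
coords_a = (a, a ** 2, 0.5)
coords_b = (b, b ** 2, 0.5)
dot_product = sum(p * q for p, q in zip(coords_a, coords_b))

print(kernel_value)  # 16002.25
print(dot_product)   # 16002.25
```

One such number is computed for every pair of observations, and the full set of them is what the optimizer works with.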

@slirpslirp 4 years ago

Awesome, so the dot product is equal to the result of the kernel function?

@statquest 4 years ago

yep!

@hrdyam865 4 years ago

Thanks for the videos 😊. Can we use SVM for multinomial classification?

@statquest 4 years ago

I believe you just create one SVM per classification, and each SVM compares one classification to all the others (i.e. a sample either has that classification or not).
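
The one-classifier-per-class scheme described here is usually called one-vs-rest. Below is a minimal sketch of the bookkeeping, assuming each binary classifier exposes a score where larger means "more likely this class". The scoring functions are hypothetical stand-ins for trained SVMs, and real libraries may use a different scheme (such as one-vs-one) internally.

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multi-class problem as +1 (this class) vs -1 (everything else)."""
    return [1 if label == positive_class else -1 for label in labels]

def predict_one_vs_rest(x, scorers):
    """Pick the class whose binary classifier gives the highest score."""
    return max(scorers, key=lambda cls: scorers[cls](x))

labels = ["low", "mid", "high", "mid"]
print(one_vs_rest_labels(labels, "mid"))  # [-1, 1, -1, 1]

# Hypothetical stand-ins for three trained binary SVM decision functions.
scorers = {
    "low": lambda x: 1.0 - x,
    "mid": lambda x: 1.0 - abs(x - 2.0),
    "high": lambda x: x - 3.0,
}
print(predict_one_vs_rest(2.2, scorers))  # mid
```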

@abhishekanand5974 3 years ago

What exactly is meant by relationships between observations?

@statquest 3 years ago

It's some metric of distance.

@Beenum1515 4 years ago

As I understand it, the function of the kernel is to transform the data into a higher dimension so that there exists a classifier in that dimension which separates the points. Right? If yes, then why not just square each value instead of passing each pair to the kernel function?

@statquest 4 years ago

The kernel provides us with the high-dimensional relationships between points without actually doing the transformation. This is the "kernel trick": it saves time and makes it possible to determine the relationships between points in infinite dimensions (which is what the radial kernel does).
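
As a rough numerical illustration of the infinite-dimensional case: the radial kernel e^(-γ(a-b)²) can be written as a dot product over infinitely many coordinates via the Taylor series of e^(2γab). The sketch below truncates that series to show the explicit dot product converging to the value the kernel returns in a single step. The value of γ and the data points are arbitrary, and the function names are hypothetical.

```python
from math import exp, factorial, sqrt

def radial_kernel(a, b, gamma):
    return exp(-gamma * (a - b) ** 2)

def truncated_coords(x, gamma, n_terms):
    """First n_terms coordinates of the (infinite) radial-kernel feature map."""
    return [
        exp(-gamma * x ** 2) * sqrt((2 * gamma) ** n / factorial(n)) * x ** n
        for n in range(n_terms)
    ]

a, b, gamma = 1.0, 2.0, 0.5
exact = radial_kernel(a, b, gamma)
for n_terms in (1, 3, 10, 30):
    approx = sum(
        p * q
        for p, q in zip(truncated_coords(a, gamma, n_terms),
                        truncated_coords(b, gamma, n_terms))
    )
    # The gap shrinks as more coordinates are included.
    print(n_terms, approx, abs(approx - exact))
```

Simply squaring each value would give one fixed, finite transformation; the kernel gets the same kind of relationship for any number of dimensions at the cost of one function call per pair.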

@zeynabmousavi1736 4 years ago

How is overfitting evaluated in SVM? How do you check whether the output of an SVM is generalizable or not?

@statquest 4 years ago

You compare the classifications made with the training dataset to classifications made with the testing dataset.

@zeynabmousavi1736 4 years ago

@@statquest Thank you. I should have mentioned that I have a small dataset, so I take all data points as the training set and do 10-fold cross validation. I am concerned about overfitting.
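
With a small dataset, the 10-fold cross validation mentioned here can itself serve as the generalization check: each fold is held out once, the model is fit on the remaining folds, and the held-out accuracy is compared with the training accuracy. Below is a minimal sketch of the splitting step in plain Python. The function name is hypothetical, fitting and scoring a model are left out, and in practice the samples would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(n_samples=25, k=10):
    # Fit on `train`, evaluate on `test`; a large gap between training and
    # held-out accuracy across folds is a sign of overfitting.
    print(len(train), len(test))
```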

@iliasp4275 2 years ago

send my love to fred

@statquest 2 years ago

Bam! :)

@rajatsankhla9261 2 years ago

Hi Josh, could you help me understand how one should choose the value of r in the kernel function?

@statquest 2 years ago

In theory, cross validation would work. This is not something I've done before but my guess is that it might not matter much.

@The_Mashrur 2 years ago

When you say relationships between observations, what exactly do you mean? You didn't really go over how such relationships allow you to find an SVC in the higher dimension?

@statquest 2 years ago

In the case of SVM, the relationship is a rather abstract metric of distance.
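
One way to connect the kernel value to an actual distance: the squared distance between two observations in the high-dimensional space can be computed from kernel values alone, as K(a, a) - 2K(a, b) + K(b, b). The sketch below checks this for the polynomial kernel with r = 1/2 and d = 2; the observations 9 and 14 and the function names are assumed for illustration.

```python
def polynomial_kernel(a, b, r=0.5, d=2):
    return (a * b + r) ** d

def squared_distance_from_kernel(a, b, kernel):
    """Squared high-dimensional distance using only kernel evaluations."""
    return kernel(a, a) - 2 * kernel(a, b) + kernel(b, b)

a, b = 9.0, 14.0  # assumed observations

# Explicit coordinates for r = 1/2, d = 2 are (x, x**2, 1/2).
explicit = (a - b) ** 2 + (a ** 2 - b ** 2) ** 2 + (0.5 - 0.5) ** 2

print(squared_distance_from_kernel(a, b, polynomial_kernel))  # 13250.0
print(explicit)                                               # 13250.0
```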

@lonandon 3 years ago

What does the result of the dot product mean when it represents the relationship of two dots?

@statquest 3 years ago

It's the input to an iterative algorithm, much like gradient descent, that can find the optimal classifier.

@hamidomar3618 2 years ago

Hey, great video, thanks! What happens after the transformation though? I mean, how does the final result, i.e. a scalar corresponding to the relationship between each pair of observations, help in identifying an optimally classifying hyperplane?

@statquest 2 years ago

The value is used in a way similar to how loss values are used in Gradient Descent. There is an iterative algorithm that uses the values to optimize the fit.

@UjjwalGarg09 4 years ago

please make a video on this 5:58

@statquest 4 years ago

It's on the to-do list.

@utkarshagrawal4708 2 years ago

Any resources for understanding why the dot product?

@statquest 2 years ago

I'm not sure I fully understand your question - but I'm guessing you are asking how the dot product leads to the optimized support vector classifier. Think of it as the loss function that we use for gradient descent.
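
For readers who want to see where the dot products enter: in the standard textbook (dual) formulation of the soft-margin SVM, which is not covered in the video, the quantity being optimized depends on the data only through the pairwise kernel values K(x_i, x_j). Here y_i is the class label (+1 or -1), the α_i are the weights being optimized, C is the regularization parameter, and b is an offset. The first expression is the training problem and the second is the resulting classifier.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0

f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

Observations with α_i > 0 are the support vectors; all other observations drop out of the sum in f(x).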