
K-Fold Cross Validation - Intro to Machine Learning 

Udacity

This video is part of an online course, Intro to Machine Learning. Check out the course here: www.udacity.com/course/ud120. This course was designed as part of a program to help you and others become a Data Analyst.
You can check out the full details of the program here: www.udacity.com/course/nd002.

Published: 22 Feb 2015

Comments: 110
@BrettBissey 5 years ago
can't stop looking at the blister on his hand
@javiersuarez8415 5 years ago
🤭
@advaitgogte6385 4 years ago
I wondered what would happen if he used a hand sanitizer lol
@psijkopupa6853 3 years ago
@@advaitgogte6385 he is a drummer
@bjarnij3782 3 years ago
@@psijkopupa6853 or a yard worker lol
@grumpyae86 3 years ago
i swear to god i was just about to say that lol
@garybutler1672 2 years ago
1. Train/Validation  2. Either  3. K-Fold CV

I've seen a lot of answers I disagree with in the comments, so I'll explain. First, the terminology: it's a Train/Validation split when the data is used to train the model. The Test set should be taken out before the Train/Validation split and remain separate throughout training; it is then used to test the trained model.

Second, the answers. 1. Obviously training takes longer when you do it 10 times. 2. While training took longer, you run the same size of model in production, so all other things being equal the run times of the two already-trained models should be equal. 3. The improved accuracy is why you would want to use K-Fold CV.

If I'm wrong, please explain. I'll probably never see your comments, but you could help someone else.
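A minimal sketch of that workflow in scikit-learn (the iris data and the decision tree are stand-ins, not anything from the video): the test set is split off first and never touched during cross validation, and the K-fold rotation only runs over the remaining training data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold the test set out first; cross validation never sees it.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# 10-fold rotation over the remaining data: each fold serves as the
# validation set exactly once, the other nine folds form the training set.
kf = KFold(n_splits=10, shuffle=True, random_state=1)
scores = []
for train_idx, val_idx in kf.split(X_trainval):
    clf = DecisionTreeClassifier(random_state=1)
    clf.fit(X_trainval[train_idx], y_trainval[train_idx])
    scores.append(clf.score(X_trainval[val_idx], y_trainval[val_idx]))

print("mean validation accuracy:", np.mean(scores))

# Only after all model choices are fixed is the held-out test set used, once.
final_model = DecisionTreeClassifier(random_state=1).fit(X_trainval, y_trainval)
print("test accuracy:", final_model.score(X_test, y_test))
```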
@lordblanck7923 2 years ago
I don't think the test set is separated prior to training. It's more like there are 10 equal-sized but randomly assigned groups. Say you choose group 1: every other group is combined into the training set while group 1 is the test set, and you get a score. Then you repeat with group 2: group 2 is the test set while the other groups combine into the training set. You repeat this for every group and that's how you get your value.
@ahmedmustahid4936 1 year ago
"3. The improved accuracy is why you would want to use K-Fold CV" - I don't think accuracy necessarily gets improved when k-fold CV is used. K-fold CV is used to reduce the variation in the metric values across different train/test splits.
@jingyiwang5113 5 months ago
Thank you for this explanation about k-fold cross validation! It is really helpful!😃
@Jsheng007 6 years ago
Interesting to see your video presented like this. Would you mind sharing how you present your drawing this way?
@9MeiRoleplay 6 years ago
This video is really useful, thank you very much. It helped me a lot.
@junbozhao9675 4 years ago
Yeah. Very Clear.
@sumitdam9642 5 years ago
Can anybody provide the link to the video by Katie that describes the training and test sets?
@shwetaredkar734 4 years ago
In k-fold CV, the model in each fold produces a result, so the overall 10-fold CV score is an average. So is the whole thing an average of averages? What does "5 times 10-fold CV" mean, and how is it different from normal 10-fold CV? Can someone help me understand this?
@user-kk4lh4zp1u 5 years ago
Thank you very much, the video is helpful.
@dorsolomon7251 7 years ago
It's obvious that the answers are: train/test, train/test, and then cross validation. Cross validation runs the training "k" times, so it's "k" times slower, but on the other hand it's more accurate.
@tamvominh3272 4 years ago
So which model should I take to run a demo on a data point? If k=10 I will have 10 models. Do I use all 10 models and vote for the final label? Is that right? Thank you so much.
@ericklestrange6255 4 years ago
It is, but sometimes you come here after reading tons of complicated shit and you just want things to be chewed for you; that's why we come to videos. Thanks for the results.
@whalingwithishmael7751 4 years ago
Simple and beautiful
@randa7892 6 years ago
Do all 10 folds have to be the same size? What is the effect if they are of different sizes?
@ivoriankoua3916 6 years ago
The final part can have fewer instances than the other k − 1 parts. Suppose, with the same example, your dataset has 207 points: integer division gives 207/10 = 20 and 207 % 10 = 7, so you would end up with 11 parts, 10 of them holding 20 points each and the last one holding the remaining 7.
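For what it's worth, scikit-learn's KFold handles a size that doesn't divide evenly a little differently: it keeps exactly k folds and spreads the remainder over the first ones, rather than adding an extra smaller part. A quick sketch with the same 207-point example:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(207).reshape(-1, 1)  # 207 data points, as in the example above

kf = KFold(n_splits=10)
fold_sizes = [len(test_idx) for _, test_idx in kf.split(X)]
print(fold_sizes)  # [21, 21, 21, 21, 21, 21, 21, 20, 20, 20] -- still 10 folds
```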
@ijyoyo 1 year ago
Interesting video. Thanks for sharing.
@ryanmccauley211 6 years ago
Great explanation thanks!
@shahi_gautam 5 years ago
I have a small dataset of 48 samples. If I apply an MLP using 6-fold CV, do I still need a validation set to avoid biased results on the small dataset? Please suggest.
@ericklestrange6255 4 years ago
My book says that smaller sets require a bigger k (number of folds), and the opposite for bigger sets, because of computational cost. However, it also seems counterintuitive to me: with an already super small dataset, dividing it by a big number means you end up with practically individual samples, so you can't correlate... (?)
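The extreme of that "bigger k for smaller sets" advice is leave-one-out cross validation (k equal to the number of samples), which scikit-learn supports directly. A sketch on a made-up 48-sample dataset (the classifier and data here are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical 48-sample dataset, a stand-in for the small set discussed above.
X, y = make_classification(n_samples=48, n_features=10, random_state=1)

# Leave-one-out: 48 folds, each containing a single test sample.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", scores.mean())
```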
@yogeshwarshendye4857 3 years ago
Won't this make the model specialized to the data that we have?
@snk2288 7 years ago
The test bin is different every time, so how do you average the results? Can you please provide a detailed explanation on this?
@ericklestrange6255 4 years ago
That is because you aren't using a random seed in your classification algorithm: random_state=1.
@sanika6916 8 months ago
Thank you so much, very informative.
@theawesomeNZ 5 years ago
But this doesn't solve the issue of choosing the bin size, i.e. trade-off between training set and test set (although you are now using all the data for both tasks at some point).
@quubands4018 1 year ago
When performing cross-validation, the value of K refers to the number of folds that the data is divided into. The choice of K depends on the size of the dataset and the desired level of precision in the performance estimate. If the dataset is small, a larger value of K can be used to ensure that the model is trained and tested on as many data points as possible. However, if the dataset is large, a smaller value of K can be used to reduce the computational complexity of the cross-validation process.

A commonly used value of K is 10, which means that the data is divided into 10 equal parts, with each part used as a test set once and the remaining parts used as a training set. However, other values of K can be used depending on the specific dataset and the goals of the analysis.

It is important to note that the choice of K can affect the estimated performance of the model, with higher values of K leading to a lower bias but higher variance in the estimate. Therefore, it is often recommended to perform multiple rounds of cross-validation with different values of K to obtain a more robust estimate of model performance.
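A rough sketch of comparing two values of K on the same data (nothing here is specific to the video; any estimator and dataset would do). The per-fold mean and spread make the trade-off described above visible:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")

for k in (5, 10):
    scores = cross_val_score(clf, X, y, cv=k)
    print(f"{k}-fold CV: mean={scores.mean():.3f}  std={scores.std():.3f}")
```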
@Tyokok 5 years ago
Thanks for the video! Quick (silly) question: in any of these validation methods, every time you change the training data, do you re-fit the model? If so, each validation step is with respect to a different model fit. Then how do you determine your final model?
@knowlen 4 years ago
You re-train on 100% of the data. -future viewers fyi
@fishertech 1 year ago
@@knowlen So if you average the 10 separate performance metrics, does that mean that in the future, for a prediction task, you must also use each of the k separate models and then take the average prediction?
@knowlen 1 year ago
@@fishertech Not necessarily. In practice, ensembles do tend to perform better, but they scale poorly (in compute and memory w.r.t. the perf gains). Cross validation reveals good hyperparameters. Once we have the values, we just train a single model on the full data and expect similar performance to the averaged ensemble from the cross validation phase.
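That workflow (cross-validate to compare hyperparameters, then train one model on all the data) is roughly what scikit-learn's GridSearchCV automates; a sketch with a made-up parameter grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 10-fold cross validation over a small, made-up grid of hyperparameters.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=10)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
# With refit=True (the default), best_estimator_ has already been
# retrained on all of X, y -- the single final model described above.
final_model = search.best_estimator_
```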
@rishabmacherla3326 10 months ago
@@knowlen We won't be re-training on 100% of the data every time; instead, we train on only k-1 blocks of the data. Say we have 20 records in a dataset and split them into 4 parts. In the first iteration K1 is the test set and K2, K3, K4 are used to train. In the next iteration K2 is the test set and K1, K3, K4 are used to train, and so on. Doing this, we can observe that we end up training the model with all the possible data values in the dataset, so there is no point in training on 100% of the dataset every time.
@knowlen 10 months ago
@@rishabmacherla3326 I meant that once we've cross validated all our models, we select the top performing one and train it on 100% of the data. If you already know the best model, you don't need to re-run cross validation for incremental changes to data pulled from the same target distribution --that's just a waste of compute.
@DoughyBoy 4 years ago
Why is it so hard to find a simple, concrete, by-hand example of k-fold cross validation? All the documentation I can find is very generalized information, with no practical examples anywhere.
@kias87 2 years ago
The voice sounds like Sebastian Thrun. Great guy :)
@rlalduhsaka6746 5 years ago
So, what is the difference from train_test_split with test_size=0.1?
@charismaticaazim 5 years ago
10% of the data is used for testing & 90% is used for training.
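A small sketch of the practical difference: train_test_split with test_size=0.1 gives one score from a single 90/10 split, whereas 10-fold cross validation averages ten such scores, with every point used for testing exactly once (the dataset and classifier below are placeholders).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB()

# One 90/10 split: a single score that depends on which 10% landed in the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=1)
print("single split :", clf.fit(X_tr, y_tr).score(X_te, y_te))

# 10-fold CV: ten 90/10 splits; every point is in a test fold exactly once.
print("10-fold CV   :", cross_val_score(clf, X, y, cv=10).mean())
```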
@Sthern34 5 years ago
Thanks, clear
@mohashobak7454 4 years ago
So is this a supervised, unsupervised, or semi-supervised algorithm?
@TTBOn00bKiLleR 3 years ago
It's not about the algorithm you train and run inference with; it's about which data you choose to train and test each of them on, so that the model produces the most accurate results.
@oliveryoung6501 9 years ago
What do you mean by data points? Do you mean instances?
@chirathabey7729 8 years ago
+olie tim Yes, problem instances, data tuples, data points, and records are all the same thing.
@apericube27 6 years ago
K-fold cross validation runs k learning experiments, so at the end you get k different models... Which one do you choose?
@tahaait7236 6 years ago
You take the average of the testing accuracies; he said that at the end.
@apericube27 6 years ago
I am not talking about accuracy but about models; you can't always "average" models. I guess there are 2 options:
1. The cross-validation builds k models, so you only get an estimate of the accuracy, and you have to build a model on the whole training set afterward to obtain your final model.
2. The cross-validation builds a single model on the whole training set and then estimates its accuracy on the k subsets.
@paolofazzini3146 6 years ago
I had the very same doubt, and I found it strange that I was unable to find a quick answer even after browsing several sources. However, I think that 1), of your two hypotheses, makes the most sense: cross validation is meant to find out whether the MODEL (and not its parameters) is the right fit for the sample data; for instance, it should find out whether, say, a polynomial has the right number of parameters (i.e. the right degree) and does not cause overfitting. Once you know, by this method, how good your model is, you can use the whole set to train the chosen model and get the best parameters, as you say.
@FernandoWittmann 6 years ago
Paolo is correct. We actually use cross-validation for evaluating hyperparameters rather than for getting a final performance estimate on unseen data. 20% of the dataset should still be reserved for testing rather than cross-validation being an "alternative" way of testing. Cross-validation is then applied to the training set, which is split into training and validation subsets in a cross-validated way. More details here: scikit-learn.org/stable/modules/cross_validation.html
@regivm123 6 years ago
Thanks for the question, Guillaume... I was struggling with this too. Thanks, Fernando and Paolo, for the responses.
@thesiberian9971 6 years ago
What I don’t get is: say you’ve picked the 1st bin as your test set for the first run and the rest as your training set. Hasn’t the model learned everything in the training set for the rest of the runs? What’s the point of using all the k’s when they’ve already been used before?
@ferkstkojtt 6 years ago
So basically you build k different models. Afterwards you compare the models on their average error to see how they differ. In the last step you are supposed to create the best model out of these k models, but I don't fully understand whether you just pick one model or combine them into one super-model...
@BigBadBurrow 6 years ago
You have k completely separate experiments, with a new/untrained network each time, but in each case with different train/test data. You then take an average across all experiments and that is your error rate. It's just a more robust way of testing the network.
@asadmohammed706 6 years ago
As was said at the beginning of the video, if we only split the data into 2 parts (i.e. the train and test datasets) we might not extract information to the maximum extent. If we split the data into k parts and then perform the cross-validation on different subsets, we gain higher accuracy. If we do 10-fold CV we get 10 results, i.e. 10 different training and test accuracies, and we choose the best one, so we are able to find the best subsets and combination.
@xordux7 5 years ago
You learn in the first run and then unlearn it. Then you choose another bin, learn from it in the second run, and then unlearn it again... this cycle goes on until k cycles are completed.
@omidasadi2264 5 years ago
Hi bro, thanks for sharing this lesson... Just a question: which application did you make this tutorial with? It's amazing... the text appears on top of and above your hand.
@conexionesmentales5444 1 year ago
By any chance, did you ever get an answer about which application was used?
@omidasadi2264 1 year ago
@@conexionesmentales5444 no, but I'm glad to hear about it
@BorisDessimond 6 years ago
Nice !
@AhmedGamal-xi3vj 3 years ago
Can anyone share the answers to those questions, please?
@kostas_x 5 months ago
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ADNFKiAjmWA.html
@TopThreeProducts 6 years ago
1:15 - 1:18 Someone please tell me what he said; he was speaking English and then he just jumbled his words up.
@mr.s7767 6 years ago
check the subtitles!
@zamstahh_ 3 years ago
"whereas in the work that Katie showed you" - he's referring to the video another person did.
@reedsutton8039 4 years ago
This is incorrect. You should correct this video, as you're encouraging people to mix their train and test sets, which is a cardinal sin of machine learning. Every time you say test set, you should be saying validation set. The test set can only be used once, and cannot be used to inform hyperparameters.
@Sergiogccm 3 years ago
Liked because of the blister made by a barbell.
@lionheart5078 4 years ago
Do a simple practical example by hand, not just theory. People understand better when there are actual numbers and you go through the entire procedure, even if it's a trivial example.
@patrickdoherty9116 2 years ago
That huge popped blister on his hand is lowkey distracting
@weizeyin6772 5 years ago
hey guys from ECON704
@bodilelbrink 8 months ago
nobody else distracted by the wound?
@ytber8699 5 years ago
I know it's a very old video, but still, it's not necessary to show your hand while writing.
@wint7627 5 years ago
Well, it helps us see what he's pointing at.
@ranit_ 5 years ago
Focus on content and you will not notice the hand any more. :)
@c0t556 5 years ago
Why does his hand bother you???
@fellipealcantara6856 3 years ago
Can't watch it... the blister is too annoying.
@StEvUgnIn 2 years ago
You're missing a part of your skin, sir.
@julianfbeck 4 years ago
ihhhhh
@EllieOK 3 years ago
Can someone help you?
@moathbudget7935 3 years ago
your pen is a disaster
@lovemormus 5 years ago
the hand is so annoying
@killvampires 7 years ago
I think the answers are train/test, train/test, and then 10-fold C.V. Also, don't make a video with some nasty open sore on your hand please. Wear a glove or something.
@fuu812 7 years ago
Please don't be rude. Also, don't comment if your surname sounds like chew. Use an alias or something.
@phum126 7 years ago
LOL you're complaining about a sore on his hand...try and be more of an uptight bitch bahahah smh
@NobilisVir 7 years ago
The fuck is your problem, you're getting a free instruction on a valuable subject, jeez, the level of entitlement of some people.
@TEAdog77 6 years ago
Shouldn't a model based on a simple train/test split have the same run time on new data compared to a model based on a cross validation approach?
@ranit_ 5 years ago
@alex chow I'm down-voting this remark. Stop spreading hatred and focus on the content, please. His knowledge goes much deeper than the 'open sore'. Happy learning!
@j.u.m.p.s.c.a.r.e 5 years ago
what are you even saying? can't understand anything!