After watching so many different ML tutorial videos, I have just one thing to say: the way you teach is the best among all of them. Name any famous one, like Andrew Ng or sentdex: you need prerequisites to understand their videos, while yours are a treat for viewers, explained from the very basics and slowly building up. And those exercises are the cherry on top. Never change your teaching style, sir; yours is the best one.👍🏻
He did folds = StratifiedKFold() and said that he would use it because it is better than KFold, but at 14:20 he used kf.split, where kf is a KFold. I think he forgot to use StratifiedKFold.
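For anyone following along, here is a minimal sketch of what the commenter is pointing at: calling split on the StratifiedKFold object instead of the KFold one. Note that StratifiedKFold.split needs the labels too. (This is a reconstruction under assumed names, not the video's exact code.)

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

digits = load_digits()
X, y = digits.data, digits.target

folds = StratifiedKFold(n_splits=3)
scores = []
# StratifiedKFold.split takes y as well, so each fold keeps the class balance
for train_idx, test_idx in folds.split(X, y):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
```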
I have never seen anyone who can explain Machine Learning and Data Science so easily. I used to be scared of Machine Learning and Data Science; after seeing your videos, I am now confident that I can do it by myself. Thank you so much for all these videos. 👏👏👏
That approach of doing manually what cross_val_score does in the background, and only then introducing the method! Godsend! Brilliant. Brilliant, I say!
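The manual-then-shortcut approach the commenter praises can be sketched like this: the same splitter object drives both the hand-written loop and cross_val_score, so the two sets of scores agree. (A sketch with assumed dataset and model choices, not the video's exact code.)

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Manual version: iterate over the folds ourselves
kf = KFold(n_splits=5, shuffle=True, random_state=0)
manual_scores = []
for train_idx, test_idx in kf.split(X):
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))

# One-liner version: pass the same splitter as cv
auto_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kf)
```

Because kf is seeded, both runs see identical folds and the scores match element for element.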
Your videos are AMAZING, man!!! I have already recommended these videos to my colleagues at my university who are taking the Machine Learning course. They are loving them too...!!! Keep it up, champ!
Thank you, Sir, for this awesome explanation. Iris dataset assignment scores:
Logistic Regression: 96.07%, 92.15%, 95.83%
SVM (kernel='linear'): 100%, 96.07%, 97.91%
Decision Tree: 98.03%, 92.15%, 100%
Random Forest: 98.03%, 92.15%, 97.91%
Conclusion: SVM works best for me.
We needed to use mean() with cross validation to get the average accuracy score; I'm guessing you forgot to add it. Anyway, the video is pretty good and in-depth. Keep producing such videos.
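The commenter's point, sketched out: cross_val_score returns one score per fold, and taking the mean gives a single summary number for comparing models. (Dataset and model here are illustrative assumptions.)

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
# One accuracy score per fold
scores = cross_val_score(LogisticRegression(max_iter=200), iris.data, iris.target, cv=5)
# Averaging collapses the per-fold scores into a single comparable number
avg = scores.mean()
```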
My teacher is frustratingly bad. I am learning from your videos so that I can get a good grade in my class. Thank you for taking some time to demonstrate what is happening. When you showed me with the example at 10:47, I finally understood.
Thank you very much. Very nice explanation. My scores, after taking averages, are as follows:
LogisticRegression (max_iter=200) = 97.33%
SVC (kernel='poly') = 98.00%
DecisionTreeClassifier = 96%
RandomForestClassifier (n_estimators=300) = 96.67%
@@manu-prakash-choudhary After 50 splits 😎😎
Score of Logistic Regression is 0.961111111111111
Score of SVM is 0.9888888888888888
Score of RandomForestClassifier is 0.973111111111111
@20:39 in the video I noticed something interesting: by default, the cross_val_score() method used to generate 3 folds, but the default has since changed from 3 to 5 :))
This helps for the parameter tuning. Just play a bit with the indexes, since lists start from 0 while n_estimators starts from 1:

scores = []
avg_scores = []
n_est = range(1, 5)  # example
for i in n_est:
    model = RandomForestClassifier(n_estimators=i)
    score = cross_val_score(model, digits.data, digits.target, cv=10)
    scores.append(score)
    avg_scores.append(np.average(score))
    print('avg score: {}, n_estimators: {}'.format(avg_scores[i-1], i))
avg_scores = np.asarray(avg_scores)  # convert the list to an array
print('Average accuracy score is {} for n_estimators={}, calculated from the following accuracy scores: {}'.format(
    np.amax(avg_scores), np.argmax(avg_scores) + 1, scores[np.argmax(avg_scores)]))
plt.plot(n_est, avg_scores)
plt.xlabel('number of estimators')
plt.ylabel('average accuracy')

44 was the best for me.
AWESOME, AWESOME... Excellent video you have created. I have been learning ML for more than a year and have watched more than 400 videos. Your videos are AWESOME. Please make a complete series on ML. Thanks.
Your videos are really good! The explanation is crisp and succinct! Love your videos! Keep posting! By the way, you may not realize it, but you are changing people's lives by educating them! Jai Hind!
I used n_folds=5 in my code. With logistic regression, I got a score of 1 twice, and with SVC, when I tuned the parameter C to 5, I got 1 three times in my cross_val_score(). The remaining methods got a score of 1 only once.
Great video, as usual. Quick question: how were you able to get such low scores for SVM? I ran it a couple of times and kept getting scores in the upper 90s. So I set up a for loop, ran 1000 different train_test_split iterations through SVM, and recorded the lowest score. It came back as 97.2%!
Dear Sir, another great explanation as always. Thank you very much for that. After adding the following code, SVM started showing very good scores:
X_train = preprocessing.scale(X_train)
X_test = preprocessing.scale(X_test)
Have I done the correct thing?
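On the scaling question above, one common pattern is to put the scaler and the model in a pipeline so that cross validation re-fits the scaler on each training fold only. This is a sketch of that pattern (dataset and model choices are assumptions, not the video's code):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
# The scaler is fit on each training fold only, so the test fold
# never influences the scaling statistics
pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, digits.data, digits.target, cv=3)
```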
The explanation was amazing, sir. I performed cross_val_score; below are the final average results (10 folds):
Logistic Regression: 95%
SVM: 98% [performed best]
Decision Tree: 95%
Random Forest: 96%
Hi there! Excellent video! This greatly explains the concepts and is very helpful! Keep up the awesome work! I have two questions, please:
1) Since the cross_val_score method is used to score the performance of a machine learning model, when using Stratified K Fold cross validation, is it the only performance measure? Can we also use the following, and if so, how? Please explain with an example:
- Accuracy
- Precision
- Recall
- Specificity
- F1 score
- ROC curve
- Model execution time (how is this possible in Jupyter Notebooks?)
2) Expanding on the content of this YouTube video, could you explain, with an example, how to retrieve the feature importances of a machine learning model? At which stage would this be done? At the end, right? I mean, after we get the average score of a model using Stratified K Fold cross validation?
Thanks a lot in advance. Much appreciated.
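A partial sketch of what the question above asks for: cross_validate accepts a list of scoring metrics and also reports per-fold fit times, and a tree-based model fitted on the full data exposes feature_importances_. (The metric names and model here are illustrative assumptions; specificity and ROC curves need extra handling not shown.)

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

iris = load_iris()
cv = StratifiedKFold(n_splits=5)
results = cross_validate(
    RandomForestClassifier(n_estimators=40, random_state=0),
    iris.data, iris.target, cv=cv,
    scoring=['accuracy', 'precision_macro', 'recall_macro', 'f1_macro'],
)
# Per-fold metrics live under keys like results['test_accuracy'];
# results['fit_time'] gives per-fold training time in seconds.

# Feature importances come from a model fitted on the data, typically
# after cross validation has told you which model/parameters to keep:
model = RandomForestClassifier(n_estimators=40, random_state=0).fit(iris.data, iris.target)
importances = model.feature_importances_  # one value per feature
```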
Hi Sir, your explanation is very good. I need one small clarification: you created a StratifiedKFold object called folds but did not use it in that example; that's fine, I will do it myself. But how did cross_val_score get a split size of 3? Was it just because we assigned it earlier?
Nope, cross_val_score uses 3 folds by default (5 in newer scikit-learn versions); you can check the documentation. If you want to change the number of folds, just pass the parameter: cross_val_score(model, X, y, cv=n_folds_you_want)
Using the same datasets makes things less interesting, but your tutorials are awesome. Every tutorial series has pluses and minuses; yours are more structured, but the minus point is the reuse of the same dataset, which reduces the interest to keep going.
Using the K Fold method, the data was split multiple times into X_train and y_train, but each split stayed the same across the methods. Is it the same in the cross_val_score method, or does the splitting happen differently for each call? If so, the models are basically trained on different X_train and y_train. Thank you so much for the clear explanation.
I have the same question. I tried using the folds object to split; instead of kf.split(digits.data), I tried folds.split(digits.data) to compare results across all the models, but it gave me an error: "split() missing 1 required positional argument: 'y'". To fix this, I also passed digits.target, and it worked!
Really good explanation. You are an expert. I have a question: is it possible to select the test_size in cross-validation? When I use, for example, KFold with 3 splits, it splits the whole data into three parts; is it possible to make these three splits but using, say, 2 parts for test and 7 parts for train?
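One way to get what the question above asks for is ShuffleSplit, which lets you pick both the number of rounds and the test fraction directly. A minimal sketch (dataset and model are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

iris = load_iris()
# Each of the 3 rounds draws a fresh random 20%/80% test/train partition
ss = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=200), iris.data, iris.target, cv=ss)
```

Unlike KFold, the test sets here can overlap between rounds, since each round is an independent random draw.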
When using cross_val_score, what do we pass alongside the model: the train set, the test set, or just X and y before splitting into train and test sets? Plus, I love your videos, man; you've made some of the more confusing topics so understandable and clear. Cheers.
Thank you very much for the nice explanation. I have one question in this context: isn't it necessary to set random_state in the train_test_split method to get the same score for each model?
I'm a novice when it comes to Data Analytics and I find your videos super useful and enjoyable. I just have one question, though: is K fold validation only used for classification problems?
No, the "k" in K fold validation refers to the number of folds the training data is split into. For example, if the training data is split into 5 folds, then k = 5; this is called 5-fold cross validation. The same goes for any number of folds (2, 3, 7, etc.). K fold cross validation can be used to train many different machine learning models; it's just a way to split up training data.
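To illustrate the point above that K fold is not classification-specific, here is a sketch using plain KFold on a regression problem (dataset and model are illustrative assumptions):

```python
from sklearn.datasets import load_diabetes  # a regression dataset
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
# Plain KFold has no notion of class labels, so it works for regression too;
# the default score for a regressor is R^2, one value per fold
scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=5))
```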
Thank you very much, sir, for this very nice explanation. My results are:
Logistic Regression = 95.33%
SVM = 97.33%
Decision Tree = 96.67%
Random Forest (40 estimators) = 96.67%
Kudos to you; this was the most crystal-clear explanation I have seen so far. One small query: how do I get the training accuracy in cross-validation?
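On the training-accuracy question above: cross_validate (rather than cross_val_score) can return per-fold training scores via return_train_score=True. A sketch under assumed dataset and model choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
results = cross_validate(
    DecisionTreeClassifier(random_state=0),
    iris.data, iris.target, cv=5,
    return_train_score=True,
)
train_acc = results['train_score']  # accuracy on each training fold
test_acc = results['test_score']    # accuracy on each held-out fold
```

Comparing the two arrays is a quick way to spot overfitting: a large gap between training and test scores is a warning sign.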
I don't understand how this method maps onto the example of the kid who has to take the test at the very start. How would cross validation apply to that particular real-life example? Also, I'm not sure whether a shuffled KFold or StratifiedKFold is better.
By making a df method:
mean(cross_val_score(LogisticRegression(max_iter=200), X, y)) = 0.9733
mean(cross_val_score(SVC(kernel='linear'), X, y)) = 0.98
mean(cross_val_score(RandomForestClassifier(n_estimators=40), X, y)) = 0.96
By using iris.data and iris.target directly:
np.average(score_lr) = 0.95333
np.average(score_svm) = 0.98000001
np.average(score_rf) = 0.95333333
Thanks for the video! I have a question: when you do the cross validation inside the for loop, you use the same folds for all the methods. Does cross_val_score do the same? If not, is it possible to use the same folds in order to get a more accurate comparison? Thanks in advance.
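One way to guarantee what the question above asks for is to pass the same splitter object as cv to every cross_val_score call, so all models are scored on identical partitions. A sketch (dataset and models are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
cv = StratifiedKFold(n_splits=5)
# Both models are scored on exactly the same five train/test partitions,
# so the comparison is apples-to-apples
lr_scores = cross_val_score(LogisticRegression(max_iter=200), iris.data, iris.target, cv=cv)
svm_scores = cross_val_score(SVC(), iris.data, iris.target, cv=cv)
```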
00:02 K fold cross validation helps determine the best machine learning model for a given problem.
02:20 K fold cross validation provides a more robust evaluation of machine learning models.
04:36 Classifying handwritten characters into ten categories using different algorithms and evaluating performance using k-fold cross validation.
07:06 K fold cross validation helps in more robust model evaluation.
09:43 K fold cross validation divides data into training and testing sets for iterative model evaluation.
12:35 Stratified k-fold ensures uniform distribution of categories for better model training.
15:42 Measuring the performance of models in each iteration.
18:29 Parameter tuning in random forest classifier improves scores.
20:46 K fold cross validation helps measure the performance of machine learning models.
23:18 Cross-validation helps in comparing algorithms and finding the best parameters for a given problem.
25:18 K fold cross validation helps in assessing the model's performance.
Crafted by Merlin AI.