After watching so many different ML tutorial videos, I have just one thing to say: the way you teach is the best among all of them. Name any famous one, like Andrew Ng or sentdex: you need prerequisites to understand their videos, while yours are a treat for viewers, explained from the very basics and slowly building up. And those exercises are the cherry on top. Never change your teaching style, sir; yours is the best one.👍🏻
He did folds = StratifiedKFold() and said that he would use it because it is better than KFold, but at 14:20 he used kf.split, where kf is a KFold. I think he forgot to use StratifiedKFold.
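For anyone following along, here is a minimal sketch of what the commenter is pointing at: calling split on the StratifiedKFold object instead of the KFold one. Note that StratifiedKFold.split needs the labels too. (This is a reconstruction under assumed names, not the video's exact code.)

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

digits = load_digits()
X, y = digits.data, digits.target

folds = StratifiedKFold(n_splits=3)
scores = []
# StratifiedKFold.split takes y as well, so each fold keeps the class balance
for train_idx, test_idx in folds.split(X, y):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
```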
I have never seen anyone who can explain Machine Learning and Data Science so easily. I used to be scared of Machine Learning and Data Science; after seeing your videos, I am now confident that I can do it by myself. Thank you so much for all these videos. 👏👏👏
That approach of doing manually what cross_val_score does in the background, and only then introducing the method! Godsend! Brilliant. Brilliant, I say!
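The manual-then-shortcut approach the commenter praises can be sketched like this: the same splitter object drives both the hand-written loop and cross_val_score, so the two sets of scores agree. (A sketch with assumed dataset and model choices, not the video's exact code.)

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Manual version: iterate over the folds ourselves
kf = KFold(n_splits=5, shuffle=True, random_state=0)
manual_scores = []
for train_idx, test_idx in kf.split(X):
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))

# One-liner version: pass the same splitter as cv
auto_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kf)
```

Because kf is seeded, both runs see identical folds and the scores match element for element.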
Your videos are AMAZING, man!!! I have already recommended these videos to my colleagues at my university who are taking the Machine Learning course. They are loving them too...!!! Keep it up, champ!
Thank you, Sir, for this awesome explanation. Iris dataset assignment scores:
Logistic Regression: 96.07%, 92.15%, 95.83%
SVM (kernel='linear'): 100%, 96.07%, 97.91%
Decision Tree: 98.03%, 92.15%, 100%
Random Forest: 98.03%, 92.15%, 97.91%
Conclusion: SVM works best for me.
We needed to use mean() with cross validation to get the average accuracy score; I'm guessing you forgot to add it. Anyway, the video is pretty good and in-depth. Keep producing such videos.
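The commenter's point, sketched out: cross_val_score returns one score per fold, and taking the mean gives a single summary number for comparing models. (Dataset and model here are illustrative assumptions.)

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
# One accuracy score per fold
scores = cross_val_score(LogisticRegression(max_iter=200), iris.data, iris.target, cv=5)
# Averaging collapses the per-fold scores into a single comparable number
avg = scores.mean()
```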
My teacher is frustratingly bad. I am learning from your videos so that I can get a good grade in my class. Thank you for taking some time to demonstrate what is happening. When you showed me with the example at 10:47, I finally understood.
Thank you very much. Very nice explanation. My scores, after taking averages, are as follows:
LogisticRegression (max_iter=200) = 97.33%
SVC (kernel='poly') = 98.00%
DecisionTreeClassifier = 96%
RandomForestClassifier (n_estimators=300) = 96.67%
@@manu-prakash-choudhary After 50 splits 😎😎
Score of Logistic Regression is 0.961111111111111
Score of SVM is 0.9888888888888888
Score of RandomForestClassifier is 0.973111111111111
@20:39 in the video I noticed something interesting: by default, the cross_val_score() method used to generate 3 folds, but the default has since changed from 3 to 5 :))
This helps for the parameter tuning. Just play a bit with the indexes, since lists start from 0 while n_estimators starts from 1:

scores = []
avg_scores = []
n_est = range(1, 5)  # example
for i in n_est:
    model = RandomForestClassifier(n_estimators=i)
    score = cross_val_score(model, digits.data, digits.target, cv=10)
    scores.append(score)
    avg_scores.append(np.average(score))
    print('avg score: {}, n_estimators: {}'.format(avg_scores[i-1], i))
avg_scores = np.asarray(avg_scores)  # convert the list to an array
print('Average accuracy score is {} for n_estimators={}, calculated from the following accuracy scores: {}'.format(
    np.amax(avg_scores), np.argmax(avg_scores) + 1, scores[np.argmax(avg_scores)]))
plt.plot(n_est, avg_scores)
plt.xlabel('number of estimators')
plt.ylabel('average accuracy')

44 was the best for me.
AWESOME, AWESOME... Excellent video you have created. I have been learning ML for more than a year and have watched more than 400 videos. Your videos are AWESOME. Please make a complete series on ML. Thanks.
Your videos are really good! The explanation is crisp and succinct! Love your videos! Keep posting! By the way, you may not realize it, but you are changing people's lives by educating them! Jai Hind!
I used n_folds=5 in my code. With logistic regression, I got a score of 1 twice, and with SVC, when I tuned the parameter C to 5, I got 1 three times in my cross_val_score(). The remaining methods got a score of 1 only once.
Great video, as usual. Quick question: how were you able to get such low scores for SVM? I ran it a couple of times and kept getting scores in the upper 90s. So I set up a for loop, ran 1000 different train_test_split iterations through SVM, and recorded the lowest score. It came back as 97.2%!
Dear Sir, another great explanation as always. Thank you very much for that. After adding the following code, SVM started showing very good scores:
X_train = preprocessing.scale(X_train)
X_test = preprocessing.scale(X_test)
Have I done the correct thing?
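On the scaling question above, one common pattern is to put the scaler and the model in a pipeline so that cross validation re-fits the scaler on each training fold only. This is a sketch of that pattern (dataset and model choices are assumptions, not the video's code):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
# The scaler is fit on each training fold only, so the test fold
# never influences the scaling statistics
pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, digits.data, digits.target, cv=3)
```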
The explanation was amazing, sir. I performed cross_val_score; below are the final average results (10 folds):
Logistic Regression: 95%
SVM: 98% [performed best]
Decision Tree: 95%
Random Forest: 96%
Hi there! Excellent video! This greatly explains the concepts and is very helpful! Keep up the awesome work! I have two questions, please:
1) Since the cross_val_score method is used to score the performance of a machine learning model, when using Stratified K Fold cross validation, is it the only performance measure? Can we also use the following, and if so, how? Please explain with an example:
- Accuracy
- Precision
- Recall
- Specificity
- F1 score
- ROC curve
- Model execution time (how is this possible in Jupyter Notebooks?)
2) Expanding on the content of this YouTube video, could you explain, with an example, how to retrieve the feature importances of a machine learning model? At which stage would this be done? At the end, right? I mean, after we get the average score of a model using Stratified K Fold cross validation?
Thanks a lot in advance. Much appreciated.
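A partial sketch of what the question above asks for: cross_validate accepts a list of scoring metrics and also reports per-fold fit times, and a tree-based model fitted on the full data exposes feature_importances_. (The metric names and model here are illustrative assumptions; specificity and ROC curves need extra handling not shown.)

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

iris = load_iris()
cv = StratifiedKFold(n_splits=5)
results = cross_validate(
    RandomForestClassifier(n_estimators=40, random_state=0),
    iris.data, iris.target, cv=cv,
    scoring=['accuracy', 'precision_macro', 'recall_macro', 'f1_macro'],
)
# Per-fold metrics live under keys like results['test_accuracy'];
# results['fit_time'] gives per-fold training time in seconds.

# Feature importances come from a model fitted on the data, typically
# after cross validation has told you which model/parameters to keep:
model = RandomForestClassifier(n_estimators=40, random_state=0).fit(iris.data, iris.target)
importances = model.feature_importances_  # one value per feature
```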
Hi Sir, your explanation is very good. I need one small clarification: you created a StratifiedKFold object called folds but did not use it in that example; that's fine, I will do it myself. But how did cross_val_score get a split size of 3? Was it just because we assigned it earlier?
Nope, cross_val_score uses 3 folds by default (5 in newer scikit-learn versions); you can check the documentation. If you want to change the number of folds, just pass the parameter: cross_val_score(model, X, y, cv=n_folds_you_want)
Using the same datasets makes things less interesting, but your tutorials are awesome. Every tutorial series has pluses and minuses; yours are more structured, but the minus point is the reuse of the same dataset, which reduces the interest to keep going.
Using the K Fold method, the data was split multiple times into X_train and y_train, but each split stayed the same across the methods. Is it the same in the cross_val_score method, or does the splitting happen differently for each call? If so, the models are basically trained on different X_train and y_train. Thank you so much for the clear explanation.
I have the same question. I tried using the folds object to split; instead of kf.split(digits.data), I tried folds.split(digits.data) to compare results across all the models, but it gave me an error: "split() missing 1 required positional argument: 'y'". To fix this, I also passed digits.target, and it worked!
Really good explanation. You are an expert. I have a question: is it possible to select the test_size in cross-validation? When I use, for example, KFold with 3 splits, it splits the whole data into three parts; is it possible to make these three splits but using, say, 2 parts for test and 7 parts for train?
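One way to get what the question above asks for is ShuffleSplit, which lets you pick both the number of rounds and the test fraction directly. A minimal sketch (dataset and model are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

iris = load_iris()
# Each of the 3 rounds draws a fresh random 20%/80% test/train partition
ss = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=200), iris.data, iris.target, cv=ss)
```

Unlike KFold, the test sets here can overlap between rounds, since each round is an independent random draw.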
When using cross_val_score, what do we pass alongside the model: the train set, the test set, or just X and y before splitting into train and test sets? Plus, I love your videos, man; you've made some of the more confusing topics so understandable and clear. Cheers.
Thank you very much for the nice explanation. I have one question in this context: isn't it necessary to set random_state in the train_test_split method to get the same score for each model?
I'm a novice when it comes to Data Analytics and I find your videos super useful and enjoyable. I just have one question, though: is K fold validation only used for classification problems?
No, the "k" in K fold validation refers to the number of folds the training data is split into. For example, if the training data is split into 5 folds, then k = 5; this is called 5-fold cross validation. The same goes for any number of folds (2, 3, 7, etc.). K fold cross validation can be used to train many different machine learning models; it's just a way to split up training data.
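To illustrate the point above that K fold is not classification-specific, here is a sketch using plain KFold on a regression problem (dataset and model are illustrative assumptions):

```python
from sklearn.datasets import load_diabetes  # a regression dataset
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
# Plain KFold has no notion of class labels, so it works for regression too;
# the default score for a regressor is R^2, one value per fold
scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=5))
```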
Thank you very much, sir, for this very nice explanation. My results are:
Logistic Regression = 95.33%
SVM = 97.33%
Decision Tree = 96.67%
Random Forest (40 estimators) = 96.67%
Kudos to you; this was the most crystal-clear explanation I have seen so far. One small query: how do I get the training accuracy in cross-validation?
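On the training-accuracy question above: cross_validate (rather than cross_val_score) can return per-fold training scores via return_train_score=True. A sketch under assumed dataset and model choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
results = cross_validate(
    DecisionTreeClassifier(random_state=0),
    iris.data, iris.target, cv=5,
    return_train_score=True,
)
train_acc = results['train_score']  # accuracy on each training fold
test_acc = results['test_score']    # accuracy on each held-out fold
```

Comparing the two arrays is a quick way to spot overfitting: a large gap between training and test scores is a warning sign.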
I don't understand how this method maps onto the example of the kid who has to take the test at the very start. How would cross validation apply to that particular real-life example? Also, I'm not sure whether a shuffled KFold or StratifiedKFold is better.
By making a df method:
mean(cross_val_score(LogisticRegression(max_iter=200), X, y)) = 0.9733
mean(cross_val_score(SVC(kernel='linear'), X, y)) = 0.98
mean(cross_val_score(RandomForestClassifier(n_estimators=40), X, y)) = 0.96
By using iris.data and iris.target directly:
np.average(score_lr) = 0.95333
np.average(score_svm) = 0.98000001
np.average(score_rf) = 0.95333333
Thanks for the video! I have a question: when you do the cross validation inside the for loop, you use the same folds for all the methods. Does cross_val_score do the same? If not, is it possible to use the same folds in order to get a more accurate comparison? Thanks in advance.
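One way to guarantee what the question above asks for is to pass the same splitter object as cv to every cross_val_score call, so all models are scored on identical partitions. A sketch (dataset and models are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
cv = StratifiedKFold(n_splits=5)
# Both models are scored on exactly the same five train/test partitions,
# so the comparison is apples-to-apples
lr_scores = cross_val_score(LogisticRegression(max_iter=200), iris.data, iris.target, cv=cv)
svm_scores = cross_val_score(SVC(), iris.data, iris.target, cv=cv)
```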
00:02 K fold cross validation helps determine the best machine learning model for a given problem.
02:20 K fold cross validation provides a more robust evaluation of machine learning models.
04:36 Classifying handwritten characters into ten categories using different algorithms and evaluating performance using k-fold cross validation.
07:06 K fold cross validation helps in more robust model evaluation.
09:43 K fold cross validation divides data into training and testing sets for iterative model evaluation.
12:35 Stratified k-fold ensures uniform distribution of categories for better model training.
15:42 Measuring the performance of models in each iteration.
18:29 Parameter tuning in random forest classifier improves scores.
20:46 K fold cross validation helps measure the performance of machine learning models.
23:18 Cross-validation helps in comparing algorithms and finding the best parameters for a given problem.
25:18 K fold cross validation helps in assessing the model's performance.
Crafted by Merlin AI.