NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/tzxoh Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Hi Josh! I had a question regarding why you would use One Hot Encoding instead of Label Encoding in this case. Wouldn't One Hot Encoding result in an increased number of dimensions, which could actually cause the Decision Tree algorithm to overfit?
One-hot encoding works well when you don't have too many different options (which is the case in this video). It's also the method of choice for more advanced tree-based methods, like XGBoost.
@statquest Thanks for the clarification! How many unique categories should a feature have before one should switch from One Hot to Label Encoding?
@SarveshRelekar That is a great question! Unfortunately there is no hard-and-fast rule (except for XGBoost, which recommends One Hot Encoding regardless of the number of categories).
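For anyone curious what this looks like in code, here is a minimal sketch of one-hot encoding with pandas; the column name "cp" and its values are made up for illustration and are not from the video:

```python
import pandas as pd

# Hypothetical categorical column, just for illustration
df = pd.DataFrame({"cp": [1, 2, 3, 4, 2]})

# One-hot encoding: one new 0/1 column per category,
# so no artificial ordering is imposed on the categories
one_hot = pd.get_dummies(df, columns=["cp"])
print(one_hot.columns.tolist())  # ['cp_1', 'cp_2', 'cp_3', 'cp_4']
```

Label encoding, by contrast, would map the categories onto 0, 1, 2, 3 in a single column, which imposes an ordering that may not actually exist in the data.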
Though I am aware of classical techniques like ESM, the ARIMA family, UCM, IDM, etc., I still cannot figure out how to use GBMs or neural/LSTM-based methods for time series forecasting in the univariate and multivariate cases (using endogenous variables, where sales forecasting depends on revenue, profit, campaigns, etc.). I did go through a few similar GitHub repos but somehow cannot get the concepts right...
If you subscribe with "the bell" you should get announcements about webinars. If you become a channel member or a Patreon supporter ( www.patreon.com/statquest ), you'll get priority registration.
@statquest Thank you... Also, Guruji (teacher in Hindi), if I want to access the Jupyter notebook for the decision trees you taught us here, how do I do that?
Josh, this is really great. Can you upload videos with some insights on your personal research and which methods you used? And some examples of why you prefer one method over another? I mean, not only because you get a better ROC/AUC result, but is there a "biological" reason for using a specific method?
My intro song for this channel: "It's like Josh has got his hands on Python right, he teaches ML and AI really well and tight ---- STAT QUEST". BTW, thanks brother for so much wonderful content for free.
I actually think it would be great if you created more videos for other ML algorithms. Having taught us almost every aspect of machine learning algorithms as far as the mechanics and the related fundamentals are concerned, I feel it is high time to see those in action, and Python is, of course, the best way to go.
Thank you, this video helped me a lot! For anyone else following along in 2023, the way the confusion matrix is drawn here didn't work for me anymore. I replaced it with the following code:

cm = confusion_matrix(y_test, clf_dt_pruned.predict(x_test), labels=clf_dt_pruned.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Does not have HD", "Has HD"])
disp.plot()
plt.show()
Great tutorial! One question: looking at the features included in the final tree, does it mean that only those 4 features are considered for prediction? I.e., we don't need the rest, so we could drop those columns for further use?
I wish you were my uncle, Josh, or something. I can imagine how hard I would have argued with my parents to spend time with my TRIPLE cool uncle.
Unfortunately I never got around to that webinar. The closest thing I have is a video on how to impute data with a random forest. However, this feature is only implemented in R (not python): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sQ870aTKqiM.html
49:44 Is each point on the plot made from one alpha for a different number of leaves? So is it an average of all the different models possible at, say, alpha = 0? Are we plotting average squared residuals against alpha?
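For what it's worth, here is a sketch of how those points can be computed (using a stand-in dataset rather than the video's heart-disease data): each candidate alpha comes from the pruning path of the full tree, and each plotted point is the mean of the 5 cross-validation scores for a tree pruned with that alpha:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the video uses the heart-disease data instead
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Candidate alphas come from the pruning path of the unpruned tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)
alphas = np.maximum(path.ccp_alphas[:-1], 0.0)  # drop the last alpha (root-only
# tree) and clip tiny floating-point negatives to zero

# One point per alpha: the mean of that alpha's 5 cross-validation scores
mean_scores = [
    cross_val_score(DecisionTreeClassifier(random_state=42, ccp_alpha=a),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]
```

For classification the score is accuracy rather than squared residuals (those come up in the regression-tree video), but the idea of one averaged point per alpha is the same.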
Thank you for your great effort and simple explanation. I have only one question: why did you split the data into X_train and y_train and then give it to cross_val_score? Shouldn't cross-validation work on all of X?
This is legen..... wait for it ....dary!! 😎 This detailed coding explanation of Decision Tree is hard to find but Josh you are brilliant. Thank you for such a great video.
First, thank you. You explain complicated things in the easiest way, with visualization. But you should get a better microphone to go with it. I think I am going to keep watching your videos.
Hi Josh. Loved this video. I have two questions: 1) Is there any way to save our final decision tree model to use later on unseen data without having to train it all over again? 2) Once you have decided on your final alpha, why not train your tree on the full, unsplit dataset? I know you will not be able to generate a confusion matrix, but wouldn't your final tree be better if it were trained with all the examples?
Yes and yes. You can write the decision tree to a file if you don't want to keep it in memory (or want to back it up). See: scikit-learn.org/stable/modules/model_persistence.html
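A minimal sketch of that persistence step with joblib, on a stand-in dataset (the variable and file names are illustrative, not from the video):

```python
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; in the video this would be the pruned heart-disease tree
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# Save the fitted tree to disk, then load it back later for new data
dump(clf, "decision_tree.joblib")
restored = load("decision_tree.joblib")
print((restored.predict(X) == clf.predict(X)).all())  # True
```

The restored model predicts exactly like the original, so there is no need to retrain.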
Hi Josh, I request you to make more such ML videos in Python which cover all ML concepts holistically. I am sure this course will then become more popular than any of the available ML courses. Pls pls pls....
Hey Josh. One thing that bugs me about this tutorial: when you do binary classification, you need to take into account class imbalance. Accuracy is the worst metric for this. Was that neglected for a reason?
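To illustrate the point with a toy example (not from the video): with a 90/10 class imbalance, a classifier that always predicts the majority class scores 90% accuracy but only 50% balanced accuracy, which is one reason plain accuracy can mislead:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Toy imbalanced labels: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # a "classifier" that always predicts 0

acc = accuracy_score(y_true, y_pred)           # 0.9, looks great
bal = balanced_accuracy_score(y_true, y_pred)  # 0.5, reveals the problem
```

Balanced accuracy averages the recall of each class, so the useless always-negative classifier gets (1.0 + 0.0) / 2 = 0.5.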
Your videos are always very good. But today I’ll have to commend you on your fashion choice as well. Great-looking shirt! I hope you have had the opportunity to visit Brazil.
I have already commented, but I watched the video again and I have to say I am even more impressed than before. A truly fantastic tutorial: not too verbose, but with every action clarified and commented in the code, and beautifully presented (I have to work on my markdown; there are quite a few markdown formats you use that I cannot replicate... to study when I get the notebook). So all in all, one of the very top ML tutorials I have ever watched (including paid-for training courses). Can't wait for today's or tomorrow's webinars. Can't join in real time as I'm based in Europe, but I will definitely pick it up here and get the accompanying study guides/code.
@statquest At 19 minutes you say you have plans for a whole webinar on missing data! This is what I need. Where can I find it or is it still in production? :D
@statquest Thanks for replying! I can see how easy it is to forget! You have so much content it's unreal! V impressive! I just purchased your notebook through the link, but it doesn't appear to have arrived in my inbox. Can you advise? I am also strongly considering paying for your Patreon account. I currently pay for Datacamp, but your material is so much better!
@oliveryoule11 Wow! Thanks for supporting me and I'm sorry you had trouble purchasing the notebook. If you contact me through my website, I can send it to you directly: statquest.org/contact/
Great tutorial! But unfortunately, I'm struggling at minute 48. How can it be that I get a negative ccp_alpha of -2.168404344971009e-19? The y values are 0 or 1 and all X values are positive. Does someone have an idea what the reason is?
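That tiny negative value (on the order of 1e-19) is almost certainly floating-point rounding noise from the pruning-path computation, not a real negative alpha. A common workaround (my suggestion, not from the video) is to clip the alphas at zero before cross-validating, since newer scikit-learn versions reject negative ccp_alpha values:

```python
import numpy as np

# Tiny negative alphas like -2.2e-19 are floating-point rounding noise;
# clipping them to zero makes every alpha valid for DecisionTreeClassifier
ccp_alphas = np.array([-2.168404344971009e-19, 0.0, 0.003, 0.012])
ccp_alphas = np.clip(ccp_alphas, 0.0, None)
print(ccp_alphas.min())  # 0.0
```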
To see a full picture of the decision tree at 41:00, try this code:

from sklearn import tree
clf_dtree = tree.DecisionTreeClassifier(random_state=42)
clf_dtree = clf_dtree.fit(X_train, y_train)
plt.figure(figsize=(44, 20))
tree.plot_tree(clf_dtree, fontsize=10, filled=True, rounded=True,
               class_names=["No HD", "Yes HD"], feature_names=X_encoded.columns)
plt.show()

Click on the miniaturized display.
When I want to plot the confusion matrix, the following error occurs at the import stage: ImportError: cannot import name 'plot_confusion_matrix' from 'sklearn.metrics' (C:\Users\hp\Anaconda3\lib\site-packages\sklearn\metrics\__init__.py). What do I do to rectify this?
Hi Josh, I recommend your videos to all my students and love watching and learning from them 👍. Can we still download this notebook, or do we need to buy it? Regards from South Africa!
A doubt: when calculating the best alpha (scores = cross_val_score(clf_dt, X_train, y_train, cv=5)), the data used for the calculation is the training data, but I understood from the video How to Prune Regression Trees, Clearly Explained!!! that ALL the data was used to find the optimum alpha. Sorry, it's probably obvious, but I can't find the answer.
Unfortunately I am sometimes sloppy when I describe training and testing data. Sometimes I call something "testing data" when it is "validation data". So, in this case, the testing data is "validation" and "training data" represents ALL of the data that we will use to create the tree.
I have some questions: what was the methodology that you used, how do you interpret it, and did you use any descriptive statistical analysis or exploratory data analysis?
Hi Josh, I see that in sklearn all the tree-based ensemble algorithms have ccp_alpha as a tuning parameter. Is it advisable, or even feasible, to tune it for hundreds of trees (especially when the trees are randomly created), or should we tune standard parameters like the learning rate, number of trees, loss function, etc.?
@statquest Just wondering, is it possible to tune this for a random forest? Since we are creating hundreds of trees with randomly selected features for every tree, and as far as I understood, ccp is a tree-specific parameter. Please give some insight on this in your next session. Hope my query is relevant 🙂
@SaurabhKumar-mr7lx With Random Forests, the goal for each tree is different than when we just want a single decision tree. For Random Forest trees, we actually do not want an optimal tree; we only want something that gets it right a little more than 50% of the time. So in this case, we just limit the tree depth to 3 or 4 or something like that, rather than optimizing each tree with cost complexity pruning.
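For what it's worth, a sketch of that idea on a stand-in dataset (the depth limit of 3 is just an example value):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Instead of pruning each tree with ccp_alpha, a Random Forest is commonly
# regularized with a simple cap like max_depth applied to every tree
X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, max_depth=3,
                            random_state=42).fit(X, y)

# Every individual tree in the forest respects the depth cap
print(max(tree.get_depth() for tree in rf.estimators_))  # at most 3
```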
Can we tune the hyperparameters (not only alpha) with grid search (extra cross-validation)? Maybe we can optimize the tree to work even better? So we would have the cross-validation for alpha, and we could add a grid search for max_leaf_nodes, gini vs. entropy, min samples, etc. Thanks in advance.
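Something like this should work; here is a sketch with GridSearchCV on a stand-in dataset (the grid values below are illustrative, not recommendations):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Cross-validated search over several parameters at once, not just ccp_alpha
X, y = load_breast_cancer(return_X_y=True)
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_leaf_nodes": [None, 10, 20],
    "min_samples_leaf": [1, 5],
    "ccp_alpha": [0.0, 0.01],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # the winning combination of settings
```

The combinatorial grid grows quickly, so with many parameters RandomizedSearchCV is often a cheaper alternative.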
Can someone provide me with a link to the code? I am financially constrained and trying to move into Data Science, and I cannot afford to pay. Thanks and regards (love from India).
Why did you determine the alphas using the "training data" rather than the "full dataset"? As I remember from the video on pruning regression trees, you found the alphas with the full data.
I'm sorry that I was sloppy/imprecise with my terminology. "full data", I guess, refers to the full amount of data we are using to build the tree (and not some partition that we use for cross validation).