299 - Evaluating sklearn model using KFold cross validation in python

Code generated in the video can be downloaded from here:
github.com/bnsreenu/python_fo...
Let us start with binary classification the normal way most of us approach the problem, using sklearn (SVM). In this example, we will first split our dataset the normal way into train and test groups.
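The single train/test split described above might look like the following minimal sketch. It is not the code from the video: it assumes the copy of the Wisconsin breast cancer data bundled with sklearn (`load_breast_cancer`) instead of the Kaggle CSV, and the linear kernel, the 25% test size, and the `MinMaxScaler` step are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: the bundled sklearn copy of the dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)

# One fixed split: 75% for training, 25% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then reuse it on the test data.
scaler = MinMaxScaler().fit(X_train)

model = SVC(kernel="linear")
model.fit(scaler.transform(X_train), y_train)

test_accuracy = model.score(scaler.transform(X_test), y_test)
print(f"Test accuracy on a single split: {test_accuracy:.3f}")
```

The score from one split depends on which samples happened to land in the test group, which is the motivation for the K Fold approach that follows.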
We will then learn to divide data using K Fold splits.
We will iterate through each split to train and evaluate our model.
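Iterating through the splits by hand could be sketched as below. The dataset loader, `n_splits=5`, shuffling, and the scaler plus linear SVM pipeline are assumptions made for illustration, not necessarily what the video uses.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: the bundled sklearn copy of the dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # A fresh model per fold, so nothing learned from one fold leaks into the next.
    model = make_pipeline(MinMaxScaler(), SVC(kernel="linear"))
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print("Accuracy per fold:", np.round(fold_scores, 3))
print("Mean accuracy:", round(float(np.mean(fold_scores)), 3))
```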
We will finally use the cross_val_score() function to perform the evaluation.
It takes the model, the dataset, and the cross-validation configuration and returns an array of
scores, one calculated for each fold.
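A minimal sketch of the same evaluation with `cross_val_score()` follows. The pipeline contents and `n_splits=5` are assumptions for illustration, and the bundled sklearn dataset again stands in for the Kaggle CSV.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: the bundled sklearn copy of the dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)

# Scaler and classifier are bundled so that both are refit on every training fold.
pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("svm", SVC(kernel="linear")),
])

cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Returns a NumPy array with one accuracy value per fold.
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores.round(3))
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

Because the whole pipeline is passed in, `cross_val_score()` clones it for each fold, fits the scaler and the SVM on that fold's training portion, and applies the fitted scaler to the held-out portion before scoring, so X does not need to be scaled beforehand.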
K-fold cross-validation is a model validation technique.
Evaluating the model across multiple folds gives a more reliable estimate of its performance than a single train/test split.
The KFold class in sklearn provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default).
Each fold is then used once as a validation set while the k - 1 remaining folds
form the training set.
The split() method within KFold generates indices to split data into training and test sets. It divides the data into n_splits groups of roughly n_samples/n_splits samples each.
One group is used for testing and the remaining data is used for training.
Each group serves as the test set exactly once, so every combination of n_splits-1 training groups is used for cross-validation.
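To make the index generation concrete, here is a small sketch on a toy array of 10 samples with `n_splits=5`. The toy data and variable names are hypothetical; only `KFold` and its `split()` method come from sklearn.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, so each of the 5 folds holds 10 / 5 = 2 samples.
X_toy = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=5)  # no shuffling by default, so folds are consecutive

test_folds = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X_toy), start=1):
    test_folds.append(test_idx.tolist())
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")

# Fold 1: train=[2, 3, 4, 5, 6, 7, 8, 9] test=[0, 1]
# Fold 2: train=[0, 1, 4, 5, 6, 7, 8, 9] test=[2, 3]
# Fold 3: train=[0, 1, 2, 3, 6, 7, 8, 9] test=[4, 5]
# Fold 4: train=[0, 1, 2, 3, 4, 5, 8, 9] test=[6, 7]
# Fold 5: train=[0, 1, 2, 3, 4, 5, 6, 7] test=[8, 9]
```

Every sample appears in exactly one test fold and in the training set of the other four folds.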
Wisconsin breast cancer example
Dataset link: www.kaggle.com/datasets/uciml...

Published: 28 Feb 2023

Comments: 17

@Master_of_Chess_Shorts 1 year ago

You are one of the best data science teachers out there. Thanks for your good work and approach. You explain a wide range of topics very well.

@newcooldiscoveries5711 1 year ago

Been enjoying this KFold series. Looking forward to the next one. Thanks.

@caiyu538 1 year ago

I use this module a lot in my work. Thanks for these great free libraries; they make data scientists' lives easier. Most of the work is gluing the data to these libraries.

@DmitriiTarakanov 1 year ago

Dear Sreeni, thank you so much for your work! Have a good one!

@11111653 1 year ago

How do I plot a ROC curve for the overall cross-validation? I have been trying to plot the ROC curve, but I get an error, apparently because each fold yields a different number of TPR/FPR values, which prevents the code from displaying it.

@malithabasuri4491 1 year ago

Hi, great video series. Can you start a video series about medical image processing and ML, such as 3D MRI processing, preventing leaky validation, etc.? It would be really useful because there aren't many resources.

@guiomoff2438 1 year ago

Before doing cross-validation, shouldn't you use a dimensionality reduction technique to determine whether all features are necessary for your training? Thanks in advance if you take the time to answer me!

@ajay0909 1 year ago

Hi sir, I have been trying to implement video classification using a CNN. All the content and tutorials out there are quite hard to implement, or maybe I got used to your detailed explanations. Please do a tutorial on how to load video data. Thanks for all the high-quality content.

@maheshmaskey4592 2 months ago

Good post. By the way, how do we select the best model after cross-validation? I am more interested in regression than classification. Have you tried using a multivariate polynomial regression model so that we could establish an empirical relation?

@Gingeey23 1 year ago

Great video. Just to clarify, is the purpose of cross-validation to tune the hyperparameters of models on a variety of different train_test splits to avoid overfitting? Cheers!

@DigitalSreeni 1 year ago

Yes, the main purpose of cross-validation is to estimate the performance of a model on an independent dataset and to tune the hyperparameters of the model to avoid overfitting.

@Athens1992 1 year ago

Nice video. One silly question: you are using MinMaxScaler in a pipeline, so how does cross_val_score know to apply the scaler to the X array? I know it's a silly question, but I ask because you don't apply the pipeline's transform to the X array yourself.