How do I encode categorical features using scikit-learn? 

Data School
138K views

In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
In this video, you'll learn how to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step. You'll also learn how to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously. Finally, you'll learn why you should use scikit-learn (rather than pandas) for preprocessing your dataset.
AGENDA:
0:00 Introduction
0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?
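The workflow outlined in the agenda can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the tiny DataFrame and its column names (`Pclass`, `Sex`, `Embarked`, `Survived`) are made up for the example, and `LogisticRegression` is an assumed choice of model.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up dataset (hypothetical; not the data used in the video)
df = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 2, 3, 1, 2, 3],
    "Sex":      ["male", "female", "female", "female", "male",
                 "male", "male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "S", "S", "Q", "S", "C", "S", "Q"],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
})
X = df.drop(columns="Survived")
y = df["Survived"]

# One-hot encode the two categorical columns; pass Pclass through unchanged
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["Sex", "Embarked"]),
    remainder="passthrough",
)

# Two-step Pipeline: preprocessing, then the model
pipe = make_pipeline(column_trans, LogisticRegression())
pipe.fit(X, y)

# The fitted encoder inside the pipeline produces the feature matrix:
# 2 columns for Sex + 3 for Embarked + 1 passthrough column
encoded = pipe.named_steps["columntransformer"].transform(X)

# New data goes through exactly the same encoding before prediction
X_new = pd.DataFrame({"Pclass": [3, 1],
                      "Sex": ["male", "female"],
                      "Embarked": ["S", "C"]})
predictions = pipe.predict(X_new)
```

Because the encoder lives inside the pipeline, `pipe.predict` applies the categories learned during `fit` to the new rows, so no separate preprocessing call is needed.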
CODE FROM THIS VIDEO: github.com/justmarkham/scikit...
WANT TO JOIN MY NEXT LIVE WEBCAST? Become a member ($5/month):
/ dataschool
=== RELATED RESOURCES ===
OneHotEncoder documentation: scikit-learn.org/stable/modul...
ColumnTransformer documentation: scikit-learn.org/stable/modul...
Pipeline documentation: scikit-learn.org/stable/modul...
My video on cross-validation: • Selecting the best mod...
My video on grid search: • How to find the best m...
My lesson notebook on StandardScaler: nbviewer.jupyter.org/github/j...
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) WATCH my scikit-learn video series: • Machine learning in Py...
2) SUBSCRIBE for more videos: ru-vid.com?su...
3) ENROLL in my Machine Learning course: www.dataschool.io/learn/
4) LET'S CONNECT!
- Newsletter: www.dataschool.io/subscribe/
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham

Published: 30 Jul 2024

Comments: 452
@dataschool 4 years ago
*Are you new to Machine Learning?* Watch my video series, "Introduction to Machine Learning in Python with scikit-learn": ru-vid.com/group/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
@arunjohn492 4 years ago
Sir, what about the dummy variable trap when we use ColumnTransformer?
@dataschool 3 years ago
Great question! See this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-NYtwyvyvDEk.html
@terryhenyo9216 4 years ago
The Legendary Data Science guy is back!
@dataschool 4 years ago
Thank you for the warm welcome! 😄
@GoredGored 2 years ago
For beginners: When I tried to complete an ML project, say a simple model based on logistic or linear regression, it used to take me about a month. As I was a beginner in Python, pandas, SQL and the rest of it, I thought this would take me a long time to master and that maybe I was a latecomer to this. But a year on, thanks to Data School, Sentdex, Krish Naik, StatQuest, Thinkful Webinar and more, I am surprised that all I need is a day or less to complete these projects. Whenever I needed a deeper understanding, the meticulous analysis on Data School is where my GPS led me. Thank you, Data School.
@dataschool 2 years ago
You are so very welcome!
@altunbikubra 3 years ago
Your guide not only covers basic code, it also covers very practical and useful functions. I want to sincerely thank you for your effort!
@dataschool 3 years ago
Thanks very much for your kind words!
@liquid_absabs1334 3 years ago
There is something about your explanations that makes me get it instantly. You deserve an award.
@dataschool 3 years ago
You are too kind, thank you!
@dataschool 3 years ago
Yes, that is the role of the OneHotEncoder.
@hieungotrung5411 4 years ago
OMG!!! I've just started ML on Kaggle in the past few weeks. There's a lot of information to absorb, but you teach us in the most understandable way and still cover up-to-date questions like why we should use scikit-learn instead of dummies. This video is extremely helpful and informative. Thanks a lot!!! Guess I'm gonna spend the rest of the day watching all of your videos.
@dataschool 4 years ago
Awesome! Glad to hear this was helpful to you 👍
@sandeep1026 4 years ago
I feel fortunate that I stumbled across this video. Very well articulated. The slower pace lets folks hear, understand and digest. Most videos I come across seem to rush through the content before one can digest it. Thanks for taking the time and sharing your knowledge.
@dataschool 3 years ago
Thanks very much for your kind words! 🙏
@Anarchy977 4 years ago
Fantastic tutorial! Great teacher, best Machine Learning teacher on YouTube! Thank you!
@dataschool 4 years ago
Thanks so much!
@amitsharma8337 4 years ago
THANK YOU for this tutorial! I was wandering around the web trying to solve unexpected errors that came from following apparently outdated tutorials. If I had landed on this tutorial first, it would have saved me around 4 hours of useless surfing. Thanks again.
@dataschool 4 years ago
That's awesome to hear... glad I could be of help! By the way, I'll be launching a full course covering these topics (and more)... sign up here to get notified when it launches: scikit-learn.tips
@christianiheanacho4976 4 years ago
You are a high quality TEACHER, thank you very much.
@dataschool 4 years ago
You are very welcome! 😄
@Freethinker33 1 year ago
I was looking for a clear explanation of Pipeline for a long time. You nailed it. Crystal clear explanation, understood in one viewing. Thank you.
@dataschool 1 year ago
You're so very welcome! 🙏
@tald747 3 years ago
This is an excellent and simple explanation of this topic. I must say that you are very talented in the way you teach! You choose your words in a way that emphasizes only the important and relevant stuff. Thanks!!!
@dataschool 3 years ago
Wow, thank you!
@fet1612 3 years ago
00:58 1) It allows you to properly cross-validate a process rather than just a model. In other words, when you do cross-validation with something like cross_val_score, you normally just pass a model to it. There are cases where that will not give you accurate results, because the preprocessing happens outside of the cross-validation. So a pipeline, generally speaking, is useful because you can cross-validate a process that includes (a) *preprocessing* as well as (b) *model building*.
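The point in the comment above can be sketched as follows: instead of a bare model, the whole pipeline is passed to `cross_val_score`, so the encoder is re-fit on the training portion of every fold. This is a minimal sketch with made-up data; the column names, the toy target, and the choice of `LogisticRegression` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data (hypothetical): 60 rows built by repeating short patterns
X = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 2, 1] * 10,
    "Sex": ["male", "female"] * 30,
    "Embarked": ["S", "C", "Q"] * 20,
})
y = (X["Sex"] == "female").astype(int)  # toy target, only for illustration

column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["Sex", "Embarked"]),
    remainder="passthrough",
)
pipe = make_pipeline(column_trans, LogisticRegression())

# Pass the pipeline (preprocessing + model), not just the model.
# In each of the 5 folds, the encoder is fit only on that fold's training rows.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
```

With one-hot encoding alone the difference from encoding up front is often small, but the same pattern matters more for steps that learn statistics from the data, such as scaling or imputation.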
@420nyk 2 years ago
Thanks, this helps a lot. I was scratching my head over Pipeline and ColumnTransformer before this video. Also, you've got a very soothing voice, and it helps me relax and really enjoy the learning.
@dataschool 2 years ago
Great to hear!
@Putinka1000 4 years ago
Thank you for speaking slowly. It's nice to listen to a non-English speaking person
@dataschool 4 years ago
You're very welcome! :)
@salonisamant5410 3 years ago
Thank you for explaining the pipeline approach so well!
@dataschool 3 years ago
You're very welcome!
@Mehrdadkh87 2 years ago
Thx Kevin, one of the best and simplest explanations of Pipeline
@dataschool 2 years ago
Glad it was helpful!
@NoWhiteGullibility 4 years ago
Perfect timing, I was just searching on pipelines the other day. It would be great to follow up by tacking on grid search in this context.
@dataschool 4 years ago
That's awesome to hear! I will definitely cover grid search of a pipeline at some point - thanks for the suggestion!
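As a rough idea of what grid search over a pipeline can look like, here is a minimal sketch using `GridSearchCV`. It is not taken from the video: the data is made up, and the grid over the `C` parameter of `LogisticRegression` is an arbitrary illustrative choice. The `logisticregression__C` key relies on `make_pipeline` naming each step after its lowercased class name.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data (hypothetical), same shape of problem as in the video description
X = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 2, 1] * 10,
    "Sex": ["male", "female"] * 30,
    "Embarked": ["S", "C", "Q"] * 20,
})
y = (X["Sex"] == "female").astype(int)  # toy target, only for illustration

column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["Sex", "Embarked"]),
    remainder="passthrough",
)
pipe = make_pipeline(column_trans, LogisticRegression())

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {"logisticregression__C": [0.1, 1, 10]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

best_params = grid.best_params_
best_score = grid.best_score_
```

Encoder settings can be searched the same way, for example with a key that starts with `columntransformer__onehotencoder__`.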
@artyb3115 4 years ago
Absolutely perfect and useful lessons! Thinking of becoming a patron member as I get a little more confident with ML
@dataschool 4 years ago
That would be awesome, thank you so much! You can join here: www.patreon.com/dataschool
@horoshuhin 2 years ago
Thank you Kevin, very thorough explanation. I'm glad I found your channel. I like the way you teach.
@dataschool 2 years ago
Thank you so much! 🙏 That's great to hear!
@nishantchaudhary7528 2 years ago
That was really amazingly explained. I was looking to understand all these topics, and I got it in one go. Thanks a ton.
@dataschool 2 years ago
You're very welcome!
@rommeltito123 4 years ago
Dayyyyuuummmm.......why did I not stumble upon ur videos earlier ????!!!!!!
@dataschool 3 years ago
😄
@sophiar5280 4 years ago
Always love your step by step, clear lessons. Keep it coming.
@dataschool 3 years ago
Thank you!
@harshitarawat8941 3 years ago
Man I love you. I just love you. I love your videos. I love the way you explain things. I love the pace of your videos. I love everything. Thank you.
@dataschool 3 years ago
Thank you so much, Harshita! 🙏
@dhananjaykansal8097 4 years ago
Nice to have u back sir. This session was so fruitful. Thanks a ton. Keep it up!
@dataschool 4 years ago
That's awesome to hear!
@jkore2554 3 years ago
Thank you for this tutorial. I was working with logistic regression this week and was trying to figure out how to one-hot encode a categorical variable with hundreds of categories. I was getting 100% accuracy and precision, so something wasn't right. I'm going to try the steps that you outlined in this tutorial. Thanks.
@dataschool 3 years ago
Good luck!
@fahadkhankhattak8339 2 years ago
Thank you so much!!!!! It was very helpful. Yours is the only channel I come running to for help whenever I'm stuck somewhere. Rich content!! Keep sharing these wonderful things
@dataschool 2 years ago
Thank you so much!
@jatinshetty 4 years ago
Yo! Mind blown by the amount of things I learnt from this. Please keep at it!
@dataschool 4 years ago
Thank you! You might like my scikit-learn tips: github.com/justmarkham/scikit-learn-tips
@chr1112 3 years ago
You are the best tutor I have ever met, keep up the good work. Thank you
@dataschool 3 years ago
Wow, thanks!
@sanaullahkhanhassanzai8432 4 years ago
Thank you very much and welcome back after a long time. You are as good as it gets when it comes to Machine Learning. You have taught me a lot. I can't wait for videos on deep learning. I hope you'll come up with deep learning soon. Thanks again
@dataschool 4 years ago
Thanks very much for your kind words, and for your suggestion as well!
@harshalkulkarni511 4 years ago
Preprocessing with a pipeline was a complex topic for me to understand before watching this video. Thanks a lot for the video.
@dataschool 4 years ago
You're very welcome! Glad it helped 👍
@PaulBillingtonFW 11 months ago
Thanks for this clear and well-paced tutorial.
@dataschool 9 months ago
Glad it was helpful!
@amitblizer4567 1 year ago
Very clearly explained and helpful video - Thank you!
@dataschool 1 year ago
Glad it was helpful!
@aaqibsoomro5776 4 years ago
You are a great teacher. Please make tutorials or series on Data Visualization, In-Depth Data Analysis and Cleaning, Project Deployment, etc., since after learning Python, its libraries and ML, these are the next steps.
@dataschool 4 years ago
I have many more tutorials! Many of them are listed here: www.dataschool.io/launch-your-data-science-career-with-python/
@salakkal 4 years ago
Really great that you did a video like this. It just helped me a lot and I am really thankful for it, brother. Keep going.
@dataschool 3 years ago
Thanks!
@quocanhhbui8271 2 years ago
My god I love your detailed solution. Even my 5yo sibling can understand it. Wonderful. Definitely worth a subscribe.
@dataschool 2 years ago
Awesome! 🙌
@georgeognyanov 3 years ago
God damn this video is good. I was struggling with column_transformer and pipelines till late last night. The options you suggest here are so much better and easier for me to understand. I am totally going through your "Introduction to Machine Learning in Python with scikit-learn" playlist soon. Thanks for putting this out!
@dataschool 3 years ago
You're very welcome! If you want to go deeper into this topic, you may want to check out my course: courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn
@ayyappahemanth7134 4 years ago
Oh my god! After so much exhausting waiting another video came, and it is far more useful to me than the others! I just love your videos; the content has been really useful in my real life. Most YouTube channels just take ideal cases that I might not encounter in my whole life! Please do these videos regularly!
@dataschool 4 years ago
That is awesome to hear, thanks so much for your kind words! 🙏 Actually, I publish a new Q&A video every month for Data School Insiders at the $5 level: www.patreon.com/dataschool
@sandeeppreetam 4 years ago
Thank you good sir, this tutorial was better than many paid tutorials on Udemy. Blessed!
@dataschool 3 years ago
Glad it was helpful! 🙌
@abdelkaderkaouane1944 1 year ago
Your explanation is very clear, thank you very much
@dataschool 9 months ago
You're welcome!
@krishkonnect814 4 years ago
I just found the solution to my problem after watching your video. Thanks a lot.
@dataschool 3 years ago
You're welcome!
@frankgiardina205 3 years ago
Excellent! I was using pandas get_dummies, and your explanation of why Pipeline and OneHotEncoder are a better solution solves all the problems. Thanks again
@dataschool 3 years ago
Glad it helped!
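One commonly cited reason to prefer OneHotEncoder over pandas dummies can be sketched as follows. The video's own argument is not reproduced in this text, so treat this as an illustration under that assumption: `pd.get_dummies` builds columns from whatever categories happen to be present, while a fitted `OneHotEncoder` remembers the categories it saw during `fit`. The data here is made up.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up training data and new data (hypothetical)
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})
new = pd.DataFrame({"Embarked": ["S", "S"]})

# pandas: the columns depend on the categories present in each DataFrame,
# so training and new data can end up with different feature matrices
dummies_train = pd.get_dummies(train)  # Embarked_C, Embarked_Q, Embarked_S
dummies_new = pd.get_dummies(new)      # Embarked_S only

# scikit-learn: the encoder learns the categories once, during fit,
# and always outputs one column per learned category
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train)
encoded_train = ohe.transform(train).toarray()
encoded_new = ohe.transform(new).toarray()  # still 3 columns
```

A model trained on `dummies_train` could not score `dummies_new` without extra column alignment, whereas `encoded_new` already has the layout the model expects.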
@jobihara 2 years ago
Thank you Data School, it was not only helpful, it was great, enlightening and awesome.
@dataschool 2 years ago
What a nice thing to say, thank you so much! 🙏
@honprarules 4 years ago
Amazing explanation, as always!
@dataschool 3 years ago
Thank you!
@lovejazzbass 3 years ago
Kevin, it's 5:20am Winston-Salem time and I am digging this. I was very confused. Thank you so much.
@dataschool 3 years ago
Excellent!
@brandonbermudez9047 1 year ago
Absolute goat bruh, really thankful for your content
@dataschool 1 year ago
Thank you!
@aimenbaig6201 3 years ago
I just discovered your channel and I gotta tell you, you've got a permanent subscriber here!!! LOVE YOUR TEACHING STYLE!!!!!!!!!!!!!!!
@dataschool 3 years ago
Thank you! 🙏
@adarshr30 4 years ago
After searching a lot, I found this channel and I feel it's best for me :)
@dataschool 3 years ago
Happy to hear that!
@asimssheikh 3 years ago
Impressive explanation, and logical approach to material presentation. You just got a new sub.
@dataschool 3 years ago
Welcome aboard!
@xinchenzou4558 2 years ago
Thank you sir! You've really saved my life...
@dataschool 2 years ago
🙌
@David-fr7ee 4 years ago
Great content, I am learning this in my college data science class. You did better than my professor!
@CE-vd2px 3 years ago
Are you undergrad or grad?
@dataschool 3 years ago
Thank you! 🙏
@gyanendergandhar 2 years ago
Thanks a lot for this tutorial Kevin. It really saved me😅
@dataschool 2 years ago
Glad to hear that!
@trentjones6468 4 years ago
Amazing video. You are an excellent instructor. Got yourself a new subscriber :)
@dataschool 4 years ago
Thank you so much!
@SaunakDey 3 years ago
Awesome explanation!! Thanks a lot
@dataschool 3 years ago
You're very welcome!
@kishanlal676 4 years ago
Thank you for this amazing video. Please do some videos on feature selection and scaling techniques in Python!
@dataschool 4 years ago
I'm hoping to cover feature scaling in a future video, but I do have a video about feature selection: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YaKMeAlHgqQ.html Hope that helps!
@ramleo1461 4 years ago
Hi, this will be very helpful.. Thank you for making this video!!
@dataschool 4 years ago
You are very welcome! 🙌
@Steven-se5jd 4 years ago
Just want to say thank you. I am a beginner and you teach much better than my professor.
@dataschool 4 years ago
Glad to hear I have been helpful! 🙏
@eugenechew1476 4 years ago
Why pay $900 at Uni when you can watch this amazing tutorial for free, and it's wayyyy better!
@dataschool 4 years ago
Thanks! Stay tuned for a course that explores these topics in much more detail...
@83vbond 3 years ago
I paid $6000 :((
@JainmiahSk 4 years ago
Sir, just 5 minutes ago I visited your channel to ask you this same question, because I was having difficulty encoding multiple variables in Kaggle's house price prediction (advanced regression) dataset. Fortunately and surprisingly, you posted exactly this. Thank you so much.
@dataschool 4 years ago
That's amazing! 🙌 I hope this video is helpful to you, and let me know if you have any questions!
@JainmiahSk 4 years ago
@dataschool I have a problem with functions: I can't write custom functions in Python, which is very important. What should I do, sir?
@dataschool 4 years ago
@JainmiahSk You can definitely write custom functions in Python!
@sowash2020 1 year ago
You just gained another subscriber...this was super useful
@dataschool 1 year ago
Great to hear!
@Takk6 4 years ago
You are by far the best data science teacher on YouTube. Can you make a video on creating your own custom transformers to modify your data, then using such a custom transformer in a ColumnTransformer and a Pipeline?
@dataschool 4 years ago
Thanks for your suggestion! I'm working on a course that will likely cover that topic. Sign up here to get notified when it launches: scikit-learn.tips
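For readers curious about the question above, one lightweight way to put a custom function into a ColumnTransformer and a Pipeline is scikit-learn's `FunctionTransformer`. The sketch below is only an illustration and is not how the course mentioned in the reply approaches it: the `log_fare` function, the `Fare` column, the data, and the model are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder


def log_fare(values):
    """Hypothetical custom step: compress large values with log(1 + x)."""
    return np.log1p(values)


# Made-up data (hypothetical)
X = pd.DataFrame({
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "male", "female"],
})
y = pd.Series([0, 1, 1, 1, 0, 0, 0, 1])

column_trans = make_column_transformer(
    (FunctionTransformer(log_fare), ["Fare"]),           # custom function
    (OneHotEncoder(handle_unknown="ignore"), ["Sex"]),   # built-in encoder
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

pipe = make_pipeline(column_trans, LogisticRegression())
pipe.fit(X, y)

# Output columns follow the order of the transformers: log(Fare), then Sex
transformed = pipe.named_steps["columntransformer"].transform(X)
```

`FunctionTransformer` suits stateless functions like this one; a transformer that needs to learn something during `fit` would instead be written as a class with `fit` and `transform` methods.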
@barulli87 4 years ago
MIND BLOWN!!!! CV FOR A PROCESS!!! NOICE ONE!!
@dataschool 3 years ago
🤯
@surfzion 3 years ago
Extremely helpful, thank you so much !!!
@dataschool 3 years ago
Glad it helped!
@1stophchr 4 years ago
Thank you very much, very clear video
@dataschool 4 years ago
You're very welcome! 😄
@hichamamchtkou7343 4 years ago
Thank you very much, it's very interesting and, by the way, it is exactly what I need in my current ML project.
@dataschool 4 years ago
That's great to hear! Good luck with your project 🙌
@hichamamchtkou7343 4 years ago
@dataschool thanks 👍
@absar66 4 years ago
Great! Great! Great tutorial! Many thanks Kevin
@dataschool 4 years ago
You're very welcome!
@yeahzisue 3 years ago
This is so helpful that I have to comment. Great job. Thanks a lot
@dataschool 3 years ago
Glad it was helpful!
@christianiheanacho4976 4 years ago
I am enriched by this teaching.
@dataschool 4 years ago
Great to hear!
@victor-os9wq 2 years ago
Thanks for such a detailed tutorial. I am working on a similar problem where I have multiple categorical features. In my dataset, the categorical variables have more than 90 possible values; as a result I get an additional 121 columns when I use get_dummies, but I actually want just four levels. Please kindly advise me.
@brendensong8000 3 years ago
I love it! Amazing tips!
@dataschool 3 years ago
Thank you!
@MohammadrezaMokhtari-qh2yg 2 months ago
Amazing information. Wow! Thank you so much man.
@dataschool 2 months ago
You're very welcome!
@gisleberge4363 1 year ago
Great example, educational.
@dataschool 1 year ago
Thank you!
@vincecarter7500 4 years ago
Thanks a lot for helping everyone out. I was just wondering if you will be uploading more videos in the future
@dataschool 3 years ago
Yes! I just started posting again last week. Thanks for watching!
@Pqj613 1 year ago
It's a good tutorial for some reasons that you will explain later.:D
@anthonyhan6825 3 years ago
Awesome job!
@dataschool 3 years ago
Thanks!
@eatbreathedatascience9593 2 years ago
This video is excellent.
@dataschool 2 years ago
Thank you!
@zohrehvahdati787 4 years ago
Thank you so much.😍😍🙏🙏👍👍 It helped me a lot.
@dataschool 4 years ago
Great to hear!
@pivotai525 2 years ago
Simply the best!!
@dataschool 2 years ago
Thank you!
@AjayVerma-xi2us 4 years ago
Very good, it cleared up many of my doubts
@dataschool 4 years ago
Great to hear!
@user-bt8ln1pp1k 7 months ago
Very Well Explained..
@dataschool 7 months ago
Thank you!
@oeb5542 4 years ago
Just another amazing video. 😄
@dataschool 4 years ago
Thank you so much for your kind words! 😊
@krishnaprasadbhat851 3 years ago
mkayyyyy, awesome tutorial!!!
@dataschool 3 years ago
Thank you!!
@TheAstralftw 3 years ago
Finally someone properly explained to me what ColumnTransformer is and why we use Pipeline. I would like you to put your course on Udemy, then I'll buy it 100%. Maybe on average you will sell each course at a lower price, but trust me, you explain this so well that you could sell tens of thousands of courses in a few months. Or in case you already have this on Udemy, please provide me with the link!
@dataschool 3 years ago
Thanks for your kind words and your suggestion! I know that many students like Udemy courses, but my values as a course creator don't align with their business model, and so I'm not currently interested in publishing a course there. I prefer to offer courses directly to interested students. Thanks for understanding!
@nguyenminhoan7882 4 years ago
Thank you, waiting for more tutorials :3
@dataschool 4 years ago
You're very welcome! I will do my best to publish more!
@abdoulayebalde2139 4 years ago
A very nice video that saved my life. I can see it is well explained. Keep uploading
@dataschool 3 years ago
Thanks!
@emilezg4496 2 years ago
Great content, thank you sir
@dataschool 2 years ago
Thank you!
@schuylerblasy2192 4 years ago
This is a really interesting video. ColumnTransformer is sort of like a pipeline in itself. Kind of reminds me of VectorAssembler in Spark/PySpark.
@dataschool 4 years ago
Thanks Sky! One important difference is that ColumnTransformer stacks results side-by-side, whereas Pipeline feeds the output of one step to the input of the next step.
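The difference described in the reply above can be seen in a small sketch. The data and column names are made up, and the imputer-plus-scaler pipeline is just an assumed example of two steps that run in sequence.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data (hypothetical) with one missing Age value
X = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, 35.0],
})

# Pipeline: sequential. The imputer's output is the scaler's input.
age_pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# ColumnTransformer: parallel. Each transformer sees only its own columns,
# and the results are stacked side by side into one feature matrix.
column_trans = make_column_transformer(
    (OneHotEncoder(), ["Sex"]),
    (age_pipe, ["Age"]),
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

out = column_trans.fit_transform(X)
# Columns 0-1: one-hot encoded Sex; column 2: imputed, then scaled Age
```

A Pipeline can itself be one of the transformers inside a ColumnTransformer, as `age_pipe` is here, which is how the two are usually combined.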
@nowhere5111 3 years ago
This video helps a lot👍👍👍
@dataschool 3 years ago
Great!
@andrecouto2344 3 years ago
Thank you very much man
@dataschool 3 years ago
You're welcome!
@IgnitedMountain 2 years ago
Hello, in the last example, how are the NaN values handled? Are they removed by one of the methods, or do you have to remove them yourself?
@joxa6119 2 years ago
God, this video answered a question I hadn't solved for a month. God bless you.
@dataschool 2 years ago
Great to hear!
@modhua4497 10 months ago
Thanks Kevin, do you have any video example that shows how to incorporate a self-defined function in a pandas pipeline?
@sihlengena5022 3 years ago
Simply the best.
@dataschool 3 years ago
Thank you!
@TheAdrianPardo 4 years ago
Thank you so much! You're the best! Please go over scaling when you have a chance :) Question: Is it OK to leave in all of the OneHotEncoded columns with this pipe approach? I believe you previously mentioned how it's best to drop one of the columns to prevent multicollinearity. Any way to do this within the pipe?
@dataschool 4 years ago
You are so kind, thank you! 😊 Yes, I plan to cover StandardScaler at some point. Yes, it is okay to leave in all of the one-hot encoded columns. However, the "drop" parameter for OneHotEncoder (new in scikit-learn 0.21) does allow you to drop one encoded column per categorical feature. Hope that helps!
@ramleo1461 4 years ago
Even I had the same doubt... Thank you for clarifying 😊
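The "drop" parameter mentioned in the reply above can be illustrated with a minimal sketch. The data is made up, and `drop="first"` is just one of the values the parameter accepts.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up data (hypothetical): 2 categories in Sex, 3 in Embarked
X = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# Default: one column per category, so 2 + 3 = 5 columns
full = OneHotEncoder().fit_transform(X).toarray()

# drop="first": the first category of each feature is dropped,
# leaving (2 - 1) + (3 - 1) = 3 columns
reduced = OneHotEncoder(drop="first").fit_transform(X).toarray()
```

The same encoder can be placed inside a ColumnTransformer and Pipeline, so the dropping happens within the pipe as the question asks.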
@Narriz 1 year ago
This is amazing.
@dataschool 1 year ago
Thank you! You might be interested in this course: courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn
@12345shipreck 3 years ago
You are 100x better than my ML course teacher at uni. GG bro.
@dataschool 3 years ago
Thank you! 😄
@salseid1033 4 years ago
Your tutorial is informative as always. Could you prepare a tutorial on how to interpret a model, like "black box" interpretation in RF? Thank you.
@dataschool 4 years ago
Thanks for your suggestion! I'll consider it for the future!
@prithasinha5378 4 years ago
This is a great video! Thank you. Will you be showing how to do parameter tuning with a pipeline?
@dataschool 3 years ago
Yes, I actually cover that in one of my courses: courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn
@deweihu1003 4 years ago
Nice video!
@dataschool 3 years ago
Thanks!
@adityakharwade9501 3 years ago
Awesome video and thank you for this explanation!!! I have one request: could you please make a video on PCA
@dataschool 3 years ago
Thanks for your suggestion!