Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews 

Emma Ding
56K subscribers
6K views

Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.
Feature hashing
en.wikipedia.org/wiki/Feature...
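A minimal sketch of the feature hashing idea referenced above, assuming a single categorical column and a plain one-bucket-per-value scheme. The function name `hash_features`, the `n_buckets` parameter, and the sample values are hypothetical and chosen for illustration; production implementations (for example scikit-learn's `FeatureHasher`) typically also use a signed hash to reduce the effect of collisions.

```python
import hashlib


def hash_features(values, n_buckets=8):
    """Map each category string to a fixed-length 0/1 vector via hashing.

    The output width depends only on n_buckets, not on how many distinct
    levels the column has, so previously unseen levels still get a slot.
    Distinct levels may collide in the same bucket.
    """
    rows = []
    for value in values:
        # md5 is used only because it is stable across runs, unlike hash().
        digest = hashlib.md5(value.encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_buckets
        row = [0] * n_buckets
        row[index] = 1
        rows.append(row)
    return rows


# Hypothetical high-cardinality column (e.g. postal codes stored as strings).
codes = ["10001", "94105", "10001", "60614"]
hashed = hash_features(codes, n_buckets=16)
print(hashed[0] == hashed[2])  # identical inputs land in the same bucket
```

The trade-off is memory versus fidelity: fewer buckets means a smaller feature matrix but more collisions between unrelated levels.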
🟢 Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing
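
The difference between the one-hot and dummy encoding chapters listed above can be sketched in a few lines. This is an illustrative example, not the code from the video: it assumes the usual convention that one-hot encoding produces k columns for k levels while dummy encoding drops one reference level and produces k - 1 columns. The helper names and the sample data are hypothetical.

```python
def one_hot_encode(values, levels):
    """Return one 0/1 column per level (k columns for k levels)."""
    return [[1 if v == level else 0 for level in levels] for v in values]


def dummy_encode(values, levels):
    """Drop the first level as the reference category (k - 1 columns)."""
    return [row[1:] for row in one_hot_encode(values, levels)]


# Hypothetical sample column with three levels.
colors = ["red", "green", "blue", "green"]
levels = sorted(set(colors))  # ['blue', 'green', 'red']

one_hot = one_hot_encode(colors, levels)
dummy = dummy_encode(colors, levels)

print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(dummy)    # [[0, 1], [1, 0], [0, 0], [1, 0]]
```

In the dummy-encoded output the reference level ("blue" here) is represented by an all-zero row, which avoids the perfectly collinear columns that one-hot encoding creates in linear models with an intercept.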

Published: Aug 3, 2024

Comments: 16
@emma_ding 1 year ago
Many of you have asked me to share my presentation notes, and now… I have them for you! Download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!
@linghaoyi 1 year ago
Thank you. Merry Christmas and Happy New Year!
@qingxiawang161 1 year ago
Hi Emma, thank you very much for the informative video. I really learned a lot from it! Keep up the good work ❤
@junlizhou7167 1 year ago
Thanks for the informative video, Emma! Love the Notion notes you created.
@emma_ding 1 year ago
So glad you enjoyed it! Thank you for watching. 😊
@nitishjambhurkar7990 1 year ago
Hi Emma, thank you so much for this insight. In addition, I would like to know how to handle very large datasets. I was asked about this in an interview and could not answer it correctly. How would you handle very large datasets, and what steps would you take to load them? If you could make a video on this topic, that would be great.
@hsuya3925 1 year ago
Hi Emma, very informative video. Thanks for working on all these types of videos and sharing them with us. Is your Notion page public, or could you share it if possible?
@Doctor_monk 1 year ago
I have been waiting for these as well. :)
@emma_ding 1 year ago
Of course! I'm working on getting all the notes organized and shareable in one location, and I will let you know as soon as they are ready! :)
@emma_ding 1 year ago
@sukumargv @hsuya3925 Here you go! You can now download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!
@jet3111 1 year ago
Hi Emma, thank you for the very informative video. It would be great to discuss embedding methods for handling categorical data.
@emma_ding 1 year ago
Great suggestion! I've added it to my list of content ideas. 😊 Thanks for watching!
@saudiorchestra6443 11 months ago
How do we deal with a category that appears for the first time in the test data? For example, in the training data I have a column for jobs. The training data contains these jobs: Doctor, Nurse, Lab technician, Administrator. I used one-hot encoding for the job column. What if the test data has an additional job, Surgeon? How do we handle this situation?
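One common way to handle the situation described in this question is to learn the set of levels from the training data only and encode any unseen level as an all-zero row. The sketch below illustrates that idea with a hypothetical `SimpleOneHotEncoder` class written for this example; scikit-learn's `OneHotEncoder(handle_unknown="ignore")` behaves similarly. Other options, such as a dedicated "other" level, may suit some models better.

```python
class SimpleOneHotEncoder:
    """Hypothetical one-hot encoder that ignores levels unseen during fit."""

    def fit(self, values):
        # Learn the column layout from the training data only.
        self.levels_ = sorted(set(values))
        self.index_ = {level: i for i, level in enumerate(self.levels_)}
        return self

    def transform(self, values):
        rows = []
        for value in values:
            row = [0] * len(self.levels_)
            position = self.index_.get(value)
            if position is not None:  # unseen levels stay all zeros
                row[position] = 1
            rows.append(row)
        return rows


train_jobs = ["Doctor", "Nurse", "Lab technician", "Administrator"]
test_jobs = ["Nurse", "Surgeon"]  # "Surgeon" never appears in training

encoder = SimpleOneHotEncoder().fit(train_jobs)
encoded_test = encoder.transform(test_jobs)
print(encoder.levels_)  # ['Administrator', 'Doctor', 'Lab technician', 'Nurse']
print(encoded_test)     # [[0, 0, 0, 1], [0, 0, 0, 0]]
```

The encoded test data keeps the same number of columns as the training data, so a model fitted on the training matrix can still score the new row.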
@rakeshkumarsharma2250 1 year ago
How do I convert a PIN code / postal code?
@sruthimallarapu7662 1 year ago
Hi Emma, can decision trees handle string categorical values (for example, a "gender" column that takes "M" or "F")? Is it not necessary to convert the strings to numerical values?
@georgezevallos 7 months ago
All ML algorithms require converting strings into numerical values. Even NLP does it. Hope it helps.
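Whether the conversion has to be done by hand depends on the implementation: scikit-learn's tree estimators, for instance, expect numeric input, while some other libraries accept categorical columns directly and encode them internally. A minimal sketch of the manual conversion for the "gender" example above, assuming a simple integer mapping; the helper name `fit_label_mapping` and the sample data are hypothetical.

```python
def fit_label_mapping(values):
    """Assign each distinct string level an integer code (sorted order)."""
    return {level: code for code, level in enumerate(sorted(set(values)))}


gender = ["M", "F", "F", "M"]  # hypothetical column from the question
mapping = fit_label_mapping(gender)
encoded_gender = [mapping[g] for g in gender]

print(mapping)         # {'F': 0, 'M': 1}
print(encoded_gender)  # [1, 0, 0, 1]
```

For a two-level column this integer code is equivalent to a single dummy variable. For columns with more than two unordered levels, integer codes impose an arbitrary order, so one-hot or dummy encoding is usually the safer choice.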