
Feature Engineering: Handle Categorical Features with Many Categories (Count/Frequency Encoding)

Krish Naik
985K subscribers
85K views

In this video we will discuss how to handle categorical features using count or frequency encoding.
github.com/krishnaik06/Comple...
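As a rough illustration of the technique named in the description, here is a minimal sketch in plain Python. The `cities` column is hypothetical toy data, not taken from the video or its notebook. Count encoding replaces each category with the number of rows it appears in; frequency encoding divides that count by the total number of rows. With pandas the same mapping is commonly written as `df[col].map(df[col].value_counts())`.

```python
from collections import Counter

# Hypothetical toy column (an assumption, not data from the video).
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"]

# Count encoding: replace each category with how often it occurs.
counts = Counter(cities)
count_encoded = [counts[c] for c in cities]

# Frequency encoding: the same counts expressed as a share of all rows.
total = len(cities)
freq_encoded = [counts[c] / total for c in cities]

print(count_encoded)  # [3, 2, 3, 1, 2, 3]
```

However many distinct categories the column has, the result is a single numeric column, which is why this approach is attractive for high-cardinality features.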
If you like music, support my brother's channel:
/ @ultralifeproject
Support me on Patreon: / 2340909
Buy the best book on machine learning and deep learning with Python, sklearn and TensorFlow from the link below.
Amazon URL: www.amazon.in/Hands-Machine-L...
You can buy my book on finance with machine learning and deep learning from the link below.
Amazon URL: www.amazon.in/Hands-Python-Fi...
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
Instagram: / krishnaik06
Subscribe to my unboxing channel:
/ @krishnaikhindi
Below are the various playlists created on ML, data science and deep learning. Please subscribe and support the channel. Happy learning!
Deep Learning Playlist: • Tutorial 1- Introducti...
Data Science Projects playlist: • Generative Adversarial...
NLP playlist: • Natural Language Proce...
Statistics Playlist: • Population vs Sample i...
Feature Engineering playlist: • Feature Engineering in...
Computer Vision playlist: • OpenCV Installation | ...
Data Science Interview Question playlist: • Complete Life Cycle of...
🙏 You just need to do 3 things to support my channel: like, share, and subscribe to my YouTube channel.

Published: 11 Sep 2019

Comments: 51
@omkarr8282 4 years ago
Thanks for sharing your research ideas. Whenever I feel comfortable with a concept in DS, every video of yours adds an interesting additional perspective to my knowledge. Thanks for taking the effort to share everything that you do 😀
@muhammadzubairbaloch3224 4 years ago
Sir, I am very impressed by your mode of teaching. It is easily understandable. Please make some high-level computer vision video lectures. Thanks.
@shriyzfr15 4 months ago
This was very useful, Krish. More power to you. Your work is very valuable for data science aspirants!
@shashwatbilgrami1796 4 years ago
Thank you for the videos; they help a lot.
@hazhir3861 1 year ago
dude, you are amazing. well done.
@srishtikumari6664 3 years ago
Thanks for sharing this video! I was looking for this kind of code for replacing a column with its count column. I am learning feature engineering from the Kaggle courses. There, count_encoder() is used, and I was trying to write code for the steps inside count_encoder(), which I found in this video. I have a doubt: the column in that tutorial was numerical, and both columns (col and count_column) were used to predict the output and calculate the validation score. I calculated the validation score both ways (i.e., including both col and count_col, and then only count_col), and the score was higher for the model using both. Can you please clarify: should we also use the original column if it's numerical?
@traptigupta2628 3 years ago
so helpful 😁
@balajivarma83 4 years ago
Hi Sir, could you please complete your Kaggle competition code? Waiting for the next part. Thank you so much, you are the best 👍💯
@sachinborgave8094 4 years ago
Super sir
@raseluddin 3 years ago
Best tutorial
@sandeepnataraj2430 3 years ago
Thanks, Krish, for sharing your knowledge. I want to know how to encode multi-select categorical variables in Python. For example, there is a field "Languages Known" which can hold multiple values like English, Kannada, Hindi or French, Telugu, English or Malayalam, Tamil, English, etc. Assume the overall number of unique languages is around 25.
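One common way to handle such a multi-select field is multi-hot encoding: one 0/1 indicator column per distinct value. The sketch below is a plain-Python illustration under assumed inputs (comma-separated strings in a hypothetical `rows` list); it is not the method shown in the video. scikit-learn's `MultiLabelBinarizer` provides the same kind of transformation.

```python
# Hypothetical rows for a multi-select "Languages Known" field.
rows = [
    "English, Kannada, Hindi",
    "French, Telugu, English",
    "Malayalam, Tamil, English",
]

# Split each cell into a set of cleaned labels.
parsed = [{lang.strip() for lang in row.split(",")} for row in rows]

# Fixed, sorted vocabulary so the column order is reproducible.
vocabulary = sorted(set().union(*parsed))

# One 0/1 indicator per language per row (multi-hot encoding).
multi_hot = [[int(lang in langs) for lang in vocabulary] for langs in parsed]

print(vocabulary)
print(multi_hot[0])  # [1, 0, 1, 1, 0, 0, 0]
```

With around 25 unique languages this yields 25 indicator columns, which is usually still manageable.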
@meharajbeguma7493 4 years ago
Hi Sir, can you explain how to encode a target feature that has multiple comma-separated labels for each instance?
@VinayKumar-hy6ee 4 years ago
Hi, can you upload a video explaining LightGBM and CatBoost with simple examples?
@ogbeidenelson8843 4 years ago
Can we have a video on hypothesis testing and its implementation in a real-world problem with Python?
@MyName-ur3ir 3 years ago
Sir, could you please recommend ML algorithms that can be effectively applied with this technique? What are the pros and cons?
@devanshgoel9070 2 years ago
thank you sir
@venkateshsadagopan2505 4 years ago
Hi Krish, if we have a very large dataset with few features, i.e. the number of features is far smaller than the number of samples, how can I approach the problem, what techniques can I apply to get a reasonable result, and how do I avoid problems like overfitting in this case? Thanks in advance.
@GauravPadawe 4 years ago
Please make a video on complete hypothesis testing.
@sandipansarkar9211 2 years ago
finished watching
@shashwatbilgrami1796 4 years ago
Bro, can you please apply feature engineering techniques to the air flight price problem?
@niveditaparab6772 3 years ago
In this case, how do we interpret the model's coefficient for the encoded feature?
@sunnysavita9071 4 years ago
Sir, please make a video on roc_auc.
@AshutoshSinha1111 3 years ago
Thanks for sharing this informative lesson. I have a bank database and I need to identify the categorical features in a table with the columns Customer Age, Professional Experience, Annual Income, Family Size, CC Avg monthly spend, Education (1: Undergrad; 2: Graduate; 3: Advanced/Professional), Mortgage Value, Personal Loan (Yes/No), Securities Account (Yes/No), Credit Card (Yes/No). Can you help me, with reasons for your selection?
@anandacharya9919 4 years ago
Sir, please explain where we use the binomial distribution, Poisson distribution, normal distribution, ANOVA test, and one- and two-tailed tests in machine learning and data science.
@nitishsawant5893 4 years ago
Check the Khan Academy statistics playlist.
@Elsa.zoneNt 1 year ago
How about adding a small value such as 0.000000001, or 1, to the dictionary entries for categories with the same count? Does that work?
@manjunathangadi457 4 years ago
Where did you complete your data science course?
@shaz-z506 4 years ago
Hi Krish, it's a good video. Probably not many people know about this; I hadn't encountered this technique either. But I'm not sure how distance-based algorithms will work with and interpret a categorical feature encoded this way. Please let me know whether there is a particular set of algorithms to use with it for regression or classification, or whether the technique is applicable to all algorithms.
@vivekpuurkayastha1580 4 years ago
It is not algorithm dependent. Another advantage is that it does not need feature scaling.
@nivedithabaskaran1669 1 year ago
@vivekpuurkayastha1580 Whether we scale the numeric columns or not depends on which ML algorithm we choose, right? Can you explain why feature scaling is not needed for this?
@thepresistence5935 2 years ago
singam kadhal vandhalle kalla redum thannale 😁😂, ennakum romba pudikum
@kiranarun1868 4 years ago
How is this different from LabelEncoding? Even there it counts the unique labels and assigns accordingly; here it counts the frequency and assigns. If both are the same, why use this?
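The two encodings are not the same, and a small side-by-side sketch may make the difference concrete. The `colors` column is hypothetical toy data. Label encoding assigns an arbitrary integer ID to each distinct category (the IDs below follow sorted order, as scikit-learn's `LabelEncoder` does), while count encoding assigns the number of rows in which the category occurs.

```python
from collections import Counter

# Hypothetical toy column (an assumption for illustration).
colors = ["red", "blue", "red", "green", "red", "blue"]

# Label encoding: an arbitrary integer ID per distinct category,
# here assigned in alphabetical order.
label_ids = {cat: i for i, cat in enumerate(sorted(set(colors)))}
label_encoded = [label_ids[c] for c in colors]

# Count encoding: the number of rows in which each category appears.
counts = Counter(colors)
count_encoded = [counts[c] for c in colors]

print(label_encoded)  # [2, 0, 2, 1, 2, 0]
print(count_encoded)  # [3, 2, 3, 1, 3, 2]
```

The label IDs carry no information beyond identity, whereas the counts reflect how common each category is in the data.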
@AshitDebdas 3 years ago
Similarly, I have IP_ADDRESS as a feature. How can I encode it?
@sreeragsasidharan5192 3 years ago
Sir, then what about categories that have the same count?
@brightsides2881 4 years ago
How many hours a week do data scientists work in service-based companies?
@someonesomebody716 4 years ago
Based on Google, around 10 hrs.
@arjyabasu1311 4 years ago
Sir, are mean encoding and frequency encoding the same? If not, please make a video on it.
@vivekpuurkayastha1580 4 years ago
Mean encoding is a feature imputation technique, whereas frequency encoding is a technique for encoding categorical features, also called response coding. Use the response-encoding library available through pip: pip install response-encoding.
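For comparison, mean encoding is more commonly described as target encoding: each category is replaced by the average of the target variable over that category's rows, so unlike frequency encoding it uses the labels. The sketch below contrasts the two on hypothetical toy data; the `category` and `target` lists are assumptions made for illustration, and no smoothing or train/validation separation is applied.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: a categorical feature and a binary target.
category = ["A", "B", "A", "C", "B", "A"]
target = [1, 0, 0, 1, 1, 1]

# Frequency encoding uses only the feature column.
counts = Counter(category)
freq_encoded = [counts[c] / len(category) for c in category]

# Mean (target) encoding uses the target: average target per category.
sums = defaultdict(float)
for c, y in zip(category, target):
    sums[c] += y
means = {c: sums[c] / counts[c] for c in counts}
mean_encoded = [means[c] for c in category]

print(freq_encoded)
print(mean_encoded)
```

Because mean encoding reads the target, in practice it is computed on training data only to limit leakage.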
@mohammadarif8057 3 years ago
What if two different categories have the same count?
@thealgorithm7633 1 year ago
But it will give high weightage to the higher counts, and the model will overfit.
@prabhatgupta9808 3 years ago
Hello Sir, if the categories are not ordinal and we replace each category with its count in that column, aren't we making it ordinal, since different categories will have different numeric values? Can anyone please help me understand this?
@arpeggio7449 6 months ago
Yes, this is very similar to ordinal encoding, except that with ordinal encoding we just give a sequential value to each unique value. So this is a bit different, and the effects are also different. Using the count is very unpredictable: what if there are two or more values with the same count? Then they will all have the same encoded value, resulting in loss of information. Also, just as with ordinal encoding, the number affects how important each value appears: higher numbers often result in higher activations. We already have that with ordinal encoding, but here the effect is even stronger. So I am really not sure this is a good solution; my guess is it isn't.
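The tie problem raised here can be reproduced in a few lines. This is a minimal sketch on a hypothetical `pets` column: two categories that occur equally often receive the same encoded value and cannot be told apart afterwards.

```python
from collections import Counter

# Hypothetical column where "cat" and "dog" occur equally often.
pets = ["cat", "dog", "cat", "dog", "fish"]

counts = Counter(pets)
encoded = [counts[p] for p in pets]

# Group categories by encoded value; any group with more than one
# category is indistinguishable after count encoding.
by_value = {}
for cat, n in counts.items():
    by_value.setdefault(n, []).append(cat)
collisions = {n: sorted(cats) for n, cats in by_value.items() if len(cats) > 1}

print(encoded)     # [2, 2, 2, 2, 1]
print(collisions)  # {2: ['cat', 'dog']}
```

Checking for such collisions before training is a cheap way to see whether count encoding loses information on a given column.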
@akansha_bhatt28 3 years ago
Can someone help me by explaining the 2nd disadvantage of this algorithm?
@Rukshan918 3 years ago
Suppose you have a feature like blood pressure. It can be categorized as Low, Medium, High. In integer encoding, we encode this as 1 (Low), 2 (Medium), 3 (High), so the ML model can understand Low < Medium < High. This is what weights means, and it is valuable in prediction. In one-hot encoding we just put some meaningless number; they have no weights. I think this is what it means. I had the same problem and searched; this is what I understood. Someone correct me if I'm wrong.
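The blood pressure example can be sketched as follows. The `readings` list is hypothetical, and the integer mapping is chosen by hand to reflect the known order. The one-hot version is shown next to it for contrast: each level becomes its own 0/1 indicator, so no order between levels is implied.

```python
# Explicit, hand-chosen order for an ordinal feature.
bp_order = {"Low": 1, "Medium": 2, "High": 3}

# Hypothetical readings (an assumption for illustration).
readings = ["Medium", "Low", "High", "Low"]

# Integer (ordinal) encoding keeps Low < Medium < High.
ordinal_encoded = [bp_order[r] for r in readings]

# One-hot encoding: one indicator column per level, no implied order.
levels = list(bp_order)  # ["Low", "Medium", "High"]
one_hot = [[int(r == level) for level in levels] for r in readings]

print(ordinal_encoded)  # [2, 1, 3, 1]
print(one_hot[0])       # [0, 1, 0]
```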
@anilsunny4443 3 years ago
I thought we were to consider the top 10 most frequent labels and perform one-hot encoding on them, but this is completely different :/. Somebody please help; I'm a beginner at this, and some lead on this would be greatly appreciated.
@TheSecretAgentRayan 3 years ago
They're 2 different methods. You can pick either, or maybe even try both separately.
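For reference, the other method mentioned in this thread (one-hot encoding only the most frequent labels) can be sketched like this. The `labels` column is hypothetical, and `top_k` is set to 2 instead of 10 only to keep the toy example small. Categories outside the top k get all zeros.

```python
from collections import Counter

# Hypothetical toy column (an assumption for illustration).
labels = ["a", "b", "a", "c", "a", "b", "d"]

# The thread mentions the top 10; 2 keeps this example small.
top_k = 2
top = [cat for cat, _ in Counter(labels).most_common(top_k)]

# One indicator column per frequent label; rare labels become all zeros.
one_hot = [[int(x == cat) for cat in top] for x in labels]

print(top)         # ['a', 'b']
print(one_hot[3])  # [0, 0]
```

This produces k columns instead of one, whereas count encoding always produces a single numeric column.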
@dipk.mishra 4 years ago
Sir, can you share that feature engineering zip file again, please?
@nibinjoseph2136 3 years ago
Send me your mail ID, bro.
@vinothkumarselvaraj185 4 years ago
Friends, what if we have a column (say "Location") that contains more than 1000 categories? FYI, this column is an independent variable and one of the important parameters for predicting the label. Please answer. Thanks in advance.
@tikendraw 2 years ago
Sir, since we are just assigning count numbers to the categorical values, and it may even lead to a problematic situation if the counts are the same, why don't we use label encoding on each column? Yes, the data may not be ordinal, but it does a better job than the method you are describing. The goal is to assign numbers to strings. I may be wrong, but according to the info you provided, what I said should work too, and it is easier. Correct me if I am wrong, @KRISH NAIK.
@bongiwe_khongolo 1 year ago
J. John hip