I have watched only 4 mins so far and I had to pause and write this comment. This is one of the best tutorials I have seen in data science. Sir, you need to take this to another level. What a great teacher you are.
For anyone stuck with the categorical_features error:

from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
X

Then you should be able to continue the tutorial without further issues.
Hey, thanks for the code. I tried using it, but even after converting X to an array it still gives me this error: "TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."
@@Ran_dommmm I know you said "despite converting X to an array", but double-check that you have applied the .toarray() method correctly; the error message is pretty clear on this one. This helper can confirm that a dense numpy array is being passed:

import numpy as np

def is_dense(matrix):
    return isinstance(matrix, np.ndarray)

Pass X in as matrix and it should return True. Good luck fixing this.
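A minimal sketch of the whole situation, using a made-up three-row X in place of the tutorial's data; the densify step at the end is what resolves the sparse-matrix error regardless of what the transformer returns:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for the tutorial's X (town name in column 0, area in column 1)
X = np.array([["monroe", 2600], ["west windsor", 3000], ["robinsville", 3200]], dtype=object)

ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder="passthrough")
X_t = ct.fit_transform(X)

# If the transformer returned a SciPy sparse matrix, densify it explicitly
if not isinstance(X_t, np.ndarray):
    X_t = X_t.toarray()

print(X_t.shape)  # (3, 4): three one-hot columns plus the passthrough area column
```

After this, X_t can be fed to model.fit without the TypeError.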
Hi, your explanation is very simple and effective. Answers for the practice session:
A) Price of Mercedes Benz, 4 yr old, mileage 45000 = 36991.31721061
B) Price of BMW X5, 7 yr old, mileage 86000 = 11080.74313219
C) Accuracy = 0.9417050937281082 (94 percent)
Exercise solution: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb Everyone, the error with categorical_features is fixed. Check the new notebook on my github (link in the video description). Thanks, Kush Verma, for the pull request with the fix.
Thank you for the wonderful explanation, sir. However, I am getting the error __init__() got an unexpected keyword argument 'catergorical_features' for the line onehotencoder = OneHotEncoder(catergorical_features=[0]). Is it because of a version change? What is the solution to this?
Your answer is perfect, Ankit. Good job! Here is my answer sheet for comparison: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
@@sauravmaurya6097 It's quite helpful if you are a beginner, in the sense of not coming from an engineering or programming background. You can pair this with Coursera's Andrew Ng course.
@@sauravmaurya6097 If you already know calculus and Python programming (intermediate level), ML will feel easy. After finishing this, go on to the deep learning series, because that's what is used in industry.
15:50 Write your code like this:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough'
)
X = ct.fit_transform(X)
X

It will work fine this way; otherwise it will give an error.
@@jollycolours Correct, the categorical_features parameter is deprecated, and these are the steps to follow instead:

from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype=float)
This guy is AMAZING! I have spent 2 days trying dozens of other methods, and this is the only one that worked for my data without throwing an error. This guy totally saved my sanity; I was growing desperate, as in DESPERATE! Thank you, thank you, thank you!
Wonderful video. This is by far the easiest explanation I have seen for one hot encoding. I had been struggling for a very long time to find a proper video on this topic, and my quest ended today. Thanks a lot, sir.
This ML tutorial is by far the best one I have seen. It is so easy to learn and understand, and your exercises also help me apply what I have learned so far. Thank you.
I achieved the same result using a different method that doesn't require dropping columns or concatenating dataframes. This alternative approach can lead to cleaner and more efficient code:

df = pd.get_dummies(df, columns=['CarModel'], drop_first=True)
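A runnable sketch of that one-liner, using a made-up frame in place of the exercise data (the CarModel and Mileage columns here are illustrative):

```python
import pandas as pd

# Toy frame standing in for the exercise data
df = pd.DataFrame({
    "CarModel": ["BMW X5", "Audi A5", "Mercedes Benz C class"],
    "Mileage": [69000, 59000, 52000],
})

# One call replaces CarModel with dummy columns; drop_first=True drops one
# redundant dummy column to avoid the dummy variable trap
df = pd.get_dummies(df, columns=["CarModel"], drop_first=True)
print(df.columns.tolist())
```

No manual concat or drop is needed, since get_dummies rebuilds the frame in place of the encoded column.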
I'm reading a textbook that has an exercise studying this same dataset to predict 'survived'. I just finished the exercise from the book, but I can't seem to get past an 81% score. Thanks for your awesome explanation.
You really made it very easy to understand such new concepts, thanks a lot. Starting from minute 12:30 about OneHotEncoder: some updates in sklearn prevent using categorical_features=[0]. Here is the code update as of April 2020:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
import numpy as np

columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(columnTransformer.fit_transform(X), dtype=float)
X = X[:, 1:]
model.fit(X, y)
model.predict([[1, 0, 2800]])
model.predict([[0, 1, 3400]])
I am getting 84% accuracy without encoding the variable, but after encoding I get 94% accuracy on the model. Thank you for your teaching. Doing a great job.
Wait wait... I don't see the point 😕 The first half of the video does the same thing as one hot encoding (the second half of the video), but the second half is more tedious and takes more steps. So why not use pd.get_dummies instead of OneHotEncoder? What's the advantage of using sklearn's one hot encoding?
I personally like pd.get_dummies as it is convenient to use. I wanted to show two different ways of doing the same thing, and there are some subtle differences between the two. Check this: stackoverflow.com/questions/36631163/pandas-get-dummies-vs-sklearns-onehotencoder-what-is-more-efficient
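One of those subtle differences is worth sketching: OneHotEncoder remembers the categories it saw at fit time, while get_dummies encodes each frame independently, so at prediction time get_dummies can silently produce a different set of columns. A toy example with made-up town names:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"town": ["monroe", "west windsor", "robinsville"]})
test = pd.DataFrame({"town": ["monroe"]})  # only one town appears at predict time

# get_dummies encodes each frame on its own, so the column sets differ
print(pd.get_dummies(train["town"]).shape)  # (3, 3)
print(pd.get_dummies(test["town"]).shape)   # (1, 1) -- two columns are missing!

# OneHotEncoder learns the categories once and reuses them on new data
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["town"]])
print(ohe.transform(test[["town"]]).shape)  # (1, 3) -- consistent with training
```

This is why the sklearn route pays off once you split data into train and test sets or deploy a model.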
Hi sir !! Most easier way u teach ML. Thanks a lot!!!. I m going through ur videos and assignments. I got the answer for merce: 36991.31, BMW:11080.74 & model score :0.9417. The Model score is 94.17%. My QUE is how to improve the Model score ??? Is there any way to apply the features?
Thank you for the very well explained tutorial. One question though: you are training on all of your data here, and yet the model score is only 0.95. Why is that? Shouldn't it be 1? If you had split your data and trained on part of it, it would make sense, but your case doesn't. What am I missing here?
Alper, it is not true that if you score on all of your training data the result is always 1. Ultimately, for a regression problem like this you are making a guess at a best-fit line using gradient descent. This is still an *approximation* technique, hence it will never be perfect. I am not saying you can never get a score of 1, but a score less than 1 is normal and expected.
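A quick demonstration with synthetic noisy data (the numbers below are made up for illustration): even when scored on the exact rows it was trained on, a linear model cannot fit the noise, so R² stays below 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: price roughly linear in mileage, plus random noise
rng = np.random.default_rng(0)
mileage = rng.uniform(10_000, 90_000, size=50).reshape(-1, 1)
price = 40_000 - 0.3 * mileage.ravel() + rng.normal(0, 2_000, size=50)

model = LinearRegression().fit(mileage, price)
score = model.score(mileage, price)  # R^2 on the *training* data itself
print(score)  # high, but below 1: the noise component cannot be fit by a line
```

A score of exactly 1 on training data would only happen if the data were perfectly linear (or the model flexible enough to memorize every point).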
I'm here in 2024, 6 years later, and I want to say that this playlist is wonderful! I hope you update it, because there are many changes in sklearn's syntax now.
Hey, next week I am launching an ML course on codebasics.io which will address this issue. It has the latest API, in-depth math, and end-to-end projects.
The label encoding done for the independent variable column 'town' in the second half of the video isn't needed, I think. Just doing one hot encoding is enough. Wonderful contribution anyway. Thanks!!
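That's right for current sklearn versions: OneHotEncoder accepts string columns directly, so the LabelEncoder step can be skipped. A minimal sketch with made-up town names:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"town": ["monroe", "west windsor", "monroe"]})

# OneHotEncoder handles the string column directly -- no LabelEncoder needed
ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[["town"]]).toarray()
print(encoded.shape)  # (3, 2): one column per distinct town
```

LabelEncoder is still the right tool for the *target* column y, just not a prerequisite for one hot encoding features.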
First of all, thank you for making life easier for people who want to learn machine learning. You explain really well. Big fan. When I tried to execute categorical_features=[0], it gave an error. It seems this parameter has been deprecated in the latest version of scikit-learn; instead they recommend using ColumnTransformer. I was able to get the same accuracy, 0.9417050937281082. Another thing I wanted to know: when you had initially used the label encoder and converted categorical values to numbers, why did we specify the first column as categorical when it was already an integer value?
model.predict([[45000, 4, 0, 0]]) = array([[36991.31721061]])
model.predict([[86000, 7, 0, 1]]) = array([[11080.74313219]])
model.score(X, Y) = 0.9417050937281082
Thanks, sir, for these exercises.
15:50 Write this code:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder='passthrough')
x = ct.fit_transform(x)
x
Great videos! Unfortunately it becomes harder and harder to code along with the video, because there have been more and more changes in the libraries you use. For example, the sklearn library removed the categorical_features parameter from the OneHotEncoder class. This was also the case in other videos from the playlist. It would be great to have the same playlist in 2022 :)
Sir, what is the best method to encode job designations like management, blue-collar, technician, etc.? Please let me know the best practice.
If your input [26] code doesn't run successfully, try:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough')
x = ct.fit_transform(x)
x
Thanks for the excellent tutorial. I see a decrease in score between this dataset and the exercise data; maybe it's due to the extra column in the exercise data? As the number of columns in X increases, will the LinearRegression score decrease?
Hi sir, how do I select the correct model for a prediction task? Is there a way? Please create a tutorial on this, and if one already exists, please share the link.
Mayank, selecting the appropriate model for a given problem is an art as well as a science. I will probably create a separate tutorial on this, but to give you an idea, what people do is: first do exploratory data analysis and visualization to find out the nature of the dataset. Based on that visualization and primary analysis you might get an idea of what set of models is worth trying. Then you try multiple models and compare their performance (or score). One technique to use is k-fold cross validation, which evaluates the performance of various models on your dataset and helps you identify the best model for it. Again, there is no fixed technique to find the final answer; you use popular approaches and some trial and error to find which model works best for you. Hope this helps!
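The comparison step described above can be sketched like this, using sklearn's synthetic data in place of a real dataset (the two candidate models here are just examples):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=1)

results = {}
for model in (LinearRegression(), DecisionTreeRegressor(random_state=1)):
    # 5-fold cross validation: train on 4 folds, score on the held-out fold
    scores = cross_val_score(model, X, y, cv=5)
    results[type(model).__name__] = scores.mean()

print(results)
```

Whichever model has the higher mean cross-validation score is the stronger candidate for this dataset; on the linear data above, LinearRegression should win.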
Hi sir, thank you for trying to simplify ML. But honestly, this lecture has many unclarified steps. For example: why did you convert x to a 2-dimensional array? After encoding, you made many amendments to the ohe dataframe with very, very fast explanations. Lastly, the command below always gives an error, although I checked that everything is OK:

from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categorical_features=[0])

Thank you again, and looking forward to hearing from you. BR, YG
I get a warning that the categorical_features=[0] argument will soon be removed and that everything will be handled by ColumnTransformer from sklearn.compose.
Yes, categorical_features set to a column name is no longer working. The code below should work:

from sklearn.compose import make_column_transformer
encoded = make_column_transformer((OneHotEncoder(sparse=False), ['town']), remainder='passthrough')
encoded.fit_transform(X)

It will also remove the __init__() type of errors.
Thanks a lot for the tutorial, but I have a doubt: why didn't you split the dataset into train and test sets? It seems you used your entire dataset to train the model, didn't you?