I have watched only 4 mins so far and I had to pause and write this comment. This is one of the best tutorials I have seen in data science. Sir, you need to take this to another level. What a great teacher you are.
For anyone stuck with the categorical_features error:

from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
X

Then you should be able to continue the tutorial without further issues.
Hey, thanks for the code. I tried using it, but even after converting X to an array it still gives me this error: "TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."
@@Ran_dommmm I know you said "despite converting X to an array", but double-check that you have applied the .toarray() method correctly; the error message is pretty clear on this one. This helper can confirm that a dense numpy array is being passed:

import numpy as np

def is_dense(matrix):
    return isinstance(matrix, np.ndarray)

Pass X in as matrix and it should return True. Good luck fixing this.
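A minimal sketch of the whole situation, using a made-up three-row X in place of the tutorial's data; the densify step at the end is what resolves the sparse-matrix error regardless of what the transformer returns:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for the tutorial's X (town name in column 0, area in column 1)
X = np.array([["monroe", 2600], ["west windsor", 3000], ["robinsville", 3200]], dtype=object)

ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder="passthrough")
X_t = ct.fit_transform(X)

# If the transformer returned a SciPy sparse matrix, densify it explicitly
if not isinstance(X_t, np.ndarray):
    X_t = X_t.toarray()

print(X_t.shape)  # (3, 4): three one-hot columns plus the passthrough area column
```

After this, X_t can be fed to model.fit without the TypeError.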
Hi, your explanation is very simple and effective. Answers for the practice session:
A) Price of Mercedes Benz, 4 yr old, mileage 45000 = 36991.31721061
B) Price of BMW X5, 7 yr old, mileage 86000 = 11080.74313219
C) Accuracy = 0.9417050937281082 (94 percent)
Exercise solution: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb Everyone, the error with categorical_features is fixed. Check the new notebook on my github (link in the video description). Thanks, Kush Verma, for the pull request with the fix.
Thank you for the wonderful explanation, sir. However, I am getting the error __init__() got an unexpected keyword argument 'catergorical_features' for the line onehotencoder = OneHotEncoder(catergorical_features=[0]). Is it because of a version change? What is the solution to this?
Your answer is perfect, Ankit. Good job! Here is my answer sheet for comparison: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
@@sauravmaurya6097 It's quite helpful if you are a beginner, in the sense of not coming from an engineering or programming background. You can pair this with Coursera's Andrew Ng course.
@@sauravmaurya6097 If you already know calculus and Python programming (intermediate level), ML will feel easy. After finishing this, go on to the deep learning series, because that's what is used in industry.
15:50 Write your code like this:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough'
)
X = ct.fit_transform(X)
X

It will work fine this way; otherwise it will give an error.
@@jollycolours Correct, the categorical_features parameter is deprecated, and these are the steps to follow instead:

from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype=float)
This guy is AMAZING! I have spent 2 days trying dozens of other methods, and this is the only one that worked for my data without throwing an error. This guy totally saved my sanity; I was growing desperate, as in DESPERATE! Thank you, thank you, thank you!
Wonderful video. This is by far the easiest explanation I have seen for one hot encoding. I had been struggling for a very long time to find a proper video on this topic, and my quest ended today. Thanks a lot, sir.
This ML tutorial is by far the best one I have seen. It is so easy to learn and understand, and your exercises also help me apply what I have learned so far. Thank you.
I achieved the same result using a different method that doesn't require dropping columns or concatenating dataframes. This alternative approach can lead to cleaner and more efficient code:

df = pd.get_dummies(df, columns=['CarModel'], drop_first=True)
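A runnable sketch of that one-liner, using a made-up frame in place of the exercise data (the CarModel and Mileage columns here are illustrative):

```python
import pandas as pd

# Toy frame standing in for the exercise data
df = pd.DataFrame({
    "CarModel": ["BMW X5", "Audi A5", "Mercedes Benz C class"],
    "Mileage": [69000, 59000, 52000],
})

# One call replaces CarModel with dummy columns; drop_first=True drops one
# redundant dummy column to avoid the dummy variable trap
df = pd.get_dummies(df, columns=["CarModel"], drop_first=True)
print(df.columns.tolist())
```

No manual concat or drop is needed, since get_dummies rebuilds the frame in place of the encoded column.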
I'm reading a textbook that has an exercise studying this same dataset to predict 'survived'. I just finished the exercise from the book, but I can't seem to get past an 81% score. Thanks for your awesome explanation.
You really made it very easy to understand such new concepts, thanks a lot. Starting from minute 12:30 about OneHotEncoder: some updates in sklearn prevent using categorical_features=[0]. Here is the code update as of April 2020:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
import numpy as np

columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(columnTransformer.fit_transform(X), dtype=float)
X = X[:, 1:]
model.fit(X, y)
model.predict([[1, 0, 2800]])
model.predict([[0, 1, 3400]])
I am getting 84% accuracy without encoding the variable, but after encoding I get 94% accuracy on the model. Thank you for your teaching. Doing a great job.
Wait wait... I don't see the point 😕 The first half of the video does the same thing as one hot encoding (the second half of the video), but the second half is more tedious and takes more steps. So why not use pd.get_dummies instead of OneHotEncoder? What's the advantage of using sklearn's one hot encoding?
I personally like pd.get_dummies as it is convenient to use. I wanted to show two different ways of doing the same thing, and there are some subtle differences between the two. Check this: stackoverflow.com/questions/36631163/pandas-get-dummies-vs-sklearns-onehotencoder-what-is-more-efficient
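One of those subtle differences is worth sketching: OneHotEncoder remembers the categories it saw at fit time, while get_dummies encodes each frame independently, so at prediction time get_dummies can silently produce a different set of columns. A toy example with made-up town names:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"town": ["monroe", "west windsor", "robinsville"]})
test = pd.DataFrame({"town": ["monroe"]})  # only one town appears at predict time

# get_dummies encodes each frame on its own, so the column sets differ
print(pd.get_dummies(train["town"]).shape)  # (3, 3)
print(pd.get_dummies(test["town"]).shape)   # (1, 1) -- two columns are missing!

# OneHotEncoder learns the categories once and reuses them on new data
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["town"]])
print(ohe.transform(test[["town"]]).shape)  # (1, 3) -- consistent with training
```

This is why the sklearn route pays off once you split data into train and test sets or deploy a model.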
Hi sir !! Most easier way u teach ML. Thanks a lot!!!. I m going through ur videos and assignments. I got the answer for merce: 36991.31, BMW:11080.74 & model score :0.9417. The Model score is 94.17%. My QUE is how to improve the Model score ??? Is there any way to apply the features?
Thank you for the very well explained tutorial. One question though: you are training on all of your data here, and yet the model score is only 0.95. Why is that? Shouldn't it be 1? If you had split your data and trained on part of it, it would make sense, but your case doesn't. What am I missing here?
Alper, it is not true that if you score on all of your training data the result is always 1. Ultimately, for a regression problem like this you are making a guess at a best-fit line using gradient descent. This is still an *approximation* technique, hence it will never be perfect. I am not saying you can never get a score of 1, but a score less than 1 is normal and expected.
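A quick demonstration with synthetic noisy data (the numbers below are made up for illustration): even when scored on the exact rows it was trained on, a linear model cannot fit the noise, so R² stays below 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: price roughly linear in mileage, plus random noise
rng = np.random.default_rng(0)
mileage = rng.uniform(10_000, 90_000, size=50).reshape(-1, 1)
price = 40_000 - 0.3 * mileage.ravel() + rng.normal(0, 2_000, size=50)

model = LinearRegression().fit(mileage, price)
score = model.score(mileage, price)  # R^2 on the *training* data itself
print(score)  # high, but below 1: the noise component cannot be fit by a line
```

A score of exactly 1 on training data would only happen if the data were perfectly linear (or the model flexible enough to memorize every point).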
I'm here in 2024, 6 years later, and I want to say that this playlist is wonderful! I hope you update it, because there are many changes in sklearn's syntax now.
Hey, next week I am launching an ML course on codebasics.io which will address this issue. It has the latest API, in-depth math, and end-to-end projects.
The label encoding done for the independent variable column 'town' in the second half of the video isn't needed, I think. Just doing one hot encoding is enough. Wonderful contribution anyway. Thanks!!
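That's right for current sklearn versions: OneHotEncoder accepts string columns directly, so the LabelEncoder step can be skipped. A minimal sketch with made-up town names:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"town": ["monroe", "west windsor", "monroe"]})

# OneHotEncoder handles the string column directly -- no LabelEncoder needed
ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[["town"]]).toarray()
print(encoded.shape)  # (3, 2): one column per distinct town
```

LabelEncoder is still the right tool for the *target* column y, just not a prerequisite for one hot encoding features.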
First of all, thank you for making life easier for people who want to learn machine learning. You explain really well. Big fan. When I tried to execute categorical_features=[0], it gave an error. It seems this parameter has been deprecated in the latest version of scikit-learn; instead they recommend using ColumnTransformer. I was able to get the same accuracy, 0.9417050937281082. Another thing I wanted to know: when you had initially used the label encoder and converted categorical values to numbers, why did we specify the first column as categorical when it was already an integer value?
model.predict([[45000, 4, 0, 0]]) = array([[36991.31721061]])
model.predict([[86000, 7, 0, 1]]) = array([[11080.74313219]])
model.score(X, Y) = 0.9417050937281082
Thanks, sir, for these exercises.
15:50 Write this code:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder='passthrough')
x = ct.fit_transform(x)
x
Great videos! Unfortunately it becomes harder and harder to code along with the video, because there have been more and more changes in the libraries you use. For example, the sklearn library removed the categorical_features parameter from the OneHotEncoder class. This was also the case in other videos from the playlist. It would be great to have the same playlist in 2022 :)
Sir, what is the best method to encode job designations like management, blue-collar, technician, etc.? Please let me know the best practice.
If your input [26] code doesn't run successfully, try:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough')
x = ct.fit_transform(x)
x
Thanks for the excellent tutorial. I see a decrease in score between this dataset and the exercise data; maybe it's due to the extra column in the exercise data? As the number of columns in X increases, will the LinearRegression score decrease?
Hi sir, how do I select the correct model for a prediction task? Is there a way? Please create a tutorial on this, and if one already exists, please share the link.
Mayank, selecting the appropriate model for a given problem is an art as well as a science. I will probably create a separate tutorial on this, but to give you an idea, what people do is: first do exploratory data analysis and visualization to find out the nature of the dataset. Based on that visualization and primary analysis you might get an idea of what set of models is worth trying. Then you try multiple models and compare their performance (or score). One technique to use is k-fold cross validation, which evaluates the performance of various models on your dataset and helps you identify the best model for it. Again, there is no fixed technique to find the final answer; you use popular approaches and some trial and error to find which model works best for you. Hope this helps!
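The comparison step described above can be sketched like this, using sklearn's synthetic data in place of a real dataset (the two candidate models here are just examples):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=1)

results = {}
for model in (LinearRegression(), DecisionTreeRegressor(random_state=1)):
    # 5-fold cross validation: train on 4 folds, score on the held-out fold
    scores = cross_val_score(model, X, y, cv=5)
    results[type(model).__name__] = scores.mean()

print(results)
```

Whichever model has the higher mean cross-validation score is the stronger candidate for this dataset; on the linear data above, LinearRegression should win.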
Hi sir, thank you for trying to simplify ML. But honestly, this lecture has many unclarified steps. For example: why did you convert x to a 2-dimensional array? After encoding, you made many amendments to the ohe dataframe with very, very fast explanations. Lastly, the command below always gives an error, although I checked that everything is OK:

from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categorical_features=[0])

Thank you again, and looking forward to hearing from you. BR, YG
I get a warning that the categorical_features=[0] argument will soon be removed and that everything will be handled by ColumnTransformer from sklearn.compose.
Yes, categorical_features set to a column name is no longer working. The code below should work:

from sklearn.compose import make_column_transformer
encoded = make_column_transformer((OneHotEncoder(sparse=False), ['town']), remainder='passthrough')
encoded.fit_transform(X)

It will also remove the __init__() type of errors.
Thanks a lot for the tutorial, but I have a doubt: why didn't you split the dataset into train and test sets? It seems you used your entire dataset to train the model, didn't you?