Predict Employee Attrition Using Machine Learning & Python

Use Python & Machine Learning to predict employee attrition
►Predict Employee Attrition Article:
/ predict-employee-attri...
⭐Please Subscribe !⭐
⭐Support the channel and/or get the code by becoming a supporter on Patreon:
/ computerscience
⭐Websites:
► everythingcomputerscience.com/
⭐Helpful Programming Books
► Python (Hands-Machine-Learning-Scikit-Learn-TensorFlow):
amzn.to/2AD1axD
► Learning Python:
amzn.to/3dQGrEB
►Head First Python:
amzn.to/3fUxDiO
► C-Programming :
amzn.to/2X0N6Wa
► Head First Java:
amzn.to/2LxMlhT
#MachineLearning #Python #ArtificialIntelligence #AI

Science

Published: 30 Jul 2024

Comments: 76

@kazimrazatalpur7228 4 years ago

Amazing, very informative. Can't wait to see your upcoming tutorials.

@BillAugersdca 4 years ago

I enjoyed this and found it instructive to follow along. Appreciate your quick pacing, yet somehow unhurried, teaching style.

@paulushimawan5196 3 years ago

Yes, that's the reason I like this video. Nice teaching style, although with less depth. But it's much better than those courses out there that just hand you the notebook to run yourself without explaining each step.

@rohittiwari1610 4 years ago

Simple and easy code. Nice explanation. Thank you so much.

@NitinBhavvsar 4 years ago

You are a pro, boss!! Good to see your video. Question: how do you validate the prediction results? What are the ways and types of validation? What are your thoughts on classification reports for this?

@SaiCharan-zi1zu 3 years ago

Hi, this video is at its best. But I need a conclusion: which factors does attrition depend on most, and how do we find the main factor affecting attrition?

@onyedikachiadigwe8995 4 years ago

Can you show how to predict which staff will leave from the database?

@allammihay 4 years ago

Hello, I followed your steps on Medium, but in the last step, when I want to show feature importance, there is an error: "ValueError: Array must all be same length". I don't understand this problem, could you help me?

@AnkitBhargava 2 years ago

Thank you for the walkthrough - really helpful. Question: early on in the analysis, you plotted a bar graph for Age with Attrition as the hue. But we don't know if Age is correlated with other attributes, so what would be the point of the graph? Age alone does not explain the attrition rate. Why look at that at all?

@RahulRautela5797 2 years ago

Can we also find the specific reason for leaving, the variable with the highest value?

@sukruthms7984 3 years ago

Thank you for the appropriate explanation.

@ComputerSciencecompsci112358 3 years ago

Glad you enjoyed the video!

@michaelmullings 2 years ago

Question: how do I predict attrition for a current employee? How do I test which employees are on their way out the door, and what factors do I look for that show this?

@SantoshMaurya-is4bp 1 year ago

It's a very nice video, it really helped me.

@2lauren54 10 months ago

At df.corr(), why do I get this? (FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning. df.corr()) And I get an error on every line of code after that.
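
The warning quoted above appears when df.corr() is called on a frame that still contains text columns. A minimal sketch of one way around it, assuming a toy DataFrame (the column names follow the IBM HR dataset, the rows are illustrative only): select the numeric columns before computing the correlation. In pandas 1.5 and later, df.corr(numeric_only=True) should have the same effect.

```python
import pandas as pd

# Toy data: two numeric columns and one text column (illustrative values).
df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "MonthlyIncome": [5993, 5130, 2090, 2909],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Travel_Rarely", "Non-Travel"],
})

# Keep only numeric columns so corr() never sees strings.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

If later cells still fail, the text columns probably need to be encoded as numbers before the model is fitted.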

@shyamkishore6232 7 months ago

How do I draw conclusions from the entire coding process for a presentation? For example, which of the columns affect attrition the most?

@jeevarajahjeevaratnam6224 4 years ago

I can't run seaborn, I keep getting ModuleNotFoundError: No module named 'resource'. I'm using Windows 10.

@jimalyajenkins9133 2 years ago

Solid tutorial. How do I use this though?

@mehtabrosul6909 1 year ago

In the last step, forest.fit(x_train, y_train) says it cannot convert a string to float. Why?

@shashankbafna2867 4 years ago

Fantastic approach. Can you also make a video explaining how we can use this model? Like, what comes after creating this prediction?

@erickwang5850 4 years ago

I also want to know: can we get the significance of each feature, and how do we do that?

@SANJIVRAI6693 3 years ago

@erickwang5850 Yes, you can check the important features by their impact score.
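
A minimal sketch of reading those scores from a fitted random forest, assuming synthetic data from make_classification and hypothetical column names (this is not the video's dataset). Building the table from X.columns keeps the names and the scores the same length; a mismatch there, for example a column list that still includes the target, is one common cause of the "arrays must all be same length" error mentioned earlier in the thread.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the HR data (hypothetical column names).
feature_names = ["Age", "MonthlyIncome", "OverTime", "YearsAtCompany"]
X_values, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_values, columns=feature_names)

forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
forest.fit(X, y)

# One score per feature column; index by X.columns so lengths always match.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The scores sum to 1 and only rank features for this particular model; they do not by themselves show why an employee leaves.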

@RohanTayal 3 years ago

Thank you for the amazing explanation, but I have a question: why did you use a label encoder and not a one-hot encoder to convert non-numeric data into numeric data?

@SANJIVRAI6693 3 years ago

You can use either of them.
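
To make the difference concrete, here is a small sketch on a toy column (values are illustrative). Label encoding maps each category to one integer, which implies an ordering; one-hot encoding creates a separate 0/1 column per category. Tree-based models such as a random forest often tolerate either, which may be why both are suggested.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Non-Travel", "Travel_Rarely"]
})

# Label encoding: one integer per category, assigned in sorted label order.
label_encoded = LabelEncoder().fit_transform(df["BusinessTravel"])

# One-hot encoding: one indicator column per category, no implied ordering.
one_hot = pd.get_dummies(df["BusinessTravel"], prefix="BusinessTravel")

print(label_encoded)
print(one_hot)
```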

@prasunprakash2297 4 years ago

How do we calculate employee performance department-wise?

@yashwantkumarverma1480 2 years ago

Can we fix the range of the x axis? I have many data points on the x axis.

@nbddesigns7620 2 years ago

I'm getting an error at RandomForestClassifier using sklearn. How do I solve this?

@abdulalimbaig3286 3 years ago

Where is the link to the dataset?

@ajayantony4144 3 years ago

Instead of dropping the Age column, can't we change the index to one for attrition? Just asking, because I am new to data science and curious.

@SANJIVRAI6693 3 years ago

Yes, you can.

@debarati27 2 years ago

How do we show the decision tree?

@nikolaynikolov3707 4 years ago

This is a good, straightforward model training for beginners. But the model is weak. Especially for this problem: if you want to predict employee attrition, you want to know who will quit the job and maybe contact them, and vice versa. Maybe it would be better to choose another metric like recall or F1 score.

@ComputerSciencecompsci112358 4 years ago

You can never have enough metrics.

@cloudbaud7794 3 years ago

This has a recall of 15%... how is that any good? And in this case, the cost of an employee leaving unpredicted can never be the same as falsely predicting that someone who ends up staying will leave. So is F1 that much more value-added?

@SANJIVRAI6693 3 years ago

@cloudbaud7794 The goal will be to reduce false negatives as much as possible, so the higher the recall, the better.
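
A short worked example of how these metrics relate. The counts below are hypothetical, chosen only so that recall comes out at the 15% quoted above; they are not the video's actual confusion matrix.

```python
# Hypothetical confusion-matrix counts for the "left" class (not the video's results).
tp, fn = 9, 51    # leavers caught / leavers missed (false negatives)
fp, tn = 5, 303   # stayers wrongly flagged / stayers correctly passed

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)          # of those flagged, how many really left
recall = tp / (tp + fn)             # of those who left, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these numbers accuracy stays high while recall is low, because the false negatives are hidden by the large number of correctly predicted stayers.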

@soumyasrm 4 years ago

Can you please share the GitHub link for this project?

@chowadagod 4 years ago

Lovely video, but please do projects that involve data cleaning, especially handling text data. What you do is lovely and very much appreciated, sir, but it's a bit too plain, and the majority of work in data science is DATA CLEANING. So please focus on this aspect in upcoming videos. Thank you, sir.

@mrgz999 3 years ago

I agree: tutorials on (i) data cleaning and (ii) merging two files for two different years to do a combined analysis.

@sherifelgazar4089 2 years ago

Friend, can you share the dataset so we can apply this?

@surender6320 3 years ago

Can you please share the code, if you don't mind?

@rahulahuja1412 4 years ago

Informative. Thanks. But it would've been better had you standardized the data and then given an analysis of the data.

@ComputerSciencecompsci112358 4 years ago

Thanks for your opinion!

@cloudbaud7794 3 years ago

Standardized in what way?

@SANJIVRAI6693 3 years ago

@cloudbaud7794 Standard scaler, meaning the dataset is normalized to a certain range for all values, mostly from -1 to 1: lowest values to -1 and highest to 1.
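
For reference, scikit-learn has two different scalers that match the two ideas mentioned here. StandardScaler rescales each column to mean 0 and standard deviation 1, with no fixed range, while MinMaxScaler with feature_range=(-1, 1) maps the lowest value to -1 and the highest to 1. A minimal sketch on toy numbers (the values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy columns standing in for Age and MonthlyIncome (illustrative values).
X = np.array([[18.0, 1000.0], [30.0, 5000.0], [60.0, 20000.0]])

# Standardization: each column ends up with mean 0 and standard deviation 1.
standardized = StandardScaler().fit_transform(X)

# Range scaling: each column's minimum maps to -1 and its maximum to 1.
ranged = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print(standardized)
print(ranged)
```

In a real project the scaler would usually be fitted on the training split only and then applied to the test split.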

@vijaysolanki7497 3 years ago

Why didn't you use oversampling on this unbalanced data (Yes: 237, No: 1233)?

@grahamg4529 1 year ago

Yes, that would really help with the FN and recall score.
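
A minimal sketch of random oversampling with sklearn.utils.resample, assuming a toy DataFrame with the same kind of imbalance (the values are made up). The minority class is sampled with replacement until it matches the majority class.

```python
import pandas as pd
from sklearn.utils import resample

# Toy frame with an imbalanced target (hypothetical values).
df = pd.DataFrame({
    "Age": range(20, 40),
    "Attrition": ["Yes"] * 4 + ["No"] * 16,
})

majority = df[df["Attrition"] == "No"]
minority = df[df["Attrition"] == "Yes"]

# Draw minority rows with replacement until both classes are the same size.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["Attrition"].value_counts())
```

Oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the test set. Passing class_weight="balanced" to RandomForestClassifier is another option that avoids duplicating rows.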

@nbddesigns7620 2 years ago

When we fit X_train and Y_train, we get ValueError: Input contains NaN.

@fabfitmom 2 years ago

Your columns have null values. Clean up the data to make sure all rows have data in all columns. His step where he tests this is:
#Get a count of empty values for each col
df.isna().sum()
The above should give you a 0 value for all fields, and the below should give you False for X_train and Y_train to work:
# check for any missing or null values
df.isnull().values.any()
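
The two checks above can be tried on a small frame. This sketch assumes a toy DataFrame with deliberately missing values and uses dropna() as the simplest clean-up; filling values in with fillna() would be an alternative.

```python
import numpy as np
import pandas as pd

# Toy frame with missing values on purpose (illustrative values).
df = pd.DataFrame({
    "Age": [41, np.nan, 37],
    "MonthlyIncome": [5993, 5130, np.nan],
})

# Count of missing values per column.
print(df.isna().sum())

# Single flag: is anything missing anywhere in the frame?
print(df.isnull().values.any())

# Simplest fix: drop every row that has at least one missing value.
cleaned = df.dropna()
print(cleaned.isnull().values.any())
```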

@robiparvez 5 months ago

Where can I find the dataset?

@cloudbaud7794 3 years ago

Can someone please explain how we get 80% accuracy just by guessing "No" all the time? I need to understand the math: (1233-237)/1233

@SANJIVRAI6693 3 years ago

If you say that attrition is No for all the values, you will be correct 80% of the time, is what he means.

@AnkitBhargava 2 years ago

I think he is just trying to point out that there are too many NOs (not left the company) compared to Yes. So many that even without any modeling or scaling, if you simply guess (like a coin toss) that the employee has NOT left, you would be right 80% of the time.

@ItAintNecessarilySo 2 years ago

It should really be (# did not leave) / total employees = 1233 / (1233 + 237), which is approx 84%. This is the inverse or reciprocal of what the creator originally wrote.
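
The arithmetic in this reply can be checked in a few lines, using the class counts quoted in the thread (1233 stayed, 237 left):

```python
# Class counts quoted in the thread.
stayed, left = 1233, 237
total = stayed + left

# Accuracy of always predicting "No" (the majority class).
baseline_accuracy = stayed / total
print(f"{baseline_accuracy:.1%}")
```

Any model's accuracy is best judged against this majority-class baseline rather than against 50%.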

@manideep4486 4 years ago

With this model, how can I check which employee is more likely to attrite?

@idowukila5992 4 years ago

Great question. I was wondering, too. Have you by any chance gotten an answer to this?

@nikolaynikolov3707 4 years ago

@idowukila5992 Well, this should be the recall, but in this tutorial it was very weak.

@SANJIVRAI6693 3 years ago

Any employee who is predicted as Yes by the model is most likely to leave. Since it's a binary classification, you only get a Yes or No result.

@sonyishutin9949 1 year ago

@SANJIVRAI6693 How do I see the employees who are predicted to leave? I'm still learning.
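
One possible way to list them, sketched on synthetic data (make_classification, generic feature names, and an index named EmployeeID are all assumptions for illustration). Besides the Yes/No label from predict, a random forest also exposes predict_proba, which lets you rank employees by estimated probability of leaving.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the HR data; class 1 plays the role of "left".
X_values, y = make_classification(
    n_samples=400, n_features=5, weights=[0.84], random_state=0
)
X = pd.DataFrame(X_values, columns=[f"feature_{i}" for i in range(5)])
X.index.name = "EmployeeID"  # hypothetical identifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
forest.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1.
leave_probability = pd.Series(
    forest.predict_proba(X_test)[:, 1], index=X_test.index, name="leave_probability"
)

# Employees the model labels as leavers, and everyone ranked by risk.
predicted_leavers = leave_probability[forest.predict(X_test) == 1]
ranked = leave_probability.sort_values(ascending=False)
print(ranked.head())
```

For current employees, the same predict and predict_proba calls would be applied to their feature rows after the same preprocessing used for training.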

@alexanderthegreat9631 4 years ago

I keep getting a value error:
ValueError Traceback (most recent call last)
in ()
1 from sklearn.model_selection import train_test_split
----> 2 X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = None, random_state = 0)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
210 if len(uniques) > 1:
211 raise ValueError("Found input variables with inconsistent numbers of"
--> 212 " samples: %r" % [int(l) for l in lengths])
213
214
ValueError: Found input variables with inconsistent numbers of samples: [1, 1470]
Can someone help?

@SANJIVRAI6693 3 years ago

test_size needs to be defined: how much of the whole dataset will you give to the train/test split?

@mrgz999 3 years ago

@SANJIVRAI6693 Why did we select a 75%/25% split? Why not more?
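
A note on the traceback above: the lengths [1, 1470] suggest that X has a single row while Y has 1470, so the shape of X is worth checking before the split. Leaving test_size=None makes scikit-learn fall back to a 25% test set, so that argument alone is unlikely to cause this error. A minimal sketch with toy arrays of the right shape (the feature values are placeholders, and stratify=Y is an addition not shown in the thread):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays: 1470 rows of 3 placeholder features, and one label per row.
X = np.arange(1470 * 3).reshape(1470, 3)
Y = np.array([1] * 237 + [0] * 1233)

# X and Y must have the same number of rows (samples).
assert X.shape[0] == Y.shape[0]

# 75% train / 25% test; stratify keeps the Yes/No ratio similar in both parts.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, stratify=Y, random_state=0
)
print(X_train.shape, X_test.shape)
```

The 75/25 ratio is a convention rather than a rule; a larger test share gives a steadier estimate of performance at the cost of less training data.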

@pavel822 4 years ago

Where can I get this data?

@q_1 4 years ago

Kaggle.com, IBM HR Analytics Employee Attrition

@namanagrawal4968 4 years ago

Where is the dataset used?

@DaisyBhullar27 3 years ago

Kaggle

@codewiththink303 2 years ago

Please give me the HRM dataset.

@being_aspirang 3 years ago

This dataset is imbalanced, so we should use a different approach for the project...

@HumptyDumptyActual 4 years ago

Your model by random guessing gives you 80% accuracy. But by machine learning it gives 86% accuracy. This generally makes a case against ML, since results that are only about 6% more accurate are not that far off from a random guess. So it is better to go with guessing than ML. Now that's my opinion. Others are welcome to share theirs as well.

@nikolaynikolov3707 4 years ago

The building of this model was very straightforward. Of course, if you make it for some project, you will do some feature engineering steps before starting the training. You can see that the model is weak from the TP: there were only 9, against an FP of 45. The recall is very bad, which means the whole model is not usable. But with some feature engineering and maybe a better algorithm, you will get great results!

@furkanozbudak4440 4 years ago

Guessing gives 80% accuracy only on this particular dataset. Newly gathered data could have 80% of attrition = "Yes", which would decrease your guess's accuracy to 20%. Then your guess would be way worse than flipping a coin and predicting based on heads or tails.

@grahamg4529 1 year ago

@furkanozbudak4440 Exactly. I fell into the trap of relying on accuracy when working with an unbalanced dataset. It can be very misleading for a beginner, but I've learnt precision and recall are actually more important in identifying the target data.

@ainli4125466 2 years ago

Thank you. I got an error, "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')", when running this script:
# use the random forest classifier
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
forest.fit(x_train, y_train)
Could you shed some light on how to fix it?

@grahamg4529 1 year ago

You need to remove NaNs from the dataset during the data cleansing process.

@harikanttiwari5326 2 years ago

I am getting an error after:
#use random forest classifier
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
forest.fit(X_train, Y_train)
The error is: could not convert string to float: 'Non-Travel'

@dannymuzata4633 2 years ago

Before you get to the random forest classifier, you must make sure that you have converted all your categorical data to numeric data. Then you won't get that error.
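
A minimal sketch of that conversion, assuming a toy DataFrame (column names follow the IBM HR dataset, the rows are made up). Every non-numeric column is label-encoded before the forest is fitted, which removes the "could not convert string to float" error. The loop is an illustration and not necessarily the exact code from the video.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Toy frame with one text feature and a text target (illustrative values).
df = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27, 32],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Travel_Rarely",
                       "Non-Travel", "Travel_Rarely", "Travel_Frequently"],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
})

# Replace every non-numeric column with integer codes.
for column in df.columns:
    if not pd.api.types.is_numeric_dtype(df[column]):
        df[column] = LabelEncoder().fit_transform(df[column])

X = df.drop(columns="Attrition")
Y = df["Attrition"]

# With only numbers left, fitting no longer raises a string-to-float error.
forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
forest.fit(X, Y)
print(df.dtypes)
```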