Precision: Of the users who churned, how many did the model predict would churn? Recall: Of the users the model said would churn, how many actually churned?
I think you got precision and recall swapped: precision = TP / (TP + FP), recall = TP / (TP + FN). The true positives + false negatives in recall are the users who actually churned, and the true positives + false positives in precision are the users the model predicted would churn!
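To make the two formulas in the reply above concrete, here is a minimal sketch that computes both from confusion-matrix counts. The counts are made up for illustration, and the helper name `precision_recall` is hypothetical rather than anything from the video.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    precision = TP / (TP + FP): of the users flagged as churners, the share who churned.
    recall    = TP / (TP + FN): of the users who churned, the share the model flagged.
    """
    # Guard against empty denominators (no predicted or no actual churners).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts: 30 churners caught, 10 false alarms, 20 churners missed.
tp, fp, fn = 30, 10, 20
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With these counts the model flagged 40 users and 30 of them churned, while 50 users churned in total and 30 of them were flagged.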
@CodeEmporium I loved this presentation - you're so good at touching the main points (e.g. data leakage) and all in 13 mins!! Exactly what I was looking for, well done!
Wait why can we only train on data up to 3 months in the past? If you have the same features for customers from last year and also see their usage history last year, couldn't you use that historical data to predict churn as well?
Awesome video! The step at 9:05 went a bit too fast in my opinion. Could you show us, perhaps with made up data, how you verify it using SQL? What goes into the process of defining a "Correct" and an "incorrect"?
Performing univariate analysis of each predictor variable's relationship with the target variable should answer the "hunches"/hypotheses. For instance, plotting a bar chart of Gender vs Churn will tell us whether males or females are more likely to churn. In this case, we can visualize #WorkOrdersIn6Months against Churn.
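As a rough sketch of the check described above, the churn rate per bucket of a single predictor can be tabulated before plotting anything. The rows and the bucket boundaries below are made up for illustration, and `churn_rate_by_bucket` is a hypothetical helper; the resulting dictionary is what a bar chart of #WorkOrdersIn6Months vs Churn would display.

```python
from collections import defaultdict

# Made-up rows: (work_orders_in_6_months, churned) with churned = 1 or 0.
rows = [(0, 1), (0, 1), (1, 1), (1, 0), (5, 0), (5, 0), (8, 0), (0, 0)]


def bucket(n: int) -> str:
    """Assumed bucket boundaries, chosen only for this example."""
    return "0" if n == 0 else "1-4" if n < 5 else "5+"


def churn_rate_by_bucket(rows, bucket):
    """Return {bucket_label: share of users in that bucket who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for value, label in rows:
        key = bucket(value)
        totals[key] += 1
        churned[key] += label
    return {key: churned[key] / totals[key] for key in totals}


rates = churn_rate_by_bucket(rows, bucket)
for key, rate in rates.items():
    print(f"{key:>3} work orders: churn rate {rate:.2f}")
```

A large gap between buckets supports the hunch that the feature is related to churn; similar rates across buckets suggest it carries little signal on its own.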
I want to ask about features that are perhaps too biased. Wouldn't something like "days since last order" or "total order count" be too biased for predicting churn? I mean, it's common sense that if a customer hasn't ordered for a while, or has ordered only once or twice over a long period, they are SUPER likely to churn when the churn period arrives. Wouldn't such features hide or overshadow other features that might show us what drives negative churn predictions?
True. If the feature value is 89 days since last purchase, chances are they will churn. And now that I think about it, making that prediction probably won't be useful, since acting on it is impossible. We could just exclude that feature and ask "What is the probability the customer churns within 3 months of making this prediction?" Good eye.
Start porting your videos to LBRY's Odysee as well. You can set it to automatically upload there whenever a video uploads to YouTube; this can be activated when you create your Odysee account. You deserve way more exposure, and maybe that can help a little since there is far less competition there.
There are two points that you mentioned in the model training slides: (1) create snapshots randomly for active users at a given time, and (2) only train on data up to 3 months in the past. Does that mean that when I want to create a random snapshot, let's say of 18th Mar 2021, I only include customers who have purchased at least once in the last 3 months, since anyone whose last purchase was 4 months before 18th Mar has already churned and is hence inactive, which would violate the first point? Also, for the users who are considered active, can we look at data from more than 3 months back or not? I am guessing not, because that would violate rule 2, but then you are using the feature #work orders in last 6 months. So I am a little confused here.
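One possible reading of the two rules, sketched under stated assumptions and not confirmed by the video: "active" means the last order falls within 90 days before the snapshot, the churn label is "no order in the 90 days after the snapshot", and rule 2 restricts which snapshot dates are usable (the label window must have fully elapsed) rather than how far back features may look. Under that reading, a 6-month feature window is compatible with rule 2. The order history and the helpers `build_snapshot` and `label_is_observable` are made up for illustration.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)     # assumed churn definition: no order for 90 days
FEATURE_WINDOW = timedelta(days=180)  # look-back for "#work orders in last 6 months"

# Made-up order history: user id -> order dates.
orders = {
    "a": [date(2021, 1, 10), date(2021, 3, 1), date(2021, 4, 20)],
    "b": [date(2020, 11, 2), date(2021, 2, 14)],
    "c": [date(2020, 10, 5)],
}


def label_is_observable(snapshot: date, today: date) -> bool:
    """A snapshot is usable for training only once its full label window has passed."""
    return snapshot + CHURN_WINDOW <= today


def build_snapshot(orders, snapshot: date):
    """Return {user: features + label} for users active at the snapshot date."""
    rows = {}
    for user, dates in orders.items():
        past = [d for d in dates if d <= snapshot]
        future = [d for d in dates if snapshot < d <= snapshot + CHURN_WINDOW]
        # Skip users with no history or whose last order is already older than 90 days.
        if not past or snapshot - max(past) > CHURN_WINDOW:
            continue
        rows[user] = {
            # Features use only data on or before the snapshot (no leakage).
            "orders_last_6_months": sum(d > snapshot - FEATURE_WINDOW for d in past),
            "days_since_last_order": (snapshot - max(past)).days,
            # Label uses only data after the snapshot.
            "churned": int(not future),
        }
    return rows


snapshot = date(2021, 3, 18)
rows = build_snapshot(orders, snapshot)
for user, row in rows.items():
    print(user, row)
```

Here user "c" is dropped as already inactive on 18th Mar 2021, user "a" is labeled as retained because of the April order, and user "b" is labeled as churned.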
Great video! I don’t think you could use regression though because of right censoring. You don’t know if some of those customers will churn and when. Survival analysis models deal with this kind of censoring.
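To illustrate how survival analysis handles the right censoring mentioned above, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is a swapped-in textbook technique, not something shown in the video; the durations and the helper name `kaplan_meier` are made up for illustration. Customers who are still active contribute to the at-risk count up to their last observed month, but never count as churn events.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t with at least one observed churn.

    durations: time each customer was followed (months, in this example).
    observed:  1 if churn was seen at that time, 0 if the customer was
               still active when observation ended (right-censored).
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone followed for at least t months is still at risk at t.
        at_risk = sum(d >= t for d in durations)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Made-up data: three observed churns, three customers still active (censored).
durations = [2, 3, 3, 5, 8, 8]
observed = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t}: estimated share still active = {s:.3f}")
```

A plain regression on "months until churn" would have to either drop the three censored customers or treat their last observed month as a churn date, and both choices bias the estimate.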
How should we treat the test dataset? Does that also have to be randomly sampled? How would you present recall and precision: at the user level or at the daily observation level? Thanks