43:53, for the monetary value, can someone explain why we use price alone instead of revenue (price * quantity) for each customer? Am I missing something?
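On the price vs. revenue question, a small sketch may help. It shows that the answer depends on what the `price` column holds: a per-unit price, or the total already paid for the transaction. The DataFrame, its column names, and its values are made up for illustration and are not taken from the lab's dataset.

```python
import pandas as pd

# Hypothetical transactions; column names and values are assumptions,
# not the lab's actual data.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "quantity":    [2, 1, 3],
    "price":       [10.0, 5.0, 4.0],
})

# Case A: `price` is a per-unit price, so revenue per row is price * quantity.
tx["revenue"] = tx["price"] * tx["quantity"]
monetary_unit_price = tx.groupby("customer_id")["revenue"].sum()

# Case B: `price` already holds the total paid for the transaction,
# so a plain sum of the column is the monetary value.
monetary_row_total = tx.groupby("customer_id")["price"].sum()

print(monetary_unit_price)
print(monetary_row_total)
```

If the dataset's `price` column is a transaction total, summing it alone is correct; if it is a unit price, the multiplication is needed. Checking a few rows against the data dictionary should settle which case applies.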
Hi, I think I’m a bit confused about the data splitting and CV process. When you build the model, you’re using “unseen” data as input, and then predicting again on both seen and unseen data. From both a business and an algorithmic perspective, isn’t it supposed to be tested only on the targets df?
You are absolutely right. What you should do is discard the last 3 months while you are estimating all the features, and set those last 3 months as the test timeline. After the train/test split, check your results (test: the last 3 months; train: everything except the last 3 months). The lecture is perfect, thank you!
Hi! I think there is an issue with the logic. Correct me if I'm wrong. You are splitting the data into the most recent and the oldest, and then you are training a model using the same old and new data. Your y in the model is what you are trying to predict later (the 90 most recent days). I don't see the point of creating the model here.
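The split described above can be sketched roughly as follows. This is one possible reading, not the lab's code: features are built only from purchases on or before a cutoff date, the target is spend in the 90 days after it, and the same recipe is applied at an earlier cutoff for training and a later one for testing. The helper `make_dataset`, the column names, and the transactions are all hypothetical.

```python
import pandas as pd

def make_dataset(tx, cutoff, n_days=90):
    """Features from purchases on or before `cutoff`;
    target is the spend in the following `n_days`."""
    window_end = cutoff + pd.Timedelta(days=n_days)
    history = tx[tx["date"] <= cutoff]
    future = tx[(tx["date"] > cutoff) & (tx["date"] <= window_end)]

    features = history.groupby("customer_id").agg(
        frequency=("price", "count"),
        monetary=("price", "sum"),
        last_purchase=("date", "max"),
    )
    features["recency_days"] = (cutoff - features["last_purchase"]).dt.days

    # Customers with history but no purchase in the window get a target of 0.
    # Customers who first appear after the cutoff have no features and are dropped.
    target = (
        future.groupby("customer_id")["price"].sum()
        .reindex(features.index, fill_value=0.0)
        .rename("spend_next_window")
    )
    return features.drop(columns="last_purchase").join(target)

# Hypothetical transaction log, for illustration only.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime([
        "2022-11-15", "2023-02-20", "2023-06-20",
        "2022-12-01", "2023-05-01", "2023-06-29",
    ]),
    "price": [20.0, 15.0, 25.0, 30.0, 10.0, 50.0],
})

last_day = tx["date"].max()
train = make_dataset(tx, last_day - pd.Timedelta(days=180))  # earlier window
test = make_dataset(tx, last_day - pd.Timedelta(days=90))    # last 3 months
print(train)
print(test)
```

A model fit on `train` and scored on `test` never sees the last 90 days of spend during training, which is the point the comments above are making.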
Hi Chen, we have one in the Learning Labs PRO membership. It's Learning Lab 58. Upon enrollment you'll gain access to all of our labs: university.business-science.io/p/learning-labs-pro
Hi Matt, I'm planning to add this to my portfolio. Is it possible to follow along with Jupyter Notebook + pip? Or is having the same setup as you crucially important?
You can, but support is only provided to Learning Labs Pro members. You can join LL PRO here: university.business-science.io/p/learning-labs-pro?el=youtube
Thanks for the tutorial, but I think there is a problem with the logic. You trained the model using seen target values. Then you tried to predict those target values. You should have predicted the following 90-day period.
54:00 So what you did was predict on data we already knew. What about the true future: fitting the model on all known data and predicting the next unknown 90 days? BG/NBD and gamma-gamma models can do that.
I do. Some things in pandas take way too long. But everything is still possible. The big problem is I feel bad for you because my R labs are 2-3X more efficient, so you get to results faster. In Python the code always takes more. Like 300 lines vs 200 lines in R.
There are two ways you can use this model. 1. Retrospectively run this model on your database of existing customers daily. You will have an updated list of what your customers were expected to spend vs what they actually spent. 2. For new customers, allow them some time to generate data. Then use the model to predict their 90-day spend in the future. In this case, you won't know their actual 90-day spend as it hasn't happened yet. But you can append the predictions back to your customer IDs and action the customers with the lowest predicted spend. I prefer method 1 as I want to action my customers now based on the reality of the situation. I know exactly what they spent vs what the model thought they would. This allows me to categorise them into 'high risk churners' or 'exceeded expectations' etc.
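Method 1 above could look something like the sketch below: compare predicted and actual 90-day spend and label each customer. The table, its column names, the middle label 'as expected', and the 0.5 / 1.2 ratio thresholds are all assumptions picked for illustration; only the two segment names in quotes come from the comment.

```python
import pandas as pd

# Hypothetical scored customers; the numbers are made up for illustration.
scored = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "predicted_spend_90d": [100.0, 50.0, 80.0],
    "actual_spend_90d": [20.0, 75.0, 78.0],
})

# Ratio of actual to predicted spend.
# The 0.5 and 1.2 cut points are arbitrary assumptions, not from the lab.
ratio = scored["actual_spend_90d"] / scored["predicted_spend_90d"]

scored["segment"] = "as expected"
scored.loc[ratio < 0.5, "segment"] = "high risk churner"
scored.loc[ratio > 1.2, "segment"] = "exceeded expectations"

print(scored)
```

In practice the thresholds would be tuned to the business, and customers with a predicted spend of zero would need separate handling to avoid dividing by zero.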
For this problem, don't we need to check whether the XGBoost model is performing well by having training and testing sets (unseen data that does not influence the training of the model)?
Ahhhhhhhhh! It’s because we use a special technique to model the likelihood of purchase in the next 90 days. We create multiple train/test sets in the lesson. The second train set is the holdout.
Sorry, I reviewed the lesson again. We perform 5-fold cross-validation (cv = 5). This gives us metrics to evaluate. We did not do parameter tuning; there was not enough time, and that’s what I teach in the courses. But predicting on test data is perfectly fine to assess their probability of purchase and their estimated number of days to purchase.
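For anyone unsure what cv = 5 does, here is a minimal sketch of 5-fold cross-validation. It is not the lab's code: it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost, and the features and target are randomly generated placeholders rather than customer data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic placeholder data (assumption): 200 "customers", 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Stand-in model; the lab uses XGBoost, this is a different gradient booster.
model = GradientBoostingRegressor(random_state=0)

# cv=5 splits the rows into 5 folds; each fold is held out once while the
# model is fit on the other 4, giving 5 out-of-fold error estimates.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -scores  # scikit-learn returns negated MAE

print(mae_per_fold, mae_per_fold.mean())
```

Each of the five numbers is the error on rows the model did not see during that fit, which is where the evaluation metrics come from.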
Amazing explanation. Definitely gonna subscribe to the pro service. What is your opinion of the lifetimes library? How reliable is the CLV calculation made by lifetimes using the gamma-gamma and BG/NBD models?
The lifetimes library uses traditional models. I don’t get good results with them for my projects at Business Science. Machine learning has been extremely beneficial for email subscriber modeling and customer targeting/segmentation. I use H2O, which is available in both R & Python. It’s amazing.
Hi! Thanks for these amazing videos! I just started in Marketing Analysis and these labs help me a lot to understand the calculations and possibilities! One question: why do you sum the 'price' column at 31:23? Shouldn't it be price * quantity? Btw, does Labs PRO include all the labs?