As a junior data analyst, I felt hurt when you "assumed you already did data collection" 😂 it's basically the most daunting part of the job. Deploying models is fun! Building Docker images that don't work is not!
Suggestions: (1) work on a dataset that one can monitor with real data, (2) deploy the model API to AWS and/or Posit Connect, (3) showcase drift when it happens and show ways to handle it. I would learn a lot from these items. Thanks for the video!
@@james-h-wade Seconding the Posit Connect point, it would be really great to get a view into how it's done there. I've had issues deploying, as there are hardly any good walkthroughs!
AUC is an unreliable metric if classes are imbalanced; prediction probabilities need to be adjusted to "undo" the stratified sampling. You should keep a hold-out set (randomly sampled) to verify the performance.
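For anyone curious about the "undo the sampling" part: a minimal sketch of the standard prior-correction idea in Python (the video's stack is R, so treat this as illustrative pseudocode; the function name and priors are hypothetical). If you trained on a rebalanced sample, you can rescale each predicted probability by the ratio of the true class prior to the training prior and renormalize:

```python
def correct_probs(p_train, prior_train, prior_true):
    """Adjust a predicted positive-class probability from a model trained
    on a resampled dataset back toward the true class prior.

    p_train:     probability predicted by the model trained on resampled data
    prior_train: positive-class rate in the (resampled) training data
    prior_true:  positive-class rate in the real population
    """
    # Reweight each class by (true prior / training prior), then renormalize
    num = p_train * (prior_true / prior_train)
    den = num + (1 - p_train) * ((1 - prior_true) / (1 - prior_train))
    return num / den


# If the model was trained on a 50/50 downsample but the real positive
# rate is 10%, a raw prediction of 0.5 maps back down to 0.1:
print(correct_probs(0.5, prior_train=0.5, prior_true=0.1))
```

Note this correction changes the calibrated probabilities (and thus thresholds), but since AUC only depends on the ranking of scores and this mapping is monotonic, AUC itself is unchanged — which is why the randomly sampled hold-out set is the more important safeguard here.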
Thanks for sharing that. I'm thinking that should be a topic for a future video. There are many to choose from, and it's hard to understand the differences. My advice is to use the one that works. Posit Connect is the easiest to use in my experience, but it's a pro product.
Great video, now I can deploy my model as an API. Can you make a video like this for plumber API deployment to a Vercel app project? It would be helpful, since if I use Hugging Face the Space must be public and people can access my R code files.
Hello James, I have a question. I see that you did EDA first, then split the data into train and test sets. Shouldn't I do EDA after the split to avoid data leakage?