Whoa, this was some really awesome content! So glad I subscribed to your TidyTuesday project last year. It has helped me a lot in my Data Science job. Thanks a lot, Andrew!
This is awesome. Thank you! Have you tried building a random forest without dummy encoding? I'm curious about the model performance in that case. Apparently, the ranger package can handle the raw categorical columns directly. xgboost, on the other hand, needs the dummy encoding.
I have tried it without dummy/one-hot encoding, and it generally doesn't make a difference in model performance. I like to dummy encode anyway so I can try different models with the same recipe. Thanks for watching!
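For anyone following along, here is a minimal sketch of what dummy (one-hot) encoding does to a categorical column. It is written in plain Python rather than R so it runs without any packages. The toy data and the `dummy_encode` helper are made up for illustration; this is not the recipe from the video.

```python
# Toy data: one categorical column ("color") and one numeric column ("size").
# All values here are invented for illustration.
rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.5},
    {"color": "green", "size": 0.7},
    {"color": "blue", "size": 3.1},
]


def dummy_encode(rows, column):
    """Replace `column` with one 0/1 indicator column per observed level.

    Hypothetical helper: keeps every level (full one-hot), so no reference
    level is dropped.
    """
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per level: 1 if the row has that level, else 0.
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


encoded_rows = dummy_encode(rows, "color")
for row in encoded_rows:
    print(row)
```

Once every column is numeric like this, the same table can be passed to a model that needs numeric input (xgboost) as well as one that can also take the raw columns (ranger), which is why a single recipe can be reused across models.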