Shapley Values for Machine Learning 

A Data Odyssey
16K views

Shapley values come from game theory, so what do they have to do with machine learning? In this video we see how the Shapley value formula is extended to explain how each model feature contributed to a prediction. We also see that the Shapley value axioms lead to desirable properties for a feature attribution method. To end, we discuss how Shapley values can be approximated using Monte Carlo sampling, KernelSHAP and TreeSHAP.
*NOTE*: You will now get the XAI course for free if you sign up (not the SHAP course)
SHAP course: adataodyssey.c...
XAI course: adataodyssey.c...
Newsletter signup: mailchi.mp/409...
Read the companion article (no-paywall link): towardsdatasci...
Medium: / conorosullyds
Twitter: / conorosullyds
Mastodon: sigmoid.social...
Website: adataodyssey.com/
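As a rough sketch of the game-theory formula the video builds on, Shapley values can be computed exactly for a small toy cooperative game. The players and value function below are invented purely for illustration:

```python
import math
from itertools import combinations

def shapley_values(players, value):
    """Exact Shapley values: for each player i, sum over coalitions S not
    containing i, weighted by |S|! * (n - |S| - 1)! / n!, of the marginal
    contribution value(S + {i}) - value(S)."""
    n = len(players)
    phis = {}
    for i in players:
        others = [p for p in players if p != i]
        phi = 0.0
        for size in range(n):
            for S in combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                phi += weight * (value(set(S) | {i}) - value(set(S)))
        phis[i] = phi
    return phis

# Toy game (hypothetical): the value of a coalition is the square of its size.
players = [1, 2, 3]
value = lambda S: len(S) ** 2
phi = shapley_values(players, value)
```

Because the three players are interchangeable here, symmetry gives each the same payout, and efficiency means the payouts sum to value({1,2,3}) − value(∅) = 9.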

Published: 14 Oct 2024

Comments: 24
@adataodyssey
@adataodyssey 8 months ago
*NOTE*: You will now get the XAI course for free if you sign up (not the SHAP course)
SHAP course: adataodyssey.com/courses/shap-with-python/
XAI course: adataodyssey.com/courses/xai-with-python/
Newsletter signup: mailchi.mp/40909011987b/signup
@lleger
@lleger 1 month ago
Starting an internship in September where the goal is to explain a lot of models with SHAP. This channel is a gold mine.
@adataodyssey
@adataodyssey 1 month ago
Hi Louis, I'm glad I could help. Good luck with your internship!
@OfficialSnowsix
@OfficialSnowsix 1 year ago
These are some of the best data science tutorials I've seen on YouTube. Don't give up, keep making them. I know you'll make it big =)
@adataodyssey
@adataodyssey 1 year ago
Thank you so much! I really appreciate the support :)
@cauchyschwarz3295
@cauchyschwarz3295 1 month ago
What I don't understand is that, in the game video, the value of a coalition is calculated by re-running the game with each coalition. Here, it would seem to me that finding the value of a feature coalition would mean re-training the model for each coalition. That doesn't just seem expensive, it seems downright prohibitive for complex or large models. You started with a fixed regression model where the weights are already determined, but for, e.g., a neural network model, leaving out features could change the weights significantly, right?
@adataodyssey
@adataodyssey 1 month ago
@@cauchyschwarz3295 This is a bit confusing! But no, you will only have one model (the one you want to explain). You marginalise the prediction function (i.e. the model) over the features that are not in the set S. You do not have to retrain a new model by excluding the features that are not in S.
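A minimal sketch of that marginalisation, assuming a fixed linear model and a background dataset (the weights, data and intercept below are made up for illustration). Note that the same single model is evaluated every time; only its inputs change:

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed model to explain (hypothetical weights; nothing is retrained below).
weights = np.array([2.0, -1.0, 0.5])
model = lambda X: X @ weights + 10.0

# Background data used to marginalise over features outside the coalition S.
background = rng.normal(size=(1000, 3))
x = np.array([1.0, 2.0, 3.0])  # the instance being explained

def val(S):
    """Coalition value: approximate E[f(X) | X_S = x_S] by fixing the features
    in S to the instance's values and averaging the model's prediction over
    background samples for the remaining features."""
    X = background.copy()
    for j in S:
        X[:, j] = x[j]          # features in S are pinned to the instance
    return model(X).mean()       # same model every call, different inputs
```

With all features in the coalition, val({0, 1, 2}) recovers the model's prediction for x; with an empty coalition, val(set()) approximates the mean prediction over the data.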
@PoisinInAPrettyPill
@PoisinInAPrettyPill 3 days ago
I think there's a subtlety here in the application. In the theoretical model, the goal is to calculate a fair payout for an individual feature based on how much it contributes to building a good model. Good models are measured by how well they predict things. So we would want to think about how much the feature contributes to a reduction in model error, and we would want to train a model with and without a given feature to figure this out. (But instead we use other methods for determining which features are good that are easier to compute.) In the use case in this video (and what is the norm in ML), Shapley values are used to explain how a feature contributed to the model prediction, regardless of how good the model actually is. This is helpful because Shapley values still have desirable properties like additivity in this use case. If you trained a new model without this feature, it wouldn't answer the question of how much the feature contributed to the prediction value in the old model.
@felixgrisoni3807
@felixgrisoni3807 2 months ago
Hi, your video is very well summarized, but there is an error in the formula: 1 must be excluded in val(S ∪ {1}).
@adataodyssey
@adataodyssey 2 months ago
I don't see the mistake. At what point do I introduce val(S ∪ {1})? {1} refers to the coalition containing feature 1, i.e. x1, and not the feature's values.
@chrisleenatra
@chrisleenatra 1 year ago
Nice video. But I have a question: what you showed in the video is how to "exclude" a categorical column (degree). What about a continuous (numeric) column, like age? What value would we use?
@adataodyssey
@adataodyssey 1 year ago
Thanks! If the continuous variable is in the coalition, then we use the actual value for that instance (i.e. the person's actual age). If the continuous variable is not in the coalition, then we integrate over the values of the variable w.r.t. the probability of those values. However, in practice, we will not know the probability distribution of a variable, so we have to randomly sample different values for the variable from our dataset. We do this many times, so that we end up approximating the distribution. I hope that makes sense? There is a lot of statistical theory that underlies this explanation!
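That sampling idea can be sketched as follows, using a hypothetical two-feature model (age and degree) and invented data. With "degree" in the coalition and "age" excluded, the integral over age is approximated by resampling age values from the dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model: salary-like prediction from degree (0/1) and age.
model = lambda age, degree: 1000.0 * degree + 50.0 * age

# Invented dataset of observed ages (values between 20 and 59).
ages = rng.integers(20, 60, size=500).astype(float)

# Instance being explained; the coalition contains only "degree".
x_degree = 1.0

# Approximate E[f(age, degree=1)] by Monte Carlo sampling ages from the data.
sampled_ages = rng.choice(ages, size=10_000, replace=True)
val_S = model(sampled_ages, x_degree).mean()
```

Here val_S converges to 1000 + 50 × (mean observed age) as the number of samples grows, which is exactly the "integrate over the distribution by resampling" step described above.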
@chrisleenatra
@chrisleenatra 1 year ago
@@adataodyssey Ahh I see, got it. But out of curiosity, can you give me a reference for that statistical theory?
@adataodyssey
@adataodyssey 1 year ago
@@chrisleenatra Unfortunately, I don't have any specific references. I'm using the knowledge from back in my undergrad. If you want to understand it, take a look at "stochastic calculus".
@AvijeetTulsiani
@AvijeetTulsiani 1 year ago
Can you share a link to the previous video, which explains the Shapley formula?
@adataodyssey
@adataodyssey 1 year ago
Sure, Avijeet! You can find all the videos in this playlist: SHAP ru-vid.com/group/PLqDyyww9y-1SJgMw92x90qPYpHgahDLIK
@NeverHadMakingsOfAVarsityAthle
@NeverHadMakingsOfAVarsityAthle 9 months ago
At 5:27 you mention the formula for calculating val_x(S). Don't we also need to subtract E[f(X)] from that?
@adataodyssey
@adataodyssey 9 months ago
You can, but the terms will cancel out when you subtract val(S) from val(S ∪ {i}).
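Spelling out that cancellation, assuming the definition val_x(S) = E[f(X) | X_S = x_S] − E[f(X)] (my reading of the video's formula), the E[f(X)] terms subtract away:

```latex
\begin{aligned}
\mathrm{val}_x(S \cup \{i\}) - \mathrm{val}_x(S)
&= \Bigl(\mathbb{E}\bigl[f(X)\mid X_{S\cup\{i\}}=x_{S\cup\{i\}}\bigr] - \mathbb{E}[f(X)]\Bigr) \\
&\quad - \Bigl(\mathbb{E}\bigl[f(X)\mid X_S=x_S\bigr] - \mathbb{E}[f(X)]\Bigr) \\
&= \mathbb{E}\bigl[f(X)\mid X_{S\cup\{i\}}=x_{S\cup\{i\}}\bigr]
 - \mathbb{E}\bigl[f(X)\mid X_S=x_S\bigr].
\end{aligned}
```

So including or omitting E[f(X)] in val_x(S) leaves every marginal contribution, and hence every Shapley value, unchanged.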
@NeverHadMakingsOfAVarsityAthle
@NeverHadMakingsOfAVarsityAthle 9 months ago
Aaaaah of course, that makes sense! Thanks, you helped me a lot, not only with this comment but with the entire video series :)
@adataodyssey
@adataodyssey 9 months ago
@@NeverHadMakingsOfAVarsityAthle No problem Matthias! I'm glad I could help :)
@ericafontana4020
@ericafontana4020 1 year ago
Nice explanation! :)
@adataodyssey
@adataodyssey 1 year ago
Thank you Erica!
@PENUification
@PENUification 3 months ago
First you say that it's efficient. One minute later you say that it is very computationally expensive?
@adataodyssey
@adataodyssey 3 months ago
Yes, that's correct. The efficiency property of Shapley values doesn't have anything to do with computational efficiency. It just means that if you add the Shapley values of an instance and the mean prediction across all instances, you will get the prediction for that instance.
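The efficiency property described here can be checked numerically. The sketch below (hypothetical linear model, invented data) computes exact Shapley values by brute force and confirms that the mean prediction plus the sum of the Shapley values equals the instance's prediction:

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(1)

# Hypothetical fixed model and background data (for illustration only).
weights = np.array([3.0, -2.0, 1.0])
model = lambda X: X @ weights
background = rng.normal(size=(2000, 3))
x = np.array([0.5, -1.0, 2.0])  # instance being explained

def val(S):
    """Coalition value: average prediction with features in S pinned to x."""
    X = background.copy()
    for j in S:
        X[:, j] = x[j]
    return model(X).mean()

# Brute-force Shapley values over all coalitions of the 3 features.
n = 3
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi[i] += w * (val(set(S) | {i}) - val(set(S)))

mean_prediction = val(set())          # average prediction over the data
prediction = model(x[None])[0]        # the model's prediction for x
# Efficiency: mean_prediction + phi.sum() equals prediction.
```

This is the "efficiency" the reply refers to: an additivity guarantee about the attributions, separate from how expensive they are to compute (here 2^3 coalitions per feature; the cost grows exponentially with the number of features, which is why approximations like KernelSHAP exist).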