Shapley Additive Explanations (SHAP) 

KIE · 2.5K subscribers · 64K views

Published: 14 Oct 2024

Comments: 82
@КириллКлимушин 9 months ago
This is literally the best explanation of shapley values I've found on RU-vid, and probably on the entire internet, the voice, visualizations - everything on the top level
@chinameng4636 2 years ago
Really brilliant work! I've seen so many videos, but none of them talk about the background data! Your video goes into this question deeply enough in such a short time! THANKS A LOT!
@mohammadsharara3170 1 year ago
Very clear explanation! I've watched several videos, so far this is the best. Thank you
@cornevanzyl5880 4 months ago
As my PhD involves understanding how SHAP works for model explainability, this video is by far the most accurate and in-depth explanation of what it is and how it works. You demonstrate a very good grasp of the topic😊
@imwithu2532 2 months ago
Can I contact you? I'm also a PhD student, also working on SHAP. Can we discuss?
@GleipnirHoldsFenrir 3 years ago
Best video on that topic I have seen so far. Thanks for your work.
@ssethia86 3 years ago
concise, clean, and clear. Nicely delivered!! Bravo!!
@NilayBadavne 3 years ago
Thank you so much for this video. Really well articulated. You start from the basics - which is what many are missing from their blogs/videos.
@dianegenereux1264 2 years ago
I really appreciated the clear conceptual explanation at the very start of the video. Thank you!
@zahrabounik3390 1 year ago
This is a fantastic explanation for SHAP. Thank you so much for sharing your knowledge.
@mehulsingh3497 3 years ago
Thanks for sharing ! It’s the best explanation for SHAP. You are an absolute rockstar \m/
@SanderJanssenBSc 11 months ago
Such an excellent video, very high value and useful! Thanks for taking the time out of your life to produce such value for us!
@hyunkang2090 1 year ago
Thank you. It was the best presentation on SHAP
@captainmarshalliii3304 3 years ago
Awesome video and explanation! Are you going to release your implementation? If so where? Thanks.
@apah 1 year ago
Excellent video! I'm wondering, however: isn't the difference between SHAP delta and actual delta due to the possible interactions between the "lower status" feature and the others? If I'm understanding it correctly, your computation of "actual delta" is equivalent to a permutation importance, whereas SHAP takes the interactions into account by averaging the score over subsets "excluding" our feature of interest.
@fatemekakavandi 1 year ago
Really nice explanation. I thought understanding this concept would be difficult, but it's actually really easy with a good explanation.
@yedmitry 1 year ago
Great explanation. Thank you very much!
@Sam-vi8iw 2 years ago
Awesome video! Love that.
@avddva1367 1 year ago
Hello, I really appreciate the video! I have one question: how are the number of coalitions calculated? I thought it would be 2^(number of features)
@kruan2661 1 year ago
It depends on whether the order of features matters. If not, then 2^k; if yes, then sum up all the permutations.
@AJMJstudios 1 year ago
@@kruan2661 Still not getting 64 coalitions for 4 features even if order matters.
@andyrobertshaw9120 1 year ago
@@AJMJstudios You do. If all 4 are there, we have 4! = 24. If 3 are included, we have 4x3x2 = 24. If 2 are included, we have 4x3 = 12. If just 1 is included, we have 4. 24+24+12+4 = 64.
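The ordered-coalition arithmetic in the comment above can be checked with a few lines of Python (a throwaway sketch, not code from the video):

```python
from itertools import permutations

def ordered_coalitions(n):
    """Count ordered coalitions: permutations of every non-empty subset size."""
    return sum(len(list(permutations(range(n), k))) for k in range(1, n + 1))

print(ordered_coalitions(4))  # 4 + 12 + 24 + 24 = 64
```

This matches the 64 figure only if order matters; the unordered count, as discussed elsewhere in the thread, is 2^n = 16.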
@caiyu538 1 year ago
I studied it again. Shapley values are a brute-force search over features, considering all kinds of feature combinations, and SHAP is a trick to reduce that complexity. Is my understanding correct? How does it reduce the computational complexity?
@joshuadarville1915 2 years ago
Love the video. However, I am a bit confused about how the total number of coalitions is calculated. The samples at 1:58 show 15 coalitions for 4 features, but at 5:55 you state we need to sample 64 coalitions for 4 features. I think the discrepancy comes from calculating coalitions using combinations initially vs. permutations later on. Thanks again for the video!
@robgeada6618 2 years ago
Yes, you are exactly right, it's an error in the video: 4 features should be 2^4=16 coalitions.
@astaragmohapatra9 2 years ago
@Rob Geada, how is it 2 to the power of the number of features? For 4 features we can have 3, 2, or 1 possible combinations; for each it is 3CN, so it should be around 7 (3C1 + 3C2 + 3C3), and the total is 28 for four features. Am I right?
@AJMJstudios 1 year ago
@@astaragmohapatra9 It's 4C0 + 4C1 + 4C2 + 4C3 + 4C4 = 16
@gustavhartz6153 1 year ago
When you pass the data point back through the model at 10:35 which value do you replace the last feature with. You say "values from the background dataset, " but it can't just be a random value. Is it the average?
@Jorvanius 2 years ago
Thank you very much for the awesome explanation 👍
@JK-co3du 2 years ago
Thank you very much for this informative video. Could you explain why we use the train set as background but test set to calculate the shap values?
@robgeada6618 2 years ago
Hi JK; the background simply needs to be taken from a pool of "representative" values that the model expects; in this case a subset of the data that was used to train the model makes a lot of sense for that. Meanwhile, computing SHAP values for a particular point is simply done to explain how the model behaves given this particular input; there is no requirement that this input be anything similar to what the model has seen before. Basically, the background set needs to come from "representative" data, but we can then compute SHAP values for any arbitrary point. In this case, we pick a point from the test set, as in real-world XAI use cases you are explaining novel points that do not necessarily have corresponding ground-truth values, i.e., the same reason that we use train/test splits when evaluating models.
@arunshankar4845 10 months ago
How exactly did you say 4 features requires sampling 64 coalitions?
@juanete69 1 year ago
Hello. If we apply SHAP to a linear regression model... are those Phi_i equivalent to the coefficients of the regression model? Do they also take into account the variance as the p-values do? How is the SHAP value for a variable different from the partial R^2?
@caiyu538 1 year ago
Great to revisit again.
@ea2187 2 years ago
Thanks for sharing. I'm currently developing a multi-class classifier (via XGB classification) and would like to know whether SHAP can be used in multi-class classification problems. During my research I could only find that SHAP can be used for classification problems which output probabilities (my model outputs three classes). Can anyone help?
@robgeada6618 2 years ago
I answered this question in a private message, but I'll post the answer here as well: yes, because the XGBClassifier does indeed output probabilities (or more specifically, margins), they're just hidden by default. However, you can use these margins and probabilities to compute SHAP values, which will then indicate how much each feature contributed to the margins or probabilities.
@murilopalomosebilla2999 3 years ago
Excellent work!
@cleverclover7 2 years ago
Great video! I have many questions on this subject but here's one(ish): It strikes me that the Background sample is not irrelevant and you must assume it is sufficiently random, iid. There is at least one case - the case where the background sample is the data point being tested, where this is certainly not true. So my question is, if you were to run the experiment again for every possible data point instead of a single background chunk of size 100, and took the average of these, would you get perfect accuracy?
@robgeada6618 2 years ago
Yeah, so choice of background data is a really interesting question, one that I think about quite a bit! In terms of your idea, choosing every available training data point as your background does well-represent the distribution of your data, but that gets pretty expensive: SHAP will need to run num_samples * background_size datapoints through the model. For a larger dataset like those seen in ML work, that could be hundreds of millions of model evaluations. One way to get around this is use something like kmeans clustering on your training data, with k set to something like 100. The centerpoints of your clusters are then a great representation of the training data distribution, which means when you use them for SHAP you end up with very similar results to using the entire training data as background. The advantage of this is that it's a lot cheaper, in that k~100 is usually much, much smaller than the full training dataset.
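Rob's k-means suggestion can be sketched roughly as follows (the synthetic data and parameter choices are my own; the shap package also ships a `shap.kmeans` helper for exactly this purpose):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x_train = rng.normal(size=(5000, 5))  # stand-in for a large training set

# Summarize the training distribution with k=100 cluster centers, then use
# those centers as the SHAP background instead of all 5000 rows.
kmeans = KMeans(n_clusters=100, n_init=1, random_state=0).fit(x_train)
background = kmeans.cluster_centers_

print(background.shape)  # (100, 5)
```

With num_samples coalition samples, SHAP then evaluates the model roughly num_samples * 100 times instead of num_samples * 5000.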
@sehaconsulting 2 years ago
Hi, In the video you said for calculating coalitions that if a model has 4 features it must calculate 64 coalitions but for 32 features it is 16 billion or so. Can you explain the math behind it. In your example you had 4 features exemplified by the four dots and it only amounted to 16 coalitions didn’t it?
@robgeada6618 2 years ago
Hi, you're exactly right; as I've said elsewhere in the comments it's a mistake in the video. 4 features indeed have 16 possible coalitions, it's always 2^(number of features).
@sehaconsulting 2 years ago
@@robgeada6618 Thank you!
@KountayDwivedi 2 years ago
@@robgeada6618 Thanks. I came to the comment section just for clarification. Btw, great video !! 😎 :-}
@xaviergonzalez5465 2 years ago
What does it mean for the original input x and the simplified x' to be approximately equal? Isn't x' a binary vector of features, whereas X presumably lives in Euclidean space?
@robgeada6618 2 years ago
Yeah, you're exactly right that x' is binary and x is Euclidean. In the video I'm making a bit of a simplification; in real usage the simplified x' will have some translation function h that converts the binary vector back to the original datapoint x, i.e., h(x') = x. The full definition of local accuracy states that g(x') = f(x) if h(x') = x.
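A minimal sketch of such a translation function h (the names and the single-background-row simplification are my own, not from the video):

```python
import numpy as np

def h(x_prime, x_orig, background_row):
    """Map a binary coalition vector x' back to a real datapoint:
    take the original value where a feature is 'present' (1),
    and a background value where it is 'absent' (0)."""
    mask = np.asarray(x_prime, dtype=bool)
    return np.where(mask, x_orig, background_row)

# The all-ones coalition recovers the original point, so h(1,...,1) = x,
# which is the case where local accuracy demands g(x') = f(x).
```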
@joshinkumar 2 years ago
Nicely explained.
@jamalnuman 7 months ago
Really great.
@giuliasantai4853 3 years ago
This is just great!! Thanks a lot
@kjlmomjihnugbzvftcrdes 1 year ago
Nice one.
@Brume_ 3 years ago
Hi, I'm writing my report and have two very important questions for you: 1) how many coalitions are selected when I compute my explainer? 2) do the coalitions take all of the values in the background? At 6:38, is y the mean of N outputs if the background is N rows? Thank you a lot. Sorry for the bad English, I'm French.
@robgeada6618 3 years ago
Hi Brume! 1) The number of coalitions is typically the number of samples, usually configurable in the implementation. By default in our implementation and in the original Python one by Scott Lundberg, the default value is (2 * num_features) + 2048 coalitions unless the user specifies otherwise. 2) Correct, the coalition value is the mean value over the N background datapoints.
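As a quick sanity check of the default budget Rob quotes, the formula is simple enough to write out (illustrative sketch only):

```python
def default_nsamples(num_features):
    # Default coalition sample budget cited above: 2 * num_features + 2048.
    return 2 * num_features + 2048

print(default_nsamples(4))   # 2056
print(default_nsamples(32))  # 2112
```

Note that for small feature counts this budget can exceed the 2^n possible coalitions, in which case an implementation can simply enumerate them all exactly.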
@ron769 2 years ago
Thanks Rob! So given that the sampled coalitions don't cover all possible combinations (NP-hard), how can we be sure that the SHAP values are close enough to the original Shapley values?
@juanete69 1 year ago
What are the advantages of SHAP vs LIME (Local Interpretable Model Agnostic Explanation) and ALE (Accumulated Local Effects)?
@amelrahmoune-y7d 1 year ago
Can I have the PPT document of this presentation, please?
@marcelbritsch6233 5 months ago
Brilliant. Thank you!!!!!
@chinuverma5374 3 years ago
Thanks for the wonderful session, sir. With the help of SHAP we can find top-feature graphs and correlated-feature graphs using PDP, but a simple feature-selection and ranking algorithm in machine learning can also give us the top features used in the model according to rank, and we can even plot graphs of correlated features like SHAP does. I am confused about what extra the explainable model is doing here to explain the predictions. Please clear my doubt; I am currently doing research in this area.
@robgeada6618 3 years ago
So if I understand correctly, you're wondering what the explanatory model at 4:00 is doing? Essentially, the explanatory model g(x') is what SHAP builds to produce its explanation of your actual model f(x). By passing a lot of different permutations of features through the actual model f(x), the algorithm creates a huge number of samples of inputs and outputs of your real model, from which it can then try to build a linear explanation model g(x') that produces the same outputs given the same inputs. Therefore, the linear explanation model should treat the features of this datapoint in the same way as the actual model would, meaning we can use it to explain the actual model's predictions. So in a way, SHAP explanations are actually explaining g(x'), but since the algorithm is designed such that if x'≈x, g(x')≈f(x), the explanations of g(x') are equally valid as explanations of f(x). Does that clear it up?
@yuchenyue1243 2 years ago
Thanks for sharing! Can anyone explain why are there 64 coalitions to sample for 4 features? at 5:52
@robgeada6618 2 years ago
Hi, looking at it again, that's a mistake on my part. It should be 16 coalitions for 4 features, i.e: (4 choose 4) + (4 choose 3) + (4 choose 2) + (4 choose 1) + (4 choose 0) = 1 + 4 + 6 + 4 + 1 = 16
@yuchenyue1243 2 years ago
@@robgeada6618 (4 choose 4) + (4 choose 3) + (4 choose 2) + (4 choose 1) + (4 choose 0) = 2^4, is it generally true that for n features there are 2^n coalitions to sample?
@robgeada6618 2 years ago
@@yuchenyue1243 Yep, exactly. One way to think about it is by writing out each feature combination as a vector, with a 1 if a feature is included in the coalition and a 0 if it isn't. Doing this for 4 features, you'd have something like 0000, 0001, 0010, 0011, ..., all the way to 1111. This means that enumerating every possible feature combination is the same as counting in binary from 0 to 1111. That means that for n features, the number of coalitions to sample is always equivalent to the number of integers that can be represented by n bits in binary: 2^n.
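That binary-counting argument is easy to make concrete (a throwaway sketch, not code from the video):

```python
from itertools import product

def all_coalitions(n):
    """Enumerate every coalition of n features as a 0/1 inclusion vector,
    i.e. count in binary from 00...0 to 11...1."""
    return list(product([0, 1], repeat=n))

coalitions = all_coalitions(4)
print(len(coalitions))  # 2**4 = 16, from (0, 0, 0, 0) up to (1, 1, 1, 1)
```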
@pedrogallego1673 2 years ago
At 05:58, is it possible that the number of total coalitions with 32 features is wrong? I think it is 32·2·2³¹ = 2³⁷ (and 17.1 billion ≈ 2³⁴).
@robgeada6618 2 years ago
Hi Pedro; as I've said elsewhere in the comments, I made a mistake when calculating the total coalition count; it should always be 2^(number of features), so 32 features is 2^32 or ~4.3 billion.
@pedrogallego1673 2 years ago
@@robgeada6618 Thanks. Regardless, it's a really nice video!
@1黄-m5m 2 years ago
So clear! Thanks.
@caiyu538 2 years ago
I'm confused at 9:55: where is the variable test_point defined? Is it the previous x_train or y_train at 8:28?
@robgeada6618 2 years ago
Should have showed that, sorry! test_point is the first datapoint of x_test: test_point = x_test[0]
@tashtanudji4756 2 years ago
Really helpful, thanks!
@ВадимШатов-з2й 3 years ago
Amazing.
@arunmohan1211 3 years ago
Nice. Best one!
@DrJalal90 2 years ago
Great video indeed!
@blueprint5300 4 months ago
In response to the discrepancy that you call a mistake between 'SHAP delta' and 'Actual delta' in 10:57, those two values are not meant to be the same. Shapley values are the average contribution of all subsets of features. 'Actual delta' would be only one of the terms in this average. The Shapley value of feature X DOES NOT represent the difference in output that you would get when removing feature X from the model.
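For readers who want to see this "average contribution over all subsets" concretely, here is a brute-force exact Shapley computation over a toy value function of my own invention (not from the video):

```python
from itertools import combinations
from math import factorial

def shapley(value, n):
    """Exact Shapley values by brute force: phi_i is the weighted average of
    feature i's marginal contribution over every subset S not containing i."""
    phi = []
    for i in range(n):
        others = [p for p in range(n) if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Classic Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

# Toy value function with an interaction between features 0 and 1.
def v(S):
    return (0 in S) * 2.0 + (1 in S) * 1.0 + (0 in S and 1 in S) * 4.0

print(shapley(v, 2))  # [4.0, 3.0] -- sums to v({0, 1}) = 7 (efficiency)
```

Because the interaction term is split between features 0 and 1, neither Shapley value equals the single-subset "actual delta", which is exactly the point of the comment above.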
@minhaoling3056 2 years ago
Does Kernel SHAP ignore feature dependence?
@diaaalmohamad2166 2 years ago
I'm also wondering about that. The paper by Lundberg assumes independent features in order to estimate the contributions; still, the reason for having all possible coalitions is to account for mutual effects! On the other hand, a paper that appeared last year ("Explaining individual predictions when features are dependent") addresses SHAP under dependence of features (shapr is their R package). They estimate the joint conditional distribution of the features given the current coalition using copulas (and other methods). Still, their implementation has quite some computational limitations.
@minhaoling3056 2 years ago
@@diaaalmohamad2166 It seems like most explainable AI methods are quite limited for image data. Do you know any methods that are implemented in R for image data?
@diaaalmohamad2166 2 years ago
Sorry, I don't know of R packages specific to image analysis. I tried the package "iml", where you can find different methods to explain feature contributions; I did not check their limitations. Worst case, you may use the Python package "shap" inside an Rmarkdown code chunk.
@rusmannlalana8702 2 years ago
"TENTARA ITU HARUS HITAM" (Indonesian: "that soldier must be black") This video:
@pinkguy3852 2 years ago
nyasar (Indonesian: "got lost")
@exmanitor 2 years ago
With regard to your last point, that the "SHAP Delta" does not match the "Actual Delta": I think you are misunderstanding what these values represent. The SHAP value of a specific feature does not represent the difference in prediction if we were to exclude/remove that feature from the model. Instead, the SHAP value of a specific feature represents the average contribution of the feature across all coalitions. This is why your "SHAP Delta" and "Actual Delta" do not match: the "Actual Delta" is just the contribution of the feature in a single coalition. Other than that, great video!
@robgeada6618 2 years ago
Thanks! Two quick points: first, the "actual" delta I showed there is the average model output when that specific feature is replaced with each value from the background while all other features are held fixed. It's what that feature's SHAP value would be if the background dataset only had variance in that one specific feature column and was otherwise identical to the explained datapoint. So yeah, it absolutely was not an accurate measurement of what a SHAP value is really doing mathematically. But second, that was deliberate: SHAP is advertised as producing explanations that are linearly independent measurements of each feature's contribution, but as our result showed, the SHAP value wasn't actually reflective of how this particular model behaved when you removed the feature. And of course, that's because of the exact reasons you pointed out: a SHAP value is a measurement of the difference between that feature's presence and absence in every possible coalition of the background, not an indication of the effects of pure removal/exclusion. So in essence, that's the exact point I was trying to make, though I definitely should have been clearer about it: SHAP values do indeed encode all kinds of subtle information about feature dependence and comparisons against the specific background dataset chosen, but were relatively inaccurate in measuring the effect of replacing a single feature with background values. For models with a lot of feature interaction, SHAP will sacrifice single-feature effect accuracy in order to accurately represent all feature interactions against the background, and whether that is a desirable attribute will depend on the specific use case and user preference.
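A rough sketch of the single-feature "actual delta" Rob describes, with a toy interacting model (the model and all names are my own, for illustration only):

```python
import numpy as np

def single_feature_delta(model, point, background, i):
    """Average change in model output when feature i alone is replaced by
    each background value, all other features held at the point's values."""
    perturbed = np.tile(point, (len(background), 1))
    perturbed[:, i] = background[:, i]
    return model(point[None, :])[0] - model(perturbed).mean()

# Toy model with a feature interaction, so this delta will generally
# differ from the feature's SHAP value (which averages over coalitions).
model = lambda X: X[:, 0] * X[:, 1] + X[:, 2]
point = np.array([1.0, 2.0, 3.0])
background = np.zeros((100, 3))

print(single_feature_delta(model, point, background, 0))  # 5.0 - 3.0 = 2.0
```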
@navalo2814 2 years ago
Shap shap shap
@nate4511 2 years ago
theme black and white....