So is it basically a variant of a pixel attack? Or an attack based on adding noise? I forget what the corresponding name was. Thank you for all your work on this channel; you are doing a great service to the community.
33:45 Inducing correlated weights might be useful for a kind of distillation, since you could measure the main characteristics of the "teacher" network and induce those correlations and weight distributions in the "student".
I didn't really get your idea at the end. If your alterations to the data are not bound to a specific class, how would you force the network to pay attention to the alterations?
If the cosine difference between unmarked and marked data in the same class is significant, it should be just as easy to tell the two apart in the black-box test by comparing cosine differences between the feature vectors of samples within the same class. Or, say, take all pairwise differences between the feature vectors of samples of the same class and do a PCA or something; we should expect one of the eigenvectors to be a signature of the marked data. For a defence to be effective, though, the effort to thwart the defence has to be greater than the benefit of using said dataset. Given that you have to train a decent model just to detect whether the data has been marked in the first place, I'd say this defence is effective? Somewhat? EDIT: Ooh, your suggestion does make it way more sneaky.
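A rough numpy sketch of that pairwise-difference-plus-PCA idea (this is my own construction, not the paper's procedure; `features` is assumed to be the matrix of feature vectors you extracted for one class with your own trained model):

```python
import numpy as np

def marked_direction_candidates(features, n_components=5):
    """features: (n, d) array of feature vectors from samples of ONE class.
    If part of the class was marked with a carrier direction u, the
    differences between marked and unmarked samples should concentrate
    along u, showing up as a dominant principal component."""
    n = features.shape[0]
    i, j = np.triu_indices(n, k=1)          # all pairs with i < j
    diffs = features[i] - features[j]       # (n*(n-1)/2, d) pairwise diffs
    diffs = diffs - diffs.mean(axis=0)      # center before PCA
    _, s, vt = np.linalg.svd(diffs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_components], explained[:n_components]

# usage (hypothetical): feats = phi(images_of_one_class)
# dirs, var = marked_direction_candidates(feats)
# an unusually dominant first component would be the suspicious sign
```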
Can you explain how to read Figures 4 and 5? And since you mention alignment throughout the paper, why don't you use the angle between the translation vector (\phi_0(x) - \phi_t(x)) and u to determine whether the marked data were used? What is the benefit of resorting to the beta distribution?
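Not the authors, but as I understand it the point of the beta distribution is calibration rather than just measuring an angle: for a uniformly random unit vector in R^d, the cosine c with any fixed direction satisfies (c + 1)/2 ~ Beta((d-1)/2, (d-1)/2), so an observed alignment can be turned into an exact p-value under the null hypothesis that u is unrelated to the features. A raw angle threshold gives no such false-positive guarantee. Minimal sketch (scipy; `d` is the feature dimension):

```python
from scipy.stats import beta

def cosine_pvalue(c, d):
    """p-value for seeing cosine similarity >= c between a fixed direction
    and a uniformly random unit vector in R^d, using the null-hypothesis
    law (C + 1) / 2 ~ Beta((d-1)/2, (d-1)/2)."""
    a = (d - 1) / 2.0
    return beta.sf((c + 1) / 2.0, a, a)

# e.g. cosine_pvalue(0.2, 256) is on the order of 1e-3:
# in high dimensions even modest cosines are very unlikely by chance
```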
I'm highly skeptical about this whole data-marking idea: whatever you do, the mark needs to be invisible to the eye, i.e. small. And if it's small, it will surely disappear after converting the image to JPEG, blurring it, or applying some other slight modification. To me it seems downright impossible to get around this problem.
I thought the same. Degrading the data would not only help defend against this kind of membership inference attack, but also make the classifier more robust to noise. I wish the authors had explored the effect of more data augmentations on attack performance, beyond just crop and resize. As for getting around this problem: the watermark needs to be robust to noise at marking time, so Eq. 7 in the paper should take that into account.
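One way to take that into account (my speculation, not something the paper necessarily does) is to optimize the mark under random augmentations, in the spirit of expectation over transformation. A hedged PyTorch sketch where `phi` (a frozen feature extractor), `u` (the carrier direction), and `x` (a CHW image tensor) are all assumptions:

```python
import torch
import torchvision.transforms as T

# random transforms the mark should survive (crop and blur as stand-ins
# for resize/JPEG-style degradation)
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.GaussianBlur(kernel_size=5),
])

def mark_image(phi, x, u, steps=100, lr=0.01, lam=1.0):
    """Add a small perturbation delta to x whose features stay aligned
    with the carrier u even after random augmentations."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_aug = augment(x + delta)                 # fresh transform each step
        feat = phi(x_aug.unsqueeze(0)).squeeze(0)
        align = torch.nn.functional.cosine_similarity(feat, u, dim=0)
        loss = -align + lam * delta.norm()         # align features, keep mark small
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()
```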
What happens if you use a bigger model on the radioactive data? Or just a different architecture? Shouldn't that break the whole thing, assuming a different architecture learns different features, e.g. an FFN vs. a CNN?
I was confused too. But I think you just need to feed the model a sample that only contains the radioactive feature for a particular class and see if it tends to classify it as such.
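If the mark really did surface as a pixel-space pattern, the probe could be as simple as the following (pure speculation; `model`, `carrier`, and the class index `k` are all hypothetical, and in the paper the carrier actually lives in feature space, so treat this only as the black-box intuition):

```python
import torch

def probe(model, carrier, k, scale=3.0):
    """Feed an input that is (mostly) just the carrier pattern for class k
    and check whether the suspect model gravitates toward that class."""
    x = scale * carrier                  # carrier-only input, amplified
    with torch.no_grad():
        pred = model(x.unsqueeze(0)).argmax(dim=1).item()
    return pred == k                     # suspicious if it predicts class k
```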
@YannicKilcher But how can you compute the cosine similarity when the feature sizes are different? The transformation M would not be d×d. In that case, do you need to train a model with the same architecture to find out?
And in that case, you cannot guarantee your training process is the same as the other trainer's, which contradicts the prior assumption that the training process is unknown.
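For what it's worth, nothing forces M to be square: you can fit a rectangular map between the two feature spaces by ordinary least squares on features of common, unmarked images, and then carry u across. A numpy sketch, where `F0` (n, d0) and `Ft` (n, dt) are assumed feature matrices from your own extractor and the suspect one on the same images:

```python
import numpy as np

def fit_alignment(F0, Ft):
    """Least-squares fit of M with Ft ≈ F0 @ M; M has shape (d0, dt),
    so the two feature dimensions need not match."""
    M, *_ = np.linalg.lstsq(F0, Ft, rcond=None)
    return M

def aligned_cosine(u, M, ft):
    """Cosine similarity between the mapped carrier u @ M and a feature
    vector ft from the suspect network."""
    v = u @ M
    return float(v @ ft / (np.linalg.norm(v) * np.linalg.norm(ft)))
```

Whether a linear map is expressive enough across very different architectures is another question, of course.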
Well, here is what I feel uneasy about: the assumption that the feature extractors are related by a simple linear transformation. I may be wrong, but there was a video on your channel showing that even different initializations of a network with the same architecture can lead to drastically different results after training, with the weights stuck in completely different regions of the weight space. And with a different architecture, the internal behaviour, the feature extraction, seems to have little in common with the setup trained by those who want to protect their data.
This seems intended to work like watermarking pictures, in that it lets you demonstrate that a network was trained on data you marked (analogous to proving a picture is yours by pointing at the watermark), but different in that, without knowledge of how the data was marked, one can't tell whether it was marked at all? Or wait, is watermarking already an established idea in the context of training data?