Radioactive data: tracing through training

Alexandre Sablayrolles

ICML, pp. 8326-8335, 2020.


Abstract:

We want to detect whether a particular image dataset has been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark. The mark is robust to strong variations such as different architectures or optimization […]

Introduction
  • The availability of large-scale public datasets has accelerated the development of machine learning.
  • CNNs solve specific tasks, but as a side effect they reproduce the biases present in their training datasets (Torralba et al., 2011).
  • Such a bias is a weak signal that a particular dataset has been used to solve a task.
  • The authors slightly change the dataset, effectively substituting similar-looking marked data for the original data
Highlights
  • The availability of large-scale public datasets has accelerated the development of machine learning
  • Our aim in this paper is to provide a proof of concept that marking data is possible with statistical guarantees; the analysis of defense mechanisms lies outside its scope
  • The results confirm that our watermark can be detected even when only q = 1% of the training data is radioactive
  • This setup is more challenging for our marks: since the network is retrained from scratch, the directions learned in the new feature space have no a priori reason to be aligned with those of the network we used
  • The method proposed in this paper, radioactive data, is a way to verify if some data was used to train a model, with statistical guarantees
  • We have shown in this paper that such radioactive contamination is effective on large-scale computer vision tasks such as classification on Imagenet with modern architectures (Resnet-18 and Resnet-50), even when only a very small fraction (1%) of the training data is radioactive
Methods
  • The authors perform training using the standard set of data augmentations from Pytorch (Paszke et al., 2017).
  • The authors train with SGD with a momentum of 0.9 and a weight decay of 10⁻⁴ for 90 epochs, using a batch size of 2048 across 8 GPUs (a training sketch follows this list).
  • On vanilla Imagenet, the authors obtain a top-1 accuracy of 69.6% and a top-5 accuracy of […]
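The bullets above give the optimization hyper-parameters but no code. The following is a minimal PyTorch sketch of that recipe, not the authors' implementation: the dataset path and the learning rate (set with the linear scaling rule of Goyal et al., 2017) are assumptions, and the 8-GPU distributed setup is reduced to a single-device loop for brevity.

```python
# Minimal sketch of the training recipe summarized above (not the authors' code).
# Assumes a torchvision Resnet-18 and an ImageNet-style image folder; the path and
# learning rate are placeholders, and the 8-GPU distributed setup is omitted.
import torch
import torchvision
from torchvision import transforms

train_tf = transforms.Compose([            # standard Pytorch data augmentations
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = torchvision.datasets.ImageFolder("path/to/imagenet/train", transform=train_tf)
loader = torch.utils.data.DataLoader(dataset, batch_size=2048, shuffle=True, num_workers=16)

model = torchvision.models.resnet18(num_classes=1000).cuda()
# lr = 0.1 * (2048 / 256) follows the linear scaling rule; the paper's exact value may differ.
optimizer = torch.optim.SGD(model.parameters(), lr=0.8, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(90):
    for images, targets in loader:
        images, targets = images.cuda(), targets.cuda()
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```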
Results
  • Table 2 shows the results of retraining a Resnet-18 from scratch on radioactive data.
  • The results confirm that the watermark can be detected even when only q = 1% of the training data is radioactive (the sketch after this list illustrates how such a detection p-value can be computed)
  • This setup is more challenging for the marks: since the network is retrained from scratch, the directions learned in the new feature space have no a priori reason to be aligned with those of the network the authors used.
  • The authors hypothesize that the multiple crops make the network believe it sees more variety, while in reality the feature representations of these crops are all aligned with the carrier, which leads the network to learn the carrier direction
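Table 2 and the other tables report radioactivity as a p-value. The snippet below is only a sketch of how such a number can be obtained, under the assumption that each class has its own secret carrier direction: under the null hypothesis of no radioactive training data, the cosine similarity between a class weight vector and its carrier behaves like the cosine between independent directions in d dimensions, whose distribution is known in closed form, and the per-class p-values are combined with Fisher's method (Fisher, 1925, cited in the references). The function names are illustrative, not the authors' API.

```python
# Sketch of a radioactivity test: per-class cosine similarity between classifier
# weights and secret carriers, turned into p-values and combined (Fisher, 1925).
# The exact test used by the authors may differ in its details.
import numpy as np
from scipy.stats import beta, combine_pvalues

def cosine_pvalue(c, d):
    """P(cos >= c) when one of the two unit vectors is uniform on the sphere in R^d."""
    a = (d - 1) / 2.0
    return beta.sf((c + 1.0) / 2.0, a, a)  # (cos + 1)/2 ~ Beta(a, a) under the null

def radioactivity_pvalue(W, carriers):
    """W: (num_classes, d) classifier weights; carriers: (num_classes, d) class carriers."""
    d = W.shape[1]
    cos = np.sum(W * carriers, axis=1) / (
        np.linalg.norm(W, axis=1) * np.linalg.norm(carriers, axis=1))
    per_class = [cosine_pvalue(c, d) for c in cos]
    _, combined = combine_pvalues(per_class, method="fisher")
    return combined
```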
Conclusion
  • The experiments validate that the radioactive marks do imprint on the trained models.
  • The authors observe two beneficial effects: data augmentation improves the strength of the mark, and transferring the mark to larger and more realistic architectures makes its detection more reliable
  • These two observations suggest that the radioactive method is appropriate for real use cases. The method proposed in this paper, radioactive data, is a way to verify whether some data was used to train a model, with statistical guarantees.
  • Although it is not the core topic of the paper, the method incidentally offers a way to watermark images in the classical sense (Cayre et al., 2005)
Tables
  • Table1: p-value (statistical significance) for the detection of radioactive data usage when only a fraction of the training data is radioactive. Results for a logistic regression classifier trained on Imagenet with Resnet-18 features, with only a percentage of the data bearing the radioactive mark. Our method can identify with very high confidence (log10(p) < −38) that the classifier was trained on radioactive data, even when only 1% of the training data is radioactive. The radioactive data has an impact on the accuracy of the classifier: around −1% (top-1)
  • Table2: p-value (statistical significance) for radioactivity detection. Results for a Resnet-18 trained from scratch on Imagenet, with only a percentage of the data bearing the radioactive mark. We are able to identify models trained from scratch on only q = 1% of radioactive data. The presence of radioactive data has negligible impact on the accuracy of a learned model as long as the fraction of radioactive data is under 10%
  • Table3: p-value (statistical significance) for radioactivity detection. Results for different architectures trained from scratch on Imagenet. Even though radioactive data was crafted using a ResNet-18, models of other architectures also become radioactive when trained on this data
  • Table4: p-value of radioactivity detection. A Resnet-18 is trained on Places205 from scratch, and a percentage of the dataset is radioactive. When 10% of the data or more is radioactive, we are able to detect radioactivity with a strong confidence (p < 10−3)
  • Table5: p-value for the detection of radioactive data usage. A Resnet-18 is trained on Imagenet from scratch, and a percentage of the training data is radioactive. This marked network is distilled into another network, on which we test radioactivity. When 2% of the data or more is radioactive, we are able to detect the use of this data with a strong confidence (p < 10−3)
Related work
  • Watermarking is a way of tracking media content by adding a mark to it. In its simplest form, a watermark is an addition in the pixel space of an image that is not visually perceptible. Zero-bit watermarking techniques (Cayre et al., 2005) modify the pixels of an image so that its Fourier transform lies in the cone generated by an arbitrary random direction, the "carrier". When the same image, or a slightly perturbed version of it, is encountered, the presence of the watermark is assessed by verifying whether the Fourier representation lies in the cone generated by the carrier. Zero-bit watermarking detects whether an image is marked or not, but in general watermarking also considers the case where the marks carry a number of bits of information (Cox et al., 2002).

    Traditional watermarking is notoriously not robust to geometrical attacks (Vukotic et al., 2018). In contrast, the latent space associated with deep networks is almost invariant to such transformations, due to the train-time data augmentations. This observation has motivated several authors to employ convnets to watermark images (Vukotic et al., 2018; Zhu et al., 2018) by inserting marks in this latent space. HiDDeN (Zhu et al., 2018) is an example of these approaches, applied either for steganographic or watermarking purposes.
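To make the cone test concrete, here is a minimal sketch of zero-bit detection in the spirit of the description above; the use of the Fourier magnitude, the cosine threshold, and the function name are illustrative assumptions rather than the exact procedure of Cayre et al. (2005).

```python
# Sketch of zero-bit watermark detection: the mark is declared present if the image's
# Fourier representation falls inside a cone around the secret carrier direction.
import numpy as np

def detect_zero_bit_mark(image, carrier, cos_threshold=0.1):
    """image: 2-D grayscale array; carrier: secret key array of the same shape."""
    x = np.abs(np.fft.fft2(image)).ravel()   # magnitude of the 2-D Fourier transform
    k = carrier.ravel()
    cos = float(np.dot(x, k)) / (np.linalg.norm(x) * np.linalg.norm(k) + 1e-12)
    return cos > cos_threshold               # inside the detection cone -> marked
```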
Reference
  • Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In SIGSAC. ACM, 2016.
  • Adi, Y., Baum, C., Cisse, M., Pinkas, B., and Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX Security Symposium, 2018.
  • Biggio, B., Nelson, B., and Laskov, P. Poisoning attacks against support vector machines. In ICML, 2012.
  • Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In IEEE Symp. Security and Privacy, 2017.
  • Carlini, N., Liu, C., Kos, J., Erlingsson, U., and Song, D. The secret sharer: Measuring unintended neural network memorization & extracting secrets. arXiv preprint arXiv:1802.08232, 2018.
  • Caron, M., Bojanowski, P., Mairal, J., and Joulin, A. Unsupervised pre-training of image features on non-curated data. In ICCV, 2019.
  • Cayre, F., Fontaine, C., and Furon, T. Watermarking security: theory and practice. IEEE Transactions on Signal Processing, 2005.
  • Chen, X., Liu, C., Li, B., Lu, K., and Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. CoRR, abs/1712.05526, 2017.
  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
  • Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
  • Fisher, R. Statistical methods for research workers. 1925.
  • Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In ICLR, 2015.
  • Goyal, P., Dollar, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • Gu, T., Liu, K., Dolan-Gavitt, B., and Garg, S. Badnets: Evaluating backdooring attacks on deep neural networks. In Machine Learning and Computer Security Workshop, 2017.
  • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, 2016.
  • He, K., Gkioxari, G., Dollar, P., and Girshick, R. Mask r-cnn. In ICCV, 2017.
  • Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • Iscen, A., Furon, T., Gripon, V., Rabbat, M., and Jegou, H. Memory vectors for similarity search in highdimensional spaces. IEEE Transactions on Big Data, 2017.
  • Jegou, H., Douze, M., and Schmid, C. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
  • Joulin, A., van der Maaten, L., Jabri, A., and Vasilache, N. Learning visual features from large weakly supervised data. In ECCV, 2016.
  • Kerckhoffs, A. La cryptographie militaire [military cryptography]. Journal des sciences militaires [Military Science Journal], 1883.
  • Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In NeurIPS, pp. 1097–1105, 2012.
  • Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., and Zitnick, C. L. Microsoft coco: Common objects in context. In ECCV, 2014.
  • Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and van der Maaten, L. Exploring the limits of weakly supervised pretraining. In ECCV, 2018.
  • Papernot, N., Song, S., Mironov, I., Raghunathan, A., Talwar, K., and Erlingsson, U. Scalable private learning with pate. In ICLR, 2018.
  • Vukotic, V., Chappelier, V., and Furon, T. Are deep neural networks good for blind image watermarking? In Workshop on Information Forensics and Security (WIFS). IEEE, 2018.
  • Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. In CSF, 2018.
  • Zhu, J., Kaplan, R., Johnson, J., and Fei-Fei, L. Hidden: Hiding data with deep networks. In ECCV, 2018.
  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in pytorch. 2017.
  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. Imagenet large scale visual recognition challenge. IJCV, 2015.
  • Sablayrolles, A., Douze, M., Ollivier, Y., Schmid, C., and Jegou, H. White-box vs black-box: Bayes optimal strategies for membership inference. In ICML, 2019.
  • Shafahi, A., Huang, W. R., Najibi, M., Suciu, O., Studer, C., Dumitras, T., and Goldstein, T. Poison frogs! targeted clean-label poisoning attacks on neural networks. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), NeurIPS, 2018.
  • Shokri, R., Stronati, M., and Shmatikov, V. Membership inference attacks against machine learning models. IEEE Symp. Security and Privacy, 2017.
  • Steinhardt, J., Koh, P. W. W., and Liang, P. S. Certified defenses for data poisoning attacks. In NeurIPS. 2017.
  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. In ICLR, 2014.
  • Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., and Li, L.-J. Yfcc100m: The new data in multimedia research. arXiv preprint arXiv:1503.01817, 2015.
  • Tishby, N., Pereira, F. C., and Bialek, W. The information bottleneck method. arXiv preprint physics/0004057, 2000.
  • Torralba, A., Efros, A. A., et al. Unbiased look at dataset bias. In CVPR, volume 1, pp. 7, 2011.
  • Tran, B., Li, J., and Madry, A. Spectral signatures in backdoor attacks. In NeurIPS. 2018.