Radioactive data: tracing through training
ICML, pp. 8326-8335, 2020.
Abstract:
We want to detect whether a particular image dataset has been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark. The mark is robust to strong variations such as different architectures or optimization methods.
Introduction
- The availability of large-scale public datasets has accelerated the development of machine learning.
- CNNs solve specific tasks but, as a side effect, reproduce the biases of the datasets they are trained on (Torralba et al., 2011).
- Such a bias is a weak signal that a particular dataset has been used to solve a task.
- The authors slightly change the dataset, effectively substituting similar-looking marked data for the original data; a minimal sketch of this marking step follows this list
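The bullet above glosses over how an individual image is marked: a random unit "carrier" direction is drawn per class in the feature space of a fixed marking network, and each image receives a small pixel perturbation that pushes its features toward that carrier. The sketch below is a minimal illustration of this idea, not the authors' exact procedure; the loss terms, coefficients, the helper name `mark_image`, and the pixel bound `eps` are assumptions.

```python
# Hedged sketch of the marking step: push the features of one image toward a
# secret carrier direction while keeping the pixel change small. Loss weights,
# step count, and the perturbation bound are illustrative, not the paper's values.
import torch
import torch.nn.functional as F

def mark_image(x, feature_extractor, carrier, steps=50, lr=1e-2,
               lambda_feat=1.0, lambda_pix=1e-2, eps=8 / 255):
    """x: (1, 3, H, W) image in [0, 1]; carrier: (d,) unit vector (the secret key)."""
    feature_extractor.eval()
    with torch.no_grad():
        feat_orig = feature_extractor(x)              # reference features, kept fixed
    delta = torch.zeros_like(x, requires_grad=True)   # pixel perturbation to optimize
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        feat = feature_extractor(x + delta)
        loss = (-(feat * carrier).sum()                        # align features with the carrier
                + lambda_feat * F.mse_loss(feat, feat_orig)    # stay close to the original features
                + lambda_pix * delta.pow(2).sum())             # keep the perturbation small
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                          # keep the mark imperceptible and pixels valid
            delta.clamp_(-eps, eps)
            delta.copy_((x + delta).clamp(0, 1) - x)
    return (x + delta).detach()
```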
Highlights
- The availability of large-scale public datasets has accelerated the development of machine learning
- Our aim in this paper is to provide a proof of concept that marking data is possible with statistical guarantees; the analysis of defense mechanisms lies outside the scope of this paper
- The results confirm that our watermark can be detected when only q = 1% of the data is used at train time
- This setup is more complicated for our marks because, since the network is retrained from scratch, the directions learned in the new feature space have no a priori reason to be aligned with the directions of the network we used
- The method proposed in this paper, radioactive data, is a way to verify if some data was used to train a model, with statistical guarantees
- We have shown in this paper that such radioactive contamination is effective on large-scale computer vision tasks such as classification on Imagenet with modern architectures (Resnet-18 and Resnet-50), even when only a very small fraction (1%) of the training data is radioactive
Methods
- The authors train with SGD with a momentum of 0.9 and a weight decay of 10⁻⁴ for 90 epochs, using a batch size of 2048 across 8 GPUs.
- They use Pytorch (Paszke et al., 2017) and adopt its standard data augmentation settings; a sketch of this training recipe follows this list.
- On vanilla Imagenet, the authors obtain a top-1 accuracy of 69.6% and a top-5 accuracy of
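For concreteness, here is a minimal single-process sketch of the training recipe described in these bullets (SGD, momentum 0.9, weight decay 10⁻⁴, 90 epochs, global batch size 2048, standard torchvision augmentations). The dataset path, the learning rate (a linear-scaling guess in the spirit of Goyal et al., 2017), and the step schedule are assumptions; the authors' 8-GPU distributed setup is omitted.

```python
# Hedged sketch of the training setup summarized above; single-process stand-in
# for the authors' 8-GPU run. Learning rate and schedule are assumptions.
import torch
import torchvision
from torchvision import transforms

train_tf = transforms.Compose([                      # standard ImageNet augmentations
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
dataset = torchvision.datasets.ImageFolder("path/to/imagenet/train", train_tf)
loader = torch.utils.data.DataLoader(dataset, batch_size=2048, shuffle=True,
                                     num_workers=16, pin_memory=True)

model = torchvision.models.resnet18(num_classes=1000).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.8,   # linear-scaling guess for batch 2048
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    for images, targets in loader:
        images, targets = images.cuda(non_blocking=True), targets.cuda(non_blocking=True)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```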
Results
- Table 2 shows the results of retraining a Resnet-18 from scratch on radioactive data.
- The results confirm that the watermark can be detected when only q = 1% of the data is used at train time; a sketch of the detection test follows this list
- This setup is more complicated for the marks because, since the network is retrained from scratch, the directions learned in the new feature space have no a priori reason to be aligned with the directions of the network the authors used.
- The authors hypothesize that the multiple crops make the network believe it sees more variety, but in reality the feature representations of all these crops are aligned with the carrier, which makes the network learn the carrier direction
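The detection itself reduces to a statistical test: the classifier's weight vector for a class is compared with that class's carrier via cosine similarity, and under the null hypothesis (the model never saw radioactive data) the weights are independent of the carrier, so the cosine behaves like that of a random direction in d dimensions and gives an exact p-value. The sketch below illustrates this white-box test as I understand it; the feature-space alignment needed when the inspected network differs from the marking network is omitted, and the function names are illustrative.

```python
# Hedged sketch of white-box radioactivity detection: cosine similarity between
# classifier weights and per-class carriers, converted to p-values under the
# null hypothesis that the weights are independent of the carriers.
import numpy as np
from scipy.special import betainc

def cosine_pvalue(c, d):
    """One-sided p-value: P(cosine of a random unit direction in R^d with a fixed one >= c)."""
    # cos^2 of a random direction against a fixed one follows a Beta(1/2, (d-1)/2) law,
    # hence the regularized incomplete beta below; split on the sign of c for the one-sided tail.
    tail = 0.5 * betainc((d - 1) / 2.0, 0.5, 1.0 - c * c)
    return tail if c >= 0 else 1.0 - tail

def per_class_pvalues(classifier_weights, carriers):
    """classifier_weights, carriers: (num_classes, d) arrays; returns one p-value per class."""
    w = classifier_weights / np.linalg.norm(classifier_weights, axis=1, keepdims=True)
    u = carriers / np.linalg.norm(carriers, axis=1, keepdims=True)
    cosines = np.sum(w * u, axis=1)
    d = classifier_weights.shape[1]
    return np.array([cosine_pvalue(c, d) for c in cosines])
```

Small p-values across many classes are evidence that the inspected model was trained on the marked data.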
Conclusion
- The experiments validate that the radioactive marks do imprint on the trained models.
- The authors observe two beneficial effects: data augmentation improves the strength of the mark, and transferring the mark to larger, more realistic architectures makes its detection more reliable
- These two observations suggest that the radioactive method is appropriate for real use cases. The method proposed in this paper, radioactive data, is a way to verify whether some data was used to train a model, with statistical guarantees.
- Although it is not the core topic of the paper, the method incidentally offers a way to watermark images in the classical sense (Cayre et al., 2005)
Tables
- Table1: p-value (statistical significance) for the detection of radioactive data usage when only a fraction of the training data is radioactive. Results for a logistic regression classifier trained on Imagenet with Resnet-18 features, with only a percentage of the data bearing the radioactive mark. Our method can identify with very high confidence (log10(p) < −38) that the classifier was trained on radioactive data, even when only 1% of the training data is radioactive. The radioactive data has an impact on the accuracy of the classifier: around −1% (top-1).
- Table2: p-value (statistical significance) for radioactivity detection. Results for a Resnet-18 trained from scratch on Imagenet, with only a percentage of the data bearing the radioactive mark. We are able to identify models trained from scratch on only q = 1% of radioactive data. The presence of radioactive data has negligible impact on the accuracy of a learned model as long as the fraction of radioactive data is under 10%
- Table3: p-value (statistical significance) for radioactivity detection. Results for different architectures trained from scratch on Imagenet. Even though radioactive data was crafted using a ResNet-18, models of other architectures also become radioactive when trained on this data
- Table4: p-value of radioactivity detection. A Resnet-18 is trained on Places205 from scratch, and a percentage of the dataset is radioactive. When 10% of the data or more is radioactive, we are able to detect radioactivity with a strong confidence (p < 10⁻³)
- Table5: p-value for the detection of radioactive data usage. A Resnet-18 is trained on Imagenet from scratch, and a percentage of the training data is radioactive. This marked network is distilled into another network, on which we test radioactivity. When 2% of the data or more is radioactive, we are able to detect the use of this data with a strong confidence (p < 10⁻³)
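The tables above report one p-value per trained model, whereas the cosine test yields a p-value per class (one carrier per class). Fisher's combined probability test (Fisher, 1925, which appears in the reference list) is a standard way to aggregate such per-class p-values; whether this is exactly the aggregation rule behind the reported figures is an assumption on my part. A minimal sketch:

```python
# Hedged sketch: aggregate per-class p-values into a single detection p-value
# with Fisher's combined probability test. Under the global null hypothesis,
# -2 * sum(log p_i) follows a chi-squared law with 2k degrees of freedom.
import numpy as np
from scipy.stats import chi2

def fisher_combine(p_values):
    p = np.asarray(p_values, dtype=float)
    statistic = -2.0 * np.sum(np.log(p))
    return chi2.sf(statistic, df=2 * len(p))

# Example: ten classes, each only mildly significant on its own.
print(fisher_combine([0.01] * 10))   # the combined evidence is far stronger than any single class
```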
Related work
- Watermarking is a way of tracking media content by adding a mark to it. In its simplest form, a watermark is an addition in the pixel space of an image that is not visually perceptible. Zero-bit watermarking techniques (Cayre et al., 2005) modify the pixels of an image so that its Fourier transform lies in the cone generated by an arbitrary random direction, the "carrier". When the same image or a slightly perturbed version of it is encountered, the presence of the watermark is assessed by verifying whether the Fourier representation lies in the cone generated by the carrier. Zero-bit watermarking only detects whether an image is marked, but watermarking in general also considers the case where the marks carry a number of bits of information (Cox et al., 2002).
Traditional watermarking is notoriously not robust to geometrical attacks (Vukotic et al., 2018). In contrast, the latent space associated with deep networks is almost invariant to such transformations, due to the train-time data augmentations. This observation has motivated several authors to employ convnets to watermark images (Vukotic et al., 2018; Zhu et al., 2018) by inserting marks in this latent space. HiDDeN (Zhu et al., 2018) is an example of these approaches, applied either for steganographic or watermarking purposes.
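To make the cone criterion above concrete, here is a minimal sketch of the detection side of zero-bit watermarking: treat the magnitude of the image's Fourier transform as a vector and test whether it lies inside the cone around a secret carrier, i.e. whether its cosine with the carrier exceeds a threshold. The embedding step, the choice of coefficients, and the threshold are all simplifications; this illustrates the principle rather than reproducing Cayre et al.'s algorithm.

```python
# Hedged illustration of zero-bit watermark detection: cone membership of the
# Fourier magnitude spectrum around a secret carrier direction.
import numpy as np

def in_carrier_cone(image, carrier, cos_threshold=0.1):
    """image: 2-D grayscale array; carrier: unit vector with image.size entries (the secret key)."""
    spectrum = np.abs(np.fft.fft2(image)).ravel()    # magnitude spectrum as a flat vector
    cosine = spectrum @ carrier / (np.linalg.norm(spectrum) * np.linalg.norm(carrier))
    return cosine > cos_threshold                    # inside the cone => mark detected

# The detector and the embedder share the same random carrier (the key).
rng = np.random.default_rng(0)
key = rng.standard_normal(64 * 64)
key /= np.linalg.norm(key)
print(in_carrier_cone(rng.random((64, 64)), key))    # unmarked image: almost surely False
```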
Reference
- Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In SIGSAC. ACM, 2016.
- Adi, Y., Baum, C., Cisse, M., Pinkas, B., and Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX Security Symposium, 2018.
- Biggio, B., Nelson, B., and Laskov, P. Poisoning attacks against support vector machines. In ICML, 2012.
- Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In IEEE Symp. Security and Privacy, 2017.
- Carlini, N., Liu, C., Kos, J., Erlingsson, U., and Song, D. The secret sharer: Measuring unintended neural network memorization & extracting secrets. arXiv preprint arXiv:1802.08232, 2018.
- Caron, M., Bojanowski, P., Mairal, J., and Joulin, A. Unsupervised pre-training of image features on non-curated data. In ICCV, 2019.
- Cayre, F., Fontaine, C., and Furon, T. Watermarking security: theory and practice. IEEE Transactions on Signal Processing, 2005.
- Chen, X., Liu, C., Li, B., Lu, K., and Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. CoRR, abs/1712.05526, 2017.
- Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
- Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
- Fisher, R. Statistical methods for research workers. 1925.
- Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In ICLR, 2015.
- Goyal, P., Dollar, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
- Gu, T., Liu, K., Dolan-Gavitt, B., and Garg, S. Badnets: Evaluating backdooring attacks on deep neural networks. In Machine Learning and Computer Security Workshop, 2017.
- He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, 2016.
- He, K., Gkioxari, G., Dollar, P., and Girshick, R. Mask r-cnn. In ICCV, 2017.
- Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- Iscen, A., Furon, T., Gripon, V., Rabbat, M., and Jegou, H. Memory vectors for similarity search in highdimensional spaces. IEEE Transactions on Big Data, 2017.
- Jegou, H., Douze, M., and Schmid, C. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
- Joulin, A., van der Maaten, L., Jabri, A., and Vasilache, N. Learning visual features from large weakly supervised data. In ECCV, 2016.
- Kerckhoffs, A. La cryptographie militaire [military cryptography]. Journal des sciences militaires [Military Science Journal], 1883.
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In NeurIPS, pp. 1097–1105, 2012.
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., and Zitnick, C. L. Microsoft coco: Common objects in context. In ECCV, 2014.
- Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and van der Maaten, L. Exploring the limits of weakly supervised pretraining. In ECCV, 2018.
- Papernot, N., Song, S., Mironov, I., Raghunathan, A., Talwar, K., and Erlingsson, U. Scalable private learning with pate. In ICLR, 2018.
- Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in pytorch. 2017.
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. Imagenet large scale visual recognition challenge. IJCV, 2015.
- Sablayrolles, A., Douze, M., Ollivier, Y., Schmid, C., and Jegou, H. White-box vs black-box: Bayes optimal strategies for membership inference. In ICML, 2019.
- Shafahi, A., Huang, W. R., Najibi, M., Suciu, O., Studer, C., Dumitras, T., and Goldstein, T. Poison frogs! targeted clean-label poisoning attacks on neural networks. In NeurIPS, 2018.
- Shokri, R., Stronati, M., and Shmatikov, V. Membership inference attacks against machine learning models. IEEE Symp. Security and Privacy, 2017.
- Steinhardt, J., Koh, P. W. W., and Liang, P. S. Certified defenses for data poisoning attacks. In NeurIPS, 2017.
- Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. In ICLR, 2014.
- Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., and Li, L.-J. Yfcc100m: The new data in multimedia research. arXiv preprint arXiv:1503.01817, 2015.
- Tishby, N., Pereira, F. C., and Bialek, W. The information bottleneck method. arXiv preprint physics/0004057, 2000.
- Torralba, A., Efros, A. A., et al. Unbiased look at dataset bias. In CVPR, volume 1, pp. 7, 2011.
- Tran, B., Li, J., and Madry, A. Spectral signatures in backdoor attacks. In NeurIPS, 2018.
- Vukotic, V., Chappelier, V., and Furon, T. Are deep neural networks good for blind image watermarking? In Workshop on Information Forensics and Security (WIFS). IEEE, 2018.
- Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. In CSF, 2018.
- Zhu, J., Kaplan, R., Johnson, J., and Fei-Fei, L. Hidden: Hiding data with deep networks. In ECCV, 2018.