Overlearning Reveals Sensitive Attributes
arXiv, 2019.
Abstract:
"Overlearning" means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races and identities.
Introduction
- The authors demonstrate that representations learned by deep models when training for seemingly simple objectives reveal privacy- and bias-sensitive attributes that are not part of the specified objective.
- These unintentionally learned concepts are neither finer- nor coarser-grained versions of the model's labels, nor statistically correlated with them.
- The local part of the model computes a representation, censors it as described below, and sends it to the cloud part, which computes the model's output.
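The split-computation setting in the last bullet can be made concrete with a short sketch. Module names and layer sizes below are illustrative, not taken from the paper: the local encoder runs on the device, applies a censoring transform to its representation, and ships only the result to the cloud head.

```python
# Minimal sketch of split ("local/cloud") inference as described above.
# Module names and sizes are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class LocalEncoder(nn.Module):
    """Runs on the device: maps raw input to an intermediate representation."""
    def __init__(self, in_dim=112, rep_dim=64):  # 112 echoes Health's binarized features
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, rep_dim))
    def forward(self, x):
        return self.net(x)

class CloudHead(nn.Module):
    """Runs in the cloud: maps the (possibly censored) representation to the output."""
    def __init__(self, rep_dim=64, num_classes=2):
        super().__init__()
        self.net = nn.Linear(rep_dim, num_classes)
    def forward(self, z):
        return self.net(z)

def censor(z):
    # Placeholder for a censoring transform (adversarially trained or
    # information-theoretic); here it is the identity for illustration.
    return z

local, cloud = LocalEncoder(), CloudHead()
x = torch.randn(8, 112)      # a batch of inputs on the device
z = censor(local(x))         # representation computed and censored locally
logits = cloud(z)            # cloud part computes the model's output
```

This corresponds to the split/collaborative inference setting studied in several of the cited works (e.g., Kang et al., 2017).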
Highlights
- We demonstrate that representations learned by deep models when training for seemingly simple objectives reveal privacy- and bias-sensitive attributes that are not part of the specified objective (a minimal probing sketch follows this list).
- To analyze where and why overlearning happens, we empirically show how general features emerge in the lower layers of models trained for simple objectives and conjecture an explanation based on the complexity of the training data
- We demonstrated that models trained for seemingly simple tasks implicitly learn concepts that are not represented in the objective function. They learn to recognize sensitive attributes, such as race and identity, that are statistically orthogonal to the objective
- The failure of censoring to suppress these attributes and the similarity of learned representations across uncorrelated tasks suggest that overlearning may be intrinsic, i.e., learning for some objectives may not be possible without recognizing generic low-level features that enable other tasks, including inference of sensitive attributes
- Regulators should focus on ensuring that models are applied in a way that respects privacy and fairness, while acknowledging that they may still recognize and use sensitive attributes
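The overlearning demonstration in the first highlight reduces to probing: train a separate model to predict the sensitive attribute from the representation and compare it against a majority-class baseline (cf. Table 2). The sketch below uses a logistic-regression probe as a stand-in for the paper's inference model; `representations` and `sensitive_labels` are assumed to be precomputed arrays (integer-coded labels).

```python
# Sketch of inferring a sensitive attribute from a model's representation (cf. Table 2).
# The probe is a stand-in for the paper's inference model, not its exact architecture.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def attribute_inference_accuracy(representations, sensitive_labels, seed=0):
    """Train a probe on (representation, sensitive attribute) pairs and report
    held-out accuracy alongside the majority-class (RAND) baseline."""
    z_tr, z_te, s_tr, s_te = train_test_split(
        representations, sensitive_labels, test_size=0.3, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(z_tr, s_tr)
    majority = np.bincount(s_tr).argmax()
    rand_acc = float(np.mean(s_te == majority))   # RAND column in Table 2
    probe_acc = probe.score(z_te, s_te)           # BASE/ADV/IT, depending on the input
    return probe_acc, rand_acc
```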
Results
- 4.1 DATASETS, TASKS, AND MODELS
- Health is the Heritage Health dataset (Heritage Health Prize) with medical records of over 55,000 patients, binarized into 112 features with age information removed.
- UTKFace is a set of over 23,000 face images labeled with age, gender, and race (UTKFace; Zhang et al., 2017). The task is to predict gender; the sensitive attribute is race.
- For FaceScrub (FaceScrub, 2014), the task is also to predict gender; the sensitive attribute is identity.
- When representations are censored with adversarial training (sketched after this list), accuracy drops for both the main and inference tasks.
- Information-theoretic censoring reduces the accuracy of inference, but damages main-task accuracy more than adversarial training does for almost all models.
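A minimal sketch of adversarial censoring as referenced above. A gradient-reversal layer is one common way to implement the min-max objective; the paper's exact adversarial-training procedure may differ in detail. The `gamma` weight here loosely corresponds to the censoring strength γ in Table 5.

```python
# Sketch of adversarial censoring: the encoder is trained so the main head
# succeeds while a discriminator predicting the sensitive attribute fails.
# Gradient reversal is one common implementation, not necessarily the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, gamma):
        ctx.gamma = gamma
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing back into the encoder.
        return -ctx.gamma * grad_output, None

def censoring_loss(encoder, main_head, adv_head, x, y, s, gamma=0.5):
    """One training step's loss: main-task loss plus a reversed-gradient
    adversarial loss for predicting the sensitive attribute s."""
    z = encoder(x)
    main_loss = F.cross_entropy(main_head(z), y)
    adv_logits = adv_head(GradReverse.apply(z, gamma))
    adv_loss = F.cross_entropy(adv_logits, s)
    return main_loss + adv_loss
```

In information-theoretic censoring, the adversarial term is instead replaced by a variational bound on mutual information (in the spirit of the cited Alemi et al., 2017 and Moyer et al., 2018), which the comparison above found to hurt main-task accuracy more.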
Conclusion
- The authors demonstrated that models trained for seemingly simple tasks implicitly learn concepts that are not represented in the objective function
- They learn to recognize sensitive attributes, such as race and identity, that are statistically orthogonal to the objective.
- There may not exist a set of features that enables a model to accurately determine the gender of a face but not its race or identity
- This is a challenge for regulations such as GDPR that aim to control the purposes and uses of machine learning technologies.
Tables
- Table1: Summary of datasets and tasks. Cramér's V captures the statistical correlation between y and s (0 indicates no correlation and 1 perfect correlation); a computation sketch follows this list
- Table2: Accuracy of inference from representations (last FC layer). RAND is random guessing based on majority class labels; BASE is inference from the uncensored representation; ADV from the representation censored with adversarial training; IT from the information-theoretically censored representation
- Table3: Improving inference accuracy with de-censoring. δ is the increase from Table 2
- Table4: Adversarial re-purposing. The values are differences between the accuracy of predicting sensitive attributes using a re-purposed model vs. a model trained from scratch
- Table5: The effect of censoring on adversarial re-purposing for FaceScrub with γ = 0.5, 0.75, 1.0. δA is the difference in the original-task accuracy (second column) between uncensored and censored models; δB is the difference in the accuracy of inferring the sensitive attribute (columns 3 to 7) between the models re-purposed from different layers and the model trained from scratch. Negative values mean reduced accuracy. Heatmaps on the right are linear CKA similarities between censored and uncensored representations. Numbers 0 through 4 represent layers conv1, conv2, conv3, fc4, and fc5. For each model censored at layer i (x-axis), we measure similarity between the censored and uncensored models at layer j (y-axis). When censoring is applied to a specific layer, similarity for that layer is the smallest (values on the diagonal). When censoring lower layers with moderate strength (γ = 0.5 or 0.75), similarity between higher layers is still strong; when censoring higher layers, similarity between lower layers is strong. Therefore, censoring can block adversarial re-purposing from a specific layer, but the adversary can still re-purpose representations in the other layer(s) to obtain an accurate model for predicting sensitive attributes
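Two quantities referenced in the table captions can be computed as follows: Cramér's V (Table 1) and linear CKA (the Table 5 heatmaps), the latter following the cited Kornblith et al. (2019). This is an illustrative numpy sketch, not the paper's code; inputs are assumed to be integer label vectors and representation matrices of shape (examples, features).

```python
# Cramér's V between task label y and sensitive attribute s, and linear CKA
# between two representation matrices. Illustrative implementations only.
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(y, s):
    """Cramér's V between two discrete label vectors (0 = no correlation, 1 = perfect)."""
    classes_y, classes_s = np.unique(y), np.unique(s)
    table = np.zeros((len(classes_y), len(classes_s)))
    for i, cy in enumerate(classes_y):
        for j, cs in enumerate(classes_s):
            table[i, j] = np.sum((y == cy) & (s == cs))
    chi2 = chi2_contingency(table)[0]
    n, k = len(y), min(len(classes_y), len(classes_s)) - 1
    return np.sqrt(chi2 / (n * k))

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (examples x features)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den
```

For the Table 5 heatmaps, `linear_cka` would be applied to the activations of a censored and an uncensored model at each pair of layers.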
Related work
- Prior work studied transferability of representations only between closely related tasks. Transferability of features between ImageNet models decreases as the distance between the base and target tasks grows (Yosinski et al, 2014), and performance of tasks is correlated to their distance from the source task (Azizpour et al, 2015). CNN models trained to distinguish coarse classes also distinguish their subsets (Huh et al, 2016). By contrast, we show that models trained for simple tasks implicitly learn privacy-sensitive concepts unrelated to the labels of the original task. Other than an anecdotal mention in the acknowledgments paragraph of (Kim et al, 2017) that logit-layer activations leak non-label concepts, this phenomenon has never been described in the research literature.
- Gradient updates revealed by participants in distributed learning leak information about individual training batches that is uncorrelated with the learning objective (Melis et al, 2019). We show that overlearning is a generic problem in (fully trained) models, helping explain these observations.
Funding
- This research was supported in part by NSF grants 1611770, 1704296, and 1916717, the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program, and a Google Faculty Research Award
Reference
- Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. In ICLR, 2017.
- Hossein Azizpour, Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. From generic to specific deep representations for visual recognition. In CVPR Workshops, 2015.
- Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. PAMI, 2013.
- Tian Qi Chen, Xuechen Li, Roger B Grosse, and David K Duvenaud. Isolating sources of disentanglement in variational autoencoders. In NIPS, 2018.
- Jianfeng Chi, Emmanuel Owusu, Xuwang Yin, Tong Yu, William Chan, Patrick Tague, and Yuan Tian. Privacy partitioning: Protecting user data during the deep learning inference phase. arXiv:1812.02863, 2018.
- Maximin Coavoux, Shashi Narayan, and Shay B. Cohen. Privacy-preserving neural representations of text. In EMNLP, 2018.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
- Alexey Dosovitskiy and Thomas Brox. Generating images with perceptual similarity metrics based on deep networks. In NIPS, 2016.
- Harrison Edwards and Amos J. Storkey. Censoring representations with an adversary. In ICLR, 2016.
- Yanai Elazar and Yoav Goldberg. Adversarial removal of demographic attributes from text data. In EMNLP, 2018.
- EU. General Data Protection Regulation. https://en.wikipedia.org/wiki/General_Data_Protection_Regulation, 2018.
- FaceScrub. http://vintage.winklerbros.net/facescrub.html, 2014.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
- Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. JMLR, 18(129): 1–31, 2017.
- Heritage Health Prize. https://www.kaggle.com/c/hhp, 2012.
- Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.
- Minyoung Huh, Pulkit Agrawal, and Alexei A Efros. What makes ImageNet good for transfer learning? arXiv:1608.08614, 2016.
- Yusuke Iwasawa, Kotaro Nakayama, Ikuko Yairi, and Yutaka Matsuo. Privacy issues regarding the application of DNNs to activity-recognition using wearables and its countermeasures by use of adversarial training. In IJCAI, 2016.
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. In ASPLOS, 2017.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). arXiv:1711.11279, 2017.
- Hyunjik Kim and Andriy Mnih. Disentangling by factorising. In ICML, 2018.
- Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014.
- Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv:1312.6114, 2013.
- Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In ICML, 2019.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. In ICLR, 2018.
- Nicholas D Lane and Petko Georgiev. Can deep learning revolutionize mobile sensing? In HotMobile, 2015.
- Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.
- Meng Li, Liangzhen Lai, Naveen Suda, Vikas Chandra, and David Z Pan. PrivyNet: A flexible framework for privacy-preserving deep neural network training. arXiv:1709.06161, 2017.
- Yitong Li, Timothy Baldwin, and Trevor Cohn. Towards robust and privacy-preserving text representations. In ACL, 2018.
- Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. In NIPS, 2018.
- Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Scholkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In ICML, 2019.
- Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. In ICLR, 2016.
- David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In ICML, 2018.
- Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In S&P, 2019.
- Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, and Greg Ver Steeg. Invariant representations without adversarial training. In NIPS, 2018.
- Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In NIPS, 2016.
- Seyed Ali Osia, Ali Taheri, Ali Shahin Shamsabadi, Minos Katevas, Hamed Haddadi, and Hamid R. R. Rabiee. Deep private-feature extraction. TKDE, 2018.
- Piper project page. https://people.eecs.berkeley.edu/~nzhang/piper.html, 2015.
- Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, and Benno Stein. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In CEUR Workshop, 2016.
- Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In AISTATS, 2019.
- UTKFace. http://aicip.eecs.utk.edu/wiki/UTKFace, 2017.
- Ji Wang, Jianguo Zhang, Weidong Bao, Xiaomin Zhu, Bokai Cao, and Philip S Yu. Not just privacy: Improving performance of private deep learning in mobile cloud. In KDD, 2018.
- Qizhe Xie, Zihang Dai, Yulun Du, Eduard H. Hovy, and Graham Neubig. Controllable invariance through adversarial feature learning. In NIPS, 2017.
- Yelp Open Dataset. https://www.yelp.com/dataset, 2018.
- Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In NIPS, 2014.
- Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In ICML, 2013.
- Ning Zhang, Manohar Paluri, Yaniv Taigman, Rob Fergus, and Lubomir Bourdev. Beyond frontal faces: Improving person recognition using multiple cues. In CVPR, 2015.
- Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In CVPR, 2017.