Overlearning Reveals Sensitive Attributes

arXiv: Learning, 2019.

We demonstrate that representations learned by deep models when training for seemingly simple objectives reveal privacy- and bias-sensitive attributes that are not part of the specified objective

Abstract:

"Overlearning" means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races and identities.
Introduction
  • The authors demonstrate that representations learned by deep models when training for seemingly simple objectives reveal privacy- and bias-sensitive attributes that are not part of the specified objective.
  • These unintentionally learned concepts are neither finer- nor coarser-grained versions of the model's labels, nor are they statistically correlated with them.
  • The local part of the model computes a representation, censors it as described below, and sends it to the cloud part, which computes the model’s output
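A minimal sketch of this local/cloud split, assuming a hypothetical PyTorch image classifier (the layer sizes, the split point, and the `censor` placeholder are illustrative, not the paper's architecture): the client runs the lower layers, optionally censors the intermediate representation, and only that representation leaves the device.

```python
import torch
import torch.nn as nn

# Hypothetical split of a small image classifier into a local ("client")
# part and a cloud ("server") part; sizes are illustrative only.
local_part = nn.Sequential(                      # runs on the user's device
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
    nn.Flatten(), nn.Linear(32 * 4 * 4, 128),
)
cloud_part = nn.Sequential(                      # runs on the service provider's side
    nn.ReLU(), nn.Linear(128, 2),                # e.g., binary gender prediction
)

def censor(z: torch.Tensor) -> torch.Tensor:
    """Placeholder for a censoring transform (adversarial or
    information-theoretic); identity here."""
    return z

x = torch.randn(1, 3, 64, 64)                    # one input image
z = censor(local_part(x))                        # only this representation is sent out
logits = cloud_part(z)                           # the cloud never sees the raw input x
```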
Highlights
  • We demonstrate that representations learned by deep models when training for seemingly simple objectives reveal privacy- and bias-sensitive attributes that are not part of the specified objective
  • To analyze where and why overlearning happens, we empirically show how general features emerge in the lower layers of models trained for simple objectives and conjecture an explanation based on the complexity of the training data (a layer-wise probing sketch follows this list)
  • We demonstrated that models trained for seemingly simple tasks implicitly learn concepts that are not represented in the objective function. They learn to recognize sensitive attributes, such as race and identity, that are statistically orthogonal to the objective
  • The failure of censoring to suppress these attributes and the similarity of learned representations across uncorrelated tasks suggest that overlearning may be intrinsic, i.e., learning for some objectives may not be possible without recognizing generic low-level features that enable other tasks, including inference of sensitive attributes
  • Regulators should focus on ensuring that models are applied in a way that respects privacy and fairness, while acknowledging that they may still recognize and use sensitive attributes
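As referenced above, one concrete way to check where overlearning emerges is to freeze a trained model and fit a simple probe on each layer's activations to predict the sensitive attribute; the depth at which the probe beats a majority-class baseline indicates where the attribute becomes decodable. A minimal sketch in that spirit (not the paper's exact protocol), assuming `model_layers` is the frozen model split into successive layers and `X`, `s` are held-out inputs and sensitive labels:

```python
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

@torch.no_grad()
def layerwise_probe_accuracy(model_layers, X, s):
    """Fit a linear probe on each layer's activations to predict the
    sensitive attribute s; returns held-out accuracy per layer."""
    accs, h = [], X
    for layer in model_layers:
        h = layer(h)                                      # activations at this depth
        feats = h.flatten(start_dim=1).cpu().numpy()
        f_tr, f_te, s_tr, s_te = train_test_split(
            feats, s, test_size=0.3, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(f_tr, s_tr)
        accs.append(probe.score(f_te, s_te))              # compare against majority-class guessing
    return accs
```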
Results
  • 4.1 DATASETS, TASKS, AND MODELS

    Health is the Heritage Health dataset (Heritage Health Prize) with medical records of over 55,000 patients, binarized into 112 features with age information removed.
  • UTKFace is a set of over 23,000 face images labeled with age, gender, and race (UTKFace; Zhang et al., 2017).
  • For UTKFace, the task is to predict gender; the sensitive attribute is race.
  • For FaceScrub, the task is to predict gender; the sensitive attribute is identity.
  • When representations are censored with adversarial training, accuracy drops for both the main and inference tasks (see the adversarial-censoring sketch after this list).
  • Information-theoretic censoring reduces the accuracy of inference, but damages main-task accuracy more than adversarial training does for almost all models
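The censoring referred to in the last two results can be sketched as adversarial training in the spirit of the censoring literature cited below (e.g., Edwards & Storkey, 2016; Xie et al., 2017): an encoder is trained for the main task while an adversary tries to recover the sensitive attribute from the representation, and the encoder is penalized for the adversary's success. Everything here is illustrative rather than the paper's exact setup; the input width of 112 echoes the Health features, and `gamma` echoes the censoring strength γ of Table 5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical components; architectures and hyperparameters are illustrative.
encoder   = nn.Sequential(nn.Linear(112, 64), nn.ReLU(), nn.Linear(64, 32))
main_head = nn.Linear(32, 2)     # main-task classifier
adversary = nn.Linear(32, 2)     # tries to predict the sensitive attribute
gamma = 0.75                     # censoring strength (cf. gamma in Table 5)

opt_model = torch.optim.Adam(
    list(encoder.parameters()) + list(main_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def train_step(x, y, s):
    # 1) Update the adversary to infer s from the (detached) representation.
    opt_adv.zero_grad()
    z = encoder(x).detach()
    adv_loss = F.cross_entropy(adversary(z), s)
    adv_loss.backward()
    opt_adv.step()

    # 2) Update encoder + main head: do well on y while making the adversary fail on s.
    opt_model.zero_grad()
    z = encoder(x)
    main_loss = F.cross_entropy(main_head(z), y)
    leak_loss = F.cross_entropy(adversary(z), s)
    (main_loss - gamma * leak_loss).backward()
    opt_model.step()
    return main_loss.item(), adv_loss.item()
```

The information-theoretic alternative mentioned above instead constrains what the representation reveals via a variational bound (e.g., in the style of Moyer et al., 2018) rather than training an explicit adversary.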
Conclusion
  • The authors demonstrated that models trained for seemingly simple tasks implicitly learn concepts that are not represented in the objective function
  • They learn to recognize sensitive attributes, such as race and identity, that are statistically orthogonal to the objective.
  • There may not exist a set of features that enables a model to accurately determine the gender of a face but not its race or identity
  • This is a challenge for regulations such as GDPR that aim to control the purposes and uses of machine learning technologies.
Tables
  • Table 1: Summary of datasets and tasks. Cramér's V captures the statistical correlation between the main-task label y and the sensitive attribute s (0 indicates no correlation; 1 indicates perfect correlation); a sketch of computing it appears after these table notes
  • Table 2: Accuracy of inference from representations (last FC layer). RAND is random guessing based on majority-class labels; BASE is inference from the uncensored representation; ADV is inference from the representation censored with adversarial training; IT is inference from the information-theoretically censored representation
  • Table 3: Improving inference accuracy with de-censoring. δ is the increase from Table 2
  • Table 4: Adversarial re-purposing. The values are differences between the accuracy of predicting sensitive attributes using a re-purposed model vs. a model trained from scratch
  • Table 5: The effect of censoring on adversarial re-purposing for FaceScrub with γ = 0.5, 0.75, 1.0. δA is the difference in the original-task accuracy (second column) between uncensored and censored models; δB is the difference in the accuracy of inferring the sensitive attribute (columns 3 to 7) between the models re-purposed from different layers and the model trained from scratch. Negative values mean reduced accuracy. Heatmaps on the right are linear CKA similarities between censored and uncensored representations. Numbers 0 through 4 represent layers conv1, conv2, conv3, fc4, and fc5. For each model censored at layer i (x-axis), we measure similarity between the censored and uncensored models at layer j (y-axis). When censoring is applied to a specific layer, similarity for that layer is the smallest (values on the diagonal). When censoring lower layers with moderate strength (γ = 0.5 or 0.75), similarity between higher layers is still strong; when censoring higher layers, similarity between lower layers is strong. Therefore, censoring can block adversarial re-purposing from a specific layer, but the adversary can still re-purpose representations in the other layer(s) to obtain an accurate model for predicting sensitive attributes
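As a concrete reading of the Table 1 caption, Cramér's V between the main-task label y and the sensitive attribute s can be computed from their contingency table and a chi-squared statistic. A minimal sketch with SciPy (the helper name is ours, not from the paper):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(y, s) -> float:
    """Cramér's V between two categorical variables:
    0 = no association, 1 = perfect association (cf. Table 1)."""
    y_codes = np.unique(y, return_inverse=True)[1]
    s_codes = np.unique(s, return_inverse=True)[1]
    table = np.zeros((y_codes.max() + 1, s_codes.max() + 1))
    np.add.at(table, (y_codes, s_codes), 1)        # contingency counts
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1                       # min(#rows, #cols) - 1
    return float(np.sqrt(chi2 / (n * k)))
```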
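The heatmaps described in the Table 5 caption use linear CKA (Kornblith et al., 2019) to compare censored and uncensored representations. A minimal sketch of the linear variant, where X and Y are activation matrices for the same n examples taken from the two models at some layer (the standard formula, not code from the paper):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between activation matrices X (n x d1) and Y (n x d2)
    computed on the same n examples (Kornblith et al., 2019)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    similarity = np.linalg.norm(Y.T @ X, "fro") ** 2
    normalizer = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(similarity / normalizer)
```

Evaluating this for layer j of a model censored at layer i against layer j of the uncensored model gives one cell of the similarity maps; as the caption notes, similarity drops most on the diagonal (i = j).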
Related work
  • Prior work studied transferability of representations only between closely related tasks. Transferability of features between ImageNet models decreases as the distance between the base and target tasks grows (Yosinski et al., 2014), and performance on target tasks is correlated with their distance from the source task (Azizpour et al., 2015). CNN models trained to distinguish coarse classes also distinguish their subsets (Huh et al., 2016). By contrast, we show that models trained for simple tasks implicitly learn privacy-sensitive concepts unrelated to the labels of the original task. Other than an anecdotal mention in the acknowledgments paragraph of Kim et al. (2017) that logit-layer activations leak non-label concepts, this phenomenon has never been described in the research literature.

    Gradient updates revealed by participants in distributed learning leak information about individual training batches that is uncorrelated with the learning objective (Melis et al., 2019). We show that overlearning is a generic problem in (fully trained) models, which helps explain these observations.
Funding
  • This research was supported in part by NSF grants 1611770, 1704296, and 1916717, the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program, and a Google Faculty Research Award
Reference
  • Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. In ICLR, 2017.
  • Hossein Azizpour, Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. From generic to specific deep representations for visual recognition. In CVPR Workshops, 2015.
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. PAMI, 2013.
  • Tian Qi Chen, Xuechen Li, Roger B. Grosse, and David K. Duvenaud. Isolating sources of disentanglement in variational autoencoders. In NIPS, 2018.
  • Jianfeng Chi, Emmanuel Owusu, Xuwang Yin, Tong Yu, William Chan, Patrick Tague, and Yuan Tian. Privacy partitioning: Protecting user data during the deep learning inference phase. arXiv:1812.02863, 2018.
  • Maximin Coavoux, Shashi Narayan, and Shay B. Cohen. Privacy-preserving neural representations of text. In EMNLP, 2018.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Alexey Dosovitskiy and Thomas Brox. Generating images with perceptual similarity metrics based on deep networks. In NIPS, 2016.
  • Harrison Edwards and Amos J. Storkey. Censoring representations with an adversary. In ICLR, 2016.
  • Yanai Elazar and Yoav Goldberg. Adversarial removal of demographic attributes from text data. In EMNLP, 2018.
  • EU. General Data Protection Regulation. https://en.wikipedia.org/wiki/General_Data_Protection_Regulation, 2018.
  • FaceScrub. http://vintage.winklerbros.net/facescrub.html, 2014.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. JMLR, 18(129):1–31, 2017.
  • Heritage Health Prize. https://www.kaggle.com/c/hhp, 2012.
  • Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.
  • Minyoung Huh, Pulkit Agrawal, and Alexei A. Efros. What makes ImageNet good for transfer learning? arXiv:1608.08614, 2016.
  • Yusuke Iwasawa, Kotaro Nakayama, Ikuko Yairi, and Yutaka Matsuo. Privacy issues regarding the application of DNNs to activity-recognition using wearables and its countermeasures by use of adversarial training. In IJCAI, 2016.
  • Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. In ASPLOS, 2017.
  • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). arXiv:1711.11279, 2017.
  • Hyunjik Kim and Andriy Mnih. Disentangling by factorising. In ICML, 2018.
  • Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014.
  • Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv:1312.6114, 2013.
  • Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In ICML, 2019.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. In ICLR, 2018.
  • Nicholas D. Lane and Petko Georgiev. Can deep learning revolutionize mobile sensing? In HotMobile, 2015.
  • Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.
  • Meng Li, Liangzhen Lai, Naveen Suda, Vikas Chandra, and David Z. Pan. PrivyNet: A flexible framework for privacy-preserving deep neural network training. arXiv:1709.06161, 2017.
  • Yitong Li, Timothy Baldwin, and Trevor Cohn. Towards robust and privacy-preserving text representations. In ACL, 2018.
  • Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. In NIPS, 2018.
  • Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Scholkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In ICML, 2019.
  • Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. In ICLR, 2016.
  • David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In ICML, 2018.
  • Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In S&P, 2019.
  • Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, and Greg Ver Steeg. Invariant representations without adversarial training. In NIPS, 2018.
  • Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In NIPS, 2016.
  • Seyed Ali Osia, Ali Taheri, Ali Shahin Shamsabadi, Minos Katevas, Hamed Haddadi, and Hamid R. Rabiee. Deep private-feature extraction. TKDE, 2018.
  • Piper project page. https://people.eecs.berkeley.edu/~nzhang/piper.html, 2015.
  • Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, and Benno Stein. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In CEUR Workshop, 2016.
  • Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In AISTATS, 2019.
  • UTKFace. http://aicip.eecs.utk.edu/wiki/UTKFace, 2017.
  • Ji Wang, Jianguo Zhang, Weidong Bao, Xiaomin Zhu, Bokai Cao, and Philip S. Yu. Not just privacy: Improving performance of private deep learning in mobile cloud. In KDD, 2018.
  • Qizhe Xie, Zihang Dai, Yulun Du, Eduard H. Hovy, and Graham Neubig. Controllable invariance through adversarial feature learning. In NIPS, 2017.
  • Yelp Open Dataset. https://www.yelp.com/dataset, 2018.
  • Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In NIPS, 2014.
  • Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In ICML, 2013.
  • Ning Zhang, Manohar Paluri, Yaniv Taigman, Rob Fergus, and Lubomir Bourdev. Beyond frontal faces: Improving person recognition using multiple cues. In CVPR, 2015.
  • Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In CVPR, 2017.