Auditing Data Provenance in Text-Generation Models

KDD 2019, pp. 196-206.

DOI: https://doi.org/10.1145/3292500.3330885

Abstract:

To help enforce data-protection regulations such as GDPR and detect unauthorized uses of personal data, we develop a new model auditing technique that helps users check if their data was used to train a machine learning model. We focus on auditing deep-learning models that generate natural-language text, including word prediction and dialog generation.

Introduction
  • Data-protection policies and regulations such as the European Union’s General Data Protection Regulation (GDPR) [9] give users the right to know how their data is processed.
  • The authors consider scenarios where the model’s output is restricted to a relatively small list of words or even a single word.
  • This precludes the application of most previously proposed membership inference methods.
  • A deep learning model is a function fθ : X → Y parameterized by θ, where X is the input space and Y is the output space.
  • The output y can be a class label, a single token, or a sequence of tokens (a minimal illustration follows this list).
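For concreteness, here is a minimal toy illustration of such a function fθ whose output is restricted to a short ranked word list (not the authors' code; the vocabulary, random parameters, and scoring rule are illustrative assumptions):

    # Toy sketch: a stand-in "model" f_theta mapping an input token sequence to a
    # small ranked list of candidate next words. Parameters theta are random stand-ins.
    import numpy as np

    VOCAB = ["the", "cat", "sat", "on", "mat"]              # hypothetical vocabulary
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(len(VOCAB), len(VOCAB)))       # stand-in for learned weights

    def f_theta(prefix_ids, top_k=3):
        """Return the top_k words the model ranks highest for the next position."""
        logits = theta[prefix_ids[-1]]                      # toy scoring from the last token
        ranked = np.argsort(-logits)                        # highest score first
        return [VOCAB[i] for i in ranked[:top_k]]           # output: a small word list

    print(f_theta([VOCAB.index("cat")]))                    # prints 3 candidate next words

This mirrors the setting above: the auditor observes only a short ranked word list y = f(x), not full probability distributions over the vocabulary.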
Highlights
  • Data-protection policies and regulations such as the European Union’s General Data Protection Regulation (GDPR) [9] give users the right to know how their data is processed
  • As machine learning (ML) becomes a core component of data processing in many offline and online services, and incidents such as DeepMind’s unauthorized use of NHS patients’ data to train ML models [3] illustrate the resulting privacy risks, it is essential to be able to audit the provenance of personal data used for model training
  • We quantitatively show that sequences that include relatively rare words are more effective for auditing than word sequences randomly selected from the user’s data
  • We show that text-generation models memorize even words and sentences that are directly related to their primary task and leverage this into an effective auditing method
  • Deep learning-based, text-generation models for word prediction, translation, and dialog generation are core components of many popular online services. We demonstrated that these models memorize their training data
  • We developed a black-box auditing method that enables users to check if their chats, messages, or comments have been used to train someone else’s model; a minimal sketch of the rank-based idea follows this list.
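A minimal sketch of that black-box auditing idea, under stated assumptions: the model is queried only through a hypothetical interface model(context, top_k) returning a small ranked word list, shadow models trained with and without the user's data provide labeled examples, and scikit-learn's LinearSVC (mentioned in the paper's footnotes) stands in for the audit classifier:

    # Hedged sketch, not the authors' exact pipeline.
    import numpy as np
    from sklearn.svm import LinearSVC

    def rank_features(model, user_sequences, top_k=10):
        """Histogram of the ranks the model assigns to the user's true next words."""
        ranks = []
        for context, next_word in user_sequences:
            candidates = model(context, top_k)                   # black-box query
            ranks.append(candidates.index(next_word)
                         if next_word in candidates else top_k)  # top_k means "not ranked"
        hist, _ = np.histogram(ranks, bins=np.arange(top_k + 2))
        return hist / max(len(ranks), 1)                         # fixed-length feature vector

    def train_auditor(shadow_models, shadow_labels, user_sequences):
        """shadow_labels[i] is 1 if shadow_models[i] was trained WITH the user's data."""
        X = [rank_features(m, user_sequences) for m in shadow_models]
        return LinearSVC(max_iter=10000).fit(X, shadow_labels)

    # Audit decision for a target model:
    # auditor.predict([rank_features(target_model, user_sequences)])

The intuition matches the highlights above: a model ranks the true next words of sequences it was trained on, especially those containing rare words, unusually well, so rank-based features separate members from non-members.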
Methods
  • 4.1 Datasets

    The Reddit comments dataset (Reddit) is a randomly chosen month (November 2017) from the public Reddit dataset. The authors filtered it to retain only users with at least 150 and at most 500 posts, for a total of 83,293 users with 247 posts each on average (a minimal sketch of this filtering step follows this list).
  • The authors use the data from the en-fr language pair for the machine translation task.
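A minimal sketch of the per-user filtering step described for the Reddit dataset (the JSON field names "author" and "body" are assumptions about the public dump's per-line schema, not taken from the paper):

    # Hedged sketch of the filtering: keep users with 150-500 comments.
    import json
    from collections import defaultdict

    def filter_users(comment_lines, min_posts=150, max_posts=500):
        """Group comments by user and keep only users within the post-count range."""
        by_user = defaultdict(list)
        for line in comment_lines:                  # one JSON-encoded comment per line
            comment = json.loads(line)
            by_user[comment["author"]].append(comment["body"])
        return {user: posts for user, posts in by_user.items()
                if min_posts <= len(posts) <= max_posts}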
Conclusion
  • Deep learning-based, text-generation models for word prediction, translation, and dialog generation are core components of many popular online services.
  • The authors demonstrated that these models memorize their training data.
  • More powerful auditing algorithms may be possible if the auditor has access to the model’s parameters and can observe its internal representations rather than just output predictions.
Tables
  • Table 1: Performance of target models. Acc is word-prediction accuracy; perp is perplexity (the standard definition is recalled after this list)
  • Table 2: Effect of training shadow models with different hyper-parameters than the target model
  • Table 3: Effect of the model’s output size. |f(x)| is the number of words ranked by f
  • Table 4: Examples of texts obfuscated using the Google translation API and the Yandex translation API
  • Table 5: Audit performance on obfuscated Reddit comments
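For reference, the perplexity reported in Table 1 is the standard exponentiated average negative log-likelihood of the evaluation tokens (this definition is general background, not quoted from the excerpt):

    \mathrm{perp} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(w_i \mid w_{<i}) \right)

Lower perplexity means the model assigns higher probability to the observed text.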
Related work
  • Membership inference. Membership inference attacks involve observing the output of some computations over a hidden dataset D and determining whether a specific data point is a member of D. Membership inference attacks against aggregate statistics have been demonstrated in the context of genomic studies [13], location time-series [26], and noisy statistics in general [8].

    Shokri et al. [28] develop black-box membership inference techniques against ML models, which perform best when the target model is overfitted to the training data. Truex et al. [32] extend and generalize this work to white-box and federated-learning settings. Rahman et al. [27] use membership inference to evaluate the tradeoff between test accuracy and membership privacy in differentially private ML models. Hayes et al. [11] study membership inference against generative models. Long et al. [19] show that well-generalized models can leak membership information, but the adversary must first identify a handful of vulnerable records in the training dataset. Yeom et al. [35] formalize membership inference and theoretically show that overfitting is sufficient but not necessary; the simplest such test is sketched below.
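For concreteness, the simplest membership test in this line of work is a loss threshold in the spirit of Yeom et al. [35]; a minimal sketch (the loss values and threshold choice below are illustrative assumptions):

    # Hedged sketch: guess "member" when the model's loss on an example is below a
    # threshold, exploiting the fact that overfitted models fit training points tightly.
    import numpy as np

    def membership_test(example_loss, threshold):
        """Return True if the example is guessed to be a training-set member."""
        return example_loss < threshold

    # Example: set the threshold to the average loss on known non-member (held-out) data.
    holdout_losses = np.array([2.1, 1.8, 2.4])
    print(membership_test(0.3, holdout_losses.mean()))   # True: likely a training member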
Funding
  • Supported in part by NSF grants 1611770 and 1704296 and by the generosity of Eric and Wendy Schmidt, by recommendation of the Schmidt Futures program.
References
  • [1] M. Abadi et al. TensorFlow: A system for large-scale machine learning. In OSDI, 2016.
  • [2] P. Adler et al. Auditing black-box models for indirect influence. KAIS, 54(1):95–122, 2018.
  • [3] BBC. Google DeepMind NHS app test broke UK privacy law. https://www.bbc.com/news/technology-40483202, 2017.
  • [4] M. Brennan, S. Afroz, and R. Greenstadt. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. TISSEC, 15(3):12, 2012.
  • [5] N. Carlini et al. The Secret Sharer: Measuring unintended neural network memorization & extracting secrets. arXiv:1802.08232, 2018.
  • [6] K. Cho et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP, 2014.
  • [7] C. Danescu-Niculescu-Mizil and L. Lee. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Workshop on Cognitive Modeling and Computational Linguistics, ACL, 2011.
  • [8] C. Dwork et al. Robust traceability from trace amounts. In FOCS, 2015.
  • [9] EU. General Data Protection Regulation. https://en.wikipedia.org/wiki/General_Data_Protection_Regulation, 2018.
  • [10] R.-E. Fan et al. LIBLINEAR: A library for large linear classification. JMLR, 9(Aug):1871–1874, 2008.
  • [11] J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro. LOGAN: Membership inference attacks against generative models. In PETS, 2019.
  • [12] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • [13] N. Homer et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4(8):e1000167, 2008.
  • [14] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
  • [15] P. Koehn. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, 2005.
  • [16] P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In ICML, 2017.
  • [17] S. Kottur, X. Wang, and V. R. Carvalho. Exploring personalized neural conversational models. In IJCAI, 2017.
  • [18] J. Li et al. A persona-based neural conversation model. In ACL, 2016.
  • [19] Y. Long et al. Understanding membership inferences on well-generalized learning models. arXiv:1802.04889, 2018.
  • [20] R. Lowe, N. Pow, I. V. Serban, and J. Pineau. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In SIGDIAL, 2015.
  • [21] T. Luong, M. Kayser, and C. D. Manning. Deep neural language models for machine translation. In CoNLL, 2015.
  • [22] B. McMahan et al. Communication-efficient learning of deep networks from decentralized data. In AISTATS, 2017.
  • [23] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang. Learning differentially private language models without losing accuracy. arXiv:1710.06963, 2017.
  • [24] P. Michel and G. Neubig. Extreme adaptation for personalized neural machine translation. arXiv:1805.01817, 2018.
  • [25] A. S. Morcos, D. G. Barrett, N. C. Rabinowitz, and M. Botvinick. On the importance of single directions for generalization. arXiv:1803.06959, 2018.
  • [26] A. Pyrgelis, C. Troncoso, and E. De Cristofaro. Knock knock, who’s there? Membership inference on aggregate location data. In NDSS, 2018.
  • [27] M. A. Rahman et al. Membership inference attack against differentially private deep learning model. Transactions on Data Privacy, 11(1):61–79, 2018.
  • [28] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In S&P, 2017.
  • [29] C. Song, T. Ristenpart, and V. Shmatikov. Machine learning models that remember too much. In CCS, 2017.
  • [30] S. Tan, R. Caruana, G. Hooker, and Y. Lou. Detecting bias in black-box models using transparent model distillation. arXiv:1710.06169, 2017.
  • [31] F. Tramèr et al. FairTest: Discovering unwarranted associations in data-driven applications. In EuroS&P, 2017.
  • [32] S. Truex et al. Towards demystifying membership inference attacks. arXiv:1807.09173, 2018.
  • [33] O. Vinyals and Q. Le. A neural conversational model. arXiv:1506.05869, 2015.
  • [34] Y. Wu et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144, 2016.
  • [35] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In CSF, 2018.
  • [36] C. Zhang et al. Understanding deep learning requires rethinking generalization. In ICLR, 2017.
  • [37] S. Zhang et al. Personalizing dialogue agents: I have a dog, do you have pets too? In ACL, 2018.
  • Paper footnote 6: https://keras.io/
  • Paper footnote 7: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html