Limits of Detecting Text Generated by Large-Scale Language Models

Lav R. Varshney
Nitish Shirish Keskar
Richard Socher

Information Theory and Applications Workshop (ITA), pp. 1-5, 2020.

DOI: https://doi.org/10.1109/ITA50056.2020.9245012

Abstract:

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular…

Introduction
  • Building on a long history of language generation models grounded in statistical knowledge of human language [1]–[6], large-scale, neural network-based language models (LMs) that write paragraph-length text with the coherence of human writing have emerged [7]–[9].
  • The authors characterize the error exponent for a particular language model in terms of standard performance metrics such as cross-entropy and perplexity; a minimal sketch of the underlying likelihood-ratio detector follows this list.
  • The authors consider not just a specific language model with given performance metrics, but also a universal setting that takes a generic view of language models as empirical maximum-likelihood k-order Markov approximations of stationary, ergodic random processes.
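The formulation above is a binary hypothesis test between the genuine-text process P and the language model distribution Q, for which the optimal Neyman-Pearson procedure thresholds the normalized log-likelihood ratio. The following is a minimal sketch of such a detector, not taken from the paper itself: logp_genuine and logp_model are hypothetical callables returning the conditional log-probability of the next token given its prefix.

```python
def normalized_llr(tokens, logp_genuine, logp_model):
    """Compute (1/n) * log[P(x_1^n) / Q(x_1^n)], the normalized
    log-likelihood ratio between the genuine-text process P and the
    language model Q, accumulated via the chain rule."""
    total = 0.0
    for i, tok in enumerate(tokens):
        total += logp_genuine(tokens[:i], tok) - logp_model(tokens[:i], tok)
    return total / len(tokens)


def detect(tokens, logp_genuine, logp_model, threshold=0.0):
    """Neyman-Pearson-style decision: a large normalized LLR favors the
    genuine hypothesis. Under P, the normalized LLR converges almost
    surely to the divergence rate, which is the achievable error exponent."""
    genuine = normalized_llr(tokens, logp_genuine, logp_model) >= threshold
    return "genuine" if genuine else "generated"
```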
Highlights
  • For the case of specific language models such as GPT-2 or CTRL, we provide a precise operational interpretation of perplexity and cross-entropy.
  • For any future large-scale language model, we conjecture a precise upper bound on the error exponent.
  • As we had considered previously in the context of deepfake images [18], it is of interest to understand how detection error probability parameterizes the dynamics of information-spreading processes in social networks, e.g., in determining epidemic thresholds.
Results
  • The error exponent is given by the asymptotic Kullback-Leibler divergence rate, defined as the almost-sure limit of $\frac{1}{n} \log \frac{P(X_1^n)}{Q(X_1^n)}$ as $n \to \infty$.
  • Suppose a specific language model such as GPT-2 [7], GROVER [9], or CTRL [8] is given, characterized in terms of estimates of either its cross-entropy H(P, Q) or its perplexity PPL(P, Q); the first sketch after this list makes the resulting error exponent calculation concrete.
  • Manning and Schütze argue that, even though not strictly correct, language text can be modeled as a stationary, ergodic random process [30], an assumption the authors follow.
  • Given the diversity of language production, the authors assume this stationary ergodic random process with finite alphabet $A$, denoted $X = \{X_i,\ -\infty < i < \infty\}$, is non-null in the sense that $P(x_{-m}^{-1}) > 0$ for all $m$ and $p_{\min} \triangleq \inf_{m \ge 1} \min_{a \in A,\ x_{-m}^{-1} \in A^m} P(a \mid x_{-m}^{-1}) > 0$.
  • The hypothesis test considered here is between a non-null, stationary, ergodic process with summable continuity rate and its empirical k-order Markov approximation based on training data; the second sketch after this list gives a minimal version of that approximation.
  • The authors bound the error exponent by first drawing on a bound, due to Csiszár and Talata [31], on the Ornstein $\bar{d}$-distance between a stationary, ergodic process and its Markov approximation.
  • Theorem 1 ([31]): Let X be a non-null stationary ergodic process with summable continuity rate.
  • If this generalized reverse Pinsker inequality holds, it implies a further bound on the Kullback-Leibler divergence, and hence on the error exponent of the detection problem, for the empirical maximum likelihood Markov language model.
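For the specific-model setting, the operational interpretation rests on the identity D(P‖Q) = H(P, Q) − H(P): the model's cross-entropy (equivalently, its log-perplexity) minus the entropy rate of the language is the achievable error exponent. The first sketch below turns this into a two-line calculation; it assumes the base-2 convention PPL(P, Q) = 2^{H(P,Q)}, and the numbers in the usage lines are illustrative, not from the paper.

```python
import math

def error_exponent(perplexity, entropy_rate_bits):
    """D(P||Q) = H(P,Q) - H(P) in bits per token: the model's
    cross-entropy, recovered as log2 of its perplexity (base-2
    convention assumed), minus the assumed entropy rate of genuine text."""
    cross_entropy = math.log2(perplexity)
    return cross_entropy - entropy_rate_bits

# Illustrative numbers only: perplexity 20 against an assumed
# 4 bits/token entropy rate gives about 0.32 bits/token.
print(error_exponent(20.0, 4.0))
```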
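For the universal setting, the second sketch shows the empirical maximum-likelihood k-order Markov approximation in its plainest form: conditional next-token frequencies estimated from training text, with no smoothing. The paper analyzes this estimator abstractly; the implementation details here are assumptions for illustration.

```python
from collections import Counter, defaultdict

def fit_markov(tokens, k):
    """Empirical maximum-likelihood k-order Markov approximation:
    P_hat(a | context) = count(context followed by a) / count(context)."""
    counts = defaultdict(Counter)
    for i in range(k, len(tokens)):
        counts[tuple(tokens[i - k:i])][tokens[i]] += 1
    return {
        ctx: {tok: n / sum(nexts.values()) for tok, n in nexts.items()}
        for ctx, nexts in counts.items()
    }

# Toy usage: a 2nd-order character-level model.
model = fit_markov(list("the cat sat on the mat"), k=2)
print(model[("t", "h")])  # {'e': 1.0}
```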
Conclusion
  • Conjecture 2: Let X be a non-null stationary ergodic process with summable continuity rate defined on the finite alphabet A.
  • The authors obtain a precise asymptotic characterization of the error exponent in deciding between genuine text and text generated by the empirical maximum likelihood language model, expressed in terms of basic parameters of the language and of the training data set.
  • Motivated by the problem of detecting machine-generated misinformation text that may have deleterious societal consequences, the authors have developed a formal hypothesis testing framework and established limits on the error exponents.
References
  • C. E. Shannon, “The redundancy of English,” in Transactions of the Seventh Conference on Cybernetics, Mar. 1950, pp. 123–158.
  • ——, “Prediction and entropy of printed English,” Bell System Technical Journal, vol. 30, no. 1, pp. 50–64, Jan. 1951.
  • A. Chapanis, “The reconstruction of abbreviated printed messages,” Journal of Experimental Psychology, vol. 48, no. 6, pp. 496–510, Dec. 1954.
  • D. Jamison and K. Jamison, “A note on the entropy of partially-known languages,” Information and Control, vol. 12, no. 2, pp. 164–167, Feb. 1968.
  • N. S. Tzannes, R. V. Spencer, and A. J. Kaplan, “On estimating the entropy of random fields,” Information and Control, vol. 16, no. 1, pp. 1–6, Mar. 1970.
  • T. M. Cover and R. C. King, “A convergent gambling estimate of the entropy of English,” IEEE Transactions on Information Theory, vol. IT-24, no. 4, pp. 413–421, Jul. 1978.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” 2019.
  • N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong, and R. Socher, “CTRL: A conditional transformer language model for controllable generation,” Sep. 2019, arXiv:1909.05858 [cs.CL].
  • R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi, “Defending against neural fake news,” May 2019, arXiv:1905.12616 [cs.CL].
  • I. Solaiman, M. Brundage, J. Clark, A. Askell, A. Herbert-Voss, J. Wu, A. Radford, G. Krueger, J. W. Kim, S. Kreps, M. McCain, A. Newhouse, J. Blazakis, K. McGuffie, and J. Wang, “Release strategies and the social impacts of language models,” Nov. 2019, arXiv:1908.09203v2 [cs.CL].
  • L. R. Varshney, N. S. Keskar, and R. Socher, “Pretrained AI models: Performativity, mobility, and change,” Sep. 2019, arXiv:1909.03290 [cs.CY].
  • J. Bullock and M. Luengo-Oroz, “Automated speech generation from UN General Assembly statements: Mapping risks in AI generated texts,” Jun. 2019, arXiv:1906.01946 [cs.CL].
  • A. Mitchell, J. Gottfried, G. Stocking, M. Walker, and S. Fedeli, Many Americans Say Made-Up News Is a Critical Problem That Needs To Be Fixed. Pew Research Center, Jun. 2019.
  • S. Gehrmann, H. Strobelt, and A. Rush, “GLTR: Statistical detection and visualization of generated text,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Jul. 2019, pp. 111–116.
  • A. Bakhtin, S. Gross, M. Ott, Y. Deng, M. Ranzato, and A. Szlam, “Real or fake? Learning to discriminate machine from human generated text,” Jul. 2019, arXiv:1906.03351 [cs.LG].
  • T. Schuster, R. Schuster, D. J. Shah, and R. Barzilay, “Are we safe yet? The limitations of distributional features for fake news detection,” Aug. 2019, arXiv:1908.09805 [cs.CL].
  • D. Ippolito, D. Duckworth, C. Callison-Burch, and D. Eck, “Human and automatic detection of generated text,” Nov. 2019, arXiv:1911.00650 [cs.CL].
  • S. Agarwal and L. R. Varshney, “Limits of deepfake detection: A robust estimation viewpoint,” in Proceedings of the Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes Workshop at ICML 2019, Jun. 2019.
  • U. M. Maurer, “Authentication theory and hypothesis testing,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1350–1356, Jul. 2000.
  • J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun. 2019, pp. 4171–4186.
  • A. Wang and K. Cho, “BERT has a mouth, and it must speak: BERT as a Markov random field language model,” Feb. 2019, arXiv:1902.04094 [cs.CL].
  • T. B. Hashimoto, H. Zhang, and P. Liang, “Unifying human and statistical evaluation for natural language generation,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun. 2019, pp. 1689–1701.
  • A. Holtzman, J. Buys, M. Forbes, and Y. Choi, “The curious case of neural text degeneration,” Apr. 2019, arXiv:1904.09751 [cs.CL].
  • S. Merity, C. Xiong, J. Bradbury, and R. Socher, “Pointer sentinel mixture models,” Sep. 2016, arXiv:1609.07843 [cs.CL].
  • C. Coupé, Y. M. Oh, D. Dediu, and F. Pellegrino, “Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche,” Science Advances, vol. 5, no. 9, p. eaaw2594, Sep. 2019.
  • E. N. Gilbert, “Codes based on inaccurate source probabilities,” IEEE Transactions on Information Theory, vol. IT-17, no. 3, pp. 304–314, May 1971.
  • T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, 1991.
  • Y. Sung, L. Tong, and H. V. Poor, “Neyman–Pearson detection of Gauss–Markov signals in noise: Closed-form error exponent and properties,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1354–1365, Apr. 2006.
  • H. Luschgy, A. L. Rukhin, and I. Vajda, “Adaptive tests for stochastic processes in the ergodic case,” Stochastic Processes and their Applications, vol. 45, no. 1, pp. 45–59, Mar. 1993.
  • C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA, USA: MIT Press, 1999.
  • I. Csiszár and Z. Talata, “On rate of convergence of statistical estimation of stationary ergodic processes,” IEEE Transactions on Information Theory, vol. 56, no. 8, pp. 3637–3641, Aug. 2010.
  • C. Chelba, M. Norouzi, and S. Bengio, “N-gram language modeling using recurrent neural network estimation,” Mar. 2017, arXiv:1703.10724 [cs.CL].
  • I. Sason and S. Verdú, “f-divergence inequalities,” IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 5973–6006, Nov. 2016.
  • O. Binette, “A note on reverse Pinsker inequalities,” IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4094–4096, Jul. 2019.
  • K. Marton, “Bounding d̄-distance by informational divergence: A method to prove measure concentration,” Annals of Probability, vol. 24, no. 2, pp. 857–866, Apr. 1996.
  • ——, “Measure concentration for a class of random processes,” Probability Theory and Related Fields, vol. 110, pp. 427–439, Mar. 1998.
  • D. Harwell, “Top AI researchers race to detect ‘deepfake’ videos: ‘We are outgunned’,” The Washington Post, Jun. 2019.