Investigating African American Vernacular English in Transformer Based Text Generation

EMNLP 2020


Abstract

The growth of social media has encouraged the written use of African American Vernacular English (AAVE), which has traditionally been used only in oral contexts. However, NLP models have historically been developed using dominant English varieties, such as Standard American English (SAE), due to text corpora availability. We investigate t…

Introduction
  • African American Vernacular English (AAVE) is a sociolinguistic variety of American English distinct from Standard American English (SAE) with unique syntactic, semantic, and lexical patterns (Green, 2002; Jones, 2015).
  • While AAVE has historically been used in spoken contexts, the growing use of social media has encouraged written AAVE, for which NLP models are increasingly being used.
  • GPT-2 displays bias towards particular social groups (Solaiman et al, 2019).
  • Coupled with concerns that NLG tools can be used for generating fake news (Gehrmann et al, 2019) or impersonating internet users (Zellers et al, 2019), it is important that current work investigates the contexts in which NLG models display bias against certain demographics
Highlights
  • African American Vernacular English (AAVE) is a sociolinguistic variety of American English distinct from Standard American English (SAE) with unique syntactic, semantic, and lexical patterns (Green, 2002; Jones, 2015)
  • Coupled with concerns that Natural Language Generation (NLG) tools can be used for generating fake news (Gehrmann et al, 2019) or impersonating internet users (Zellers et al, 2019), it is important that current work investigates the contexts in which NLG models display bias against certain demographics
  • We provide a new evaluation of NLG models by comparing GPT-2’s behavior on SAE and AAVE
  • We present a new dataset consisting of intent-parallel AAVE/SAE tweet pairs, which can be used in future works studying SAE and AAVE
  • Our sentiment analysis experiments indicate that GPT-2 produces more negative instances when prompted with AAVE text
  • We hope our findings can pave the way for further inclusion of diverse language in future NLG models
Results
  • Annotators were filtered by HIT approval rate and location. For each partition of the data (both SAE and AAVE with and without generation by GPT-2), the sample variance is under 0.02%.
  • VADER and Textblob for both AAVE and SAE had p < 0.01.
  • DistilBERT for AAVE had p = 0.012 and DistilBERT for SAE had p = 0.11.
  • For the VADER-Textblob average in Table 1, AAVE generated segments are 50.38% less neutral than their original second segments, and SAE generated segments are 46.8% less neutral.
  • Looking at Table 3, the proportion of annotators who select SAE as machine-generated decreases to 37.3%, whereas the proportion for AAVE increases to 42.1%.
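The p-values above come from Wilcoxon rank-sum tests over the sentiment-score distributions. As a minimal illustration (not the authors' actual pipeline), the test statistic and its two-sided normal-approximation p-value can be computed in pure Python; `rank_sum_test` is a hypothetical helper, and `scipy.stats.ranksums` implements the same test:

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    # Pool the samples, remembering which came from x.
    combined = sorted((v, i < len(x)) for i, v in enumerate(list(x) + list(y)))
    # Assign 1-based ranks, averaging over ties.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    # Rank sum for the x sample, then standardize.
    w = sum(r for r, (_, from_x) in zip(ranks, combined) if from_x)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Clearly separated samples give a small p-value; identical samples give p = 1.
z, p = rank_sum_test([1, 2, 3], [4, 5, 6])
```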
Conclusion
  • The authors highlight the need for AAVE-inclusivity in NLG models, especially those perceived as state-of-the-art.
  • To this end, the authors provide a new evaluation of NLG models by comparing GPT-2’s behavior on SAE and AAVE.
  • The authors' BLEU, ROUGE, and human evaluation results reveal a disparity in the quality of GPT-2’s text generation between AAVE and SAE.
  • The authors hope the findings can pave the way for further inclusion of diverse language in future NLG models
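The BLEU results the conclusion refers to measure n-gram overlap between a generated continuation and a reference continuation. A simplified sentence-level BLEU sketch (uniform n-gram weights, brevity penalty, no smoothing; toolkit implementations such as NLTK's differ in details, and this is not the authors' exact evaluation code) might look like:

```python
import math
from collections import Counter

def sentence_bleu(reference, candidate, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions up to max_n, scaled by a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing: any empty precision zeroes the score
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# An exact match scores 1.0; a short prefix is penalized by the brevity penalty.
score = sentence_bleu("the cat sat on the mat", "the cat sat")
```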
Tables
  • Table1: Sentiment scores and averages for the SAE and AAVE samples in our dataset, using pretrained DistilBERT, VADER, and TextBlob sentiment classifiers
  • Table2: Wilcoxon rank-sum test p-values for each of our BLEU (B) and ROUGE (R) results. P-values that are significant with α = 0.05 are in bold
  • Table3: Human evaluation results, where “MG” refers to “Machine Generated.” Tests are conducted pairwise between generated SAE and AAVE phrases
Funding
  • This material is based upon work supported in part by the National Science Foundation under Grant 1821415
References
  • Su Lin Blodgett, Lisa Green, and Brendan O'Connor. 2016. Demographic dialectal variation in social media: A case study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1119–1130, Austin, Texas. Association for Computational Linguistics.
  • Rachel Dorn. 2019. Dialect-specific models for automatic speech recognition of African American Vernacular English. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 16–20, Varna, Bulgaria. INCOMA Ltd.
  • Sebastian Gehrmann, Hendrik Strobelt, and Alexander Rush. 2019. GLTR: Statistical detection and visualization of generated text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 111–116, Florence, Italy. Association for Computational Linguistics.
  • Lisa Green. 2002. African American English: A Linguistic Introduction. Cambridge University Press, Cambridge.
  • C.J. Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM).
  • Taylor Jones. 2015. Toward a description of African American Vernacular English dialect regions using "Black Twitter". American Speech, 90:403–440.
  • Taylor Jones, Jessica Rose Kalbfeld, Ryan Hancock, and Ryan Clark. 2019. Testifying while Black: An experimental study of court reporter accuracy in transcription of African American English. Language, 95(2):e216–e252.
  • Anna Jørgensen, Dirk Hovy, and Anders Søgaard. 2016. Learning a POS tagger for AAVE-like language. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1115–1120, San Diego, California. Association for Computational Linguistics.
  • Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In ICLR.
  • Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.
  • Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.
  • Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678, Florence, Italy. Association for Computational Linguistics.
  • J. H. Shen, L. Fratamico, I. Rahwan, and A. M. Rush. 2018. Darling or babygirl? Investigating stylistic bias in sentiment analysis. In Proceedings of the 5th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML).
  • Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. 2019. The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3407–3412, Hong Kong, China. Association for Computational Linguistics.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.
  • Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, and Jasmine Wang. 2019. Release strategies and the social impacts of language models.
  • Ian Stewart. 2014. Now we stronger than ever: African-American English syntax in Twitter. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 31–37, Gothenburg, Sweden. Association for Computational Linguistics.
  • Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 9054–9065. Curran Associates, Inc.
Author
Sophie Groenwold
Lily Ou
Aesha Parekh
Samhita Honnavalli
Sharon Levy
Diba Mirza