Social Chemistry 101: Learning to Reason about Social and Moral Norms

Jena D. Hwang
Vered Shwartz

EMNLP 2020, pp. 653–670.

DOI: https://doi.org/10.18653/V1/2020.EMNLP-MAIN.48

Abstract:

Social norms—the unspoken commonsense rules about acceptable social behavior—are crucial in understanding the underlying causes and intents of people’s actions in narratives. For example, underlying an action such as “wanting to call cops on my neighbor” are social norms that inform our conduct, such as “It is expected that you report crimes.” […]

Introduction
  • Understanding and reasoning about social situations relies on unspoken commonsense rules about social norms, i.e., acceptable social behavior (Haidt, 2012).
  • When faced with situations like “wanting to call the cops on the author's neighbors” (Figure 1), the authors perform a rich variety of reasoning about legality, cultural pressure, and social judgment.
  • GOOD having an open and honest dialogue with your neighbors.
  • EXPECTED calling the cops when you see a crime.
  • BAD calling the authorities if your neighbor is being rude.
  • DISCRETIONARY not being friends with your neighbors.
  • PRESSURE AGAINST calling the cops on your neighbors.
  • Wanting to call the cops on the author's neighbors
Highlights
  • Understanding and reasoning about social situations relies on unspoken commonsense rules about social norms, i.e., acceptable social behavior (Haidt, 2012)
  • We introduce SOCIAL CHEMISTRY as a new formalism to study people’s social and moral norms over everyday real life situations
  • We investigate how state-of-the-art neural language models can learn and generalize out of SOCIAL-CHEM-101 to accurately reason about social norms with respect to a previously unseen situation
  • We find that our empirical results align with the Moral Foundation Theory of Graham et al (2009); Haidt (2012) on how the moral norms of different communities vary depending on their political leanings and news reliability
  • Comprehensive modeling of social norms presents a promising challenge for NLP work in the future
Results
  • 5.1 Tasks

    While the authors train each model on all (RoT or action) objectives at once, they pick two particular objectives to assess the models.
  • The second setting is p(y|s, b_y) — “conditional.” We provide models with a set of attributes b_y that they must follow when generating an RoT y.
  • This presents a more challenging setup, because models cannot condition on the set of attributes that they find most likely.
  • No model is able to achieve a high score on all columns in the bottom half of the table
  • This indicates that fully constrained conditional generation may still present a significant challenge for current models
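As an illustration of the conditional setting p(y|s, b_y), here is a minimal sketch of how a situation and an attribute constraint set might be serialized into one input sequence for a conditional language model. The attribute names and control-token format below are assumptions for illustration, not the paper's exact scheme.

```python
# Hypothetical serialization for conditional RoT generation: the situation s
# is followed by bracketed control tokens encoding the constraints b_y, then
# a marker after which the model generates the RoT y.
ATTRIBUTES = ("judgment", "agency", "pressure")  # illustrative attribute names

def build_conditional_input(situation, attrs):
    """Serialize p(y | s, b_y) inputs as a flat token sequence."""
    parts = [situation]
    for name in ATTRIBUTES:  # fixed order keeps the encoding deterministic
        if name in attrs:
            parts.append(f"<{name}={attrs[name]}>")
    parts.append("[rot]")  # generation begins after this marker
    return " ".join(parts)
```

A decoder-only model would then be trained to continue such sequences with the RoT text, so at test time the generated RoT is conditioned on the supplied constraints rather than on attributes the model itself finds most likely.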
Conclusion
  • The authors present SOCIAL-CHEM-101, an attempt at providing a formalism and resource around the study of grounded social, moral, and ethical norms.
  • The authors' experiments demonstrate preliminary success in generative modeling of structured RoTs, and corroborate findings of moral leaning in an extrinsic task.
  • Comprehensive modeling of social norms presents a promising challenge for NLP work in the future
Tables
  • Table1: Generative model objectives corresponding to the training setups we consider. Each model (RoT or action) is trained on all objectives simultaneously
  • Table2: Human evaluation results for conditionally generating RoTs and actions, either letting the models choose the attributes (top half), or providing the attributes as input constraints (bottom half). All columns are micro-F1 scores (0–1), except Relevance (1–3). Takeaway: While state-of-the-art models are able to generate relevant RoTs and actions that generally follow constraints (moderately high scores in some columns), correctly conditioning on a complete set of attributes remains challenging (several columns show poor model performance in bottom half)
  • Table3: Test set performance by automatic metrics, including an attribute classifier. Perplexities are not comparable between encoder-decoder models (Bart and T5, loss on xout only) and other models (loss on full sequence x). Takeaway: Automatic metrics corroborate human evaluation results: while T5 is most adept at BLEU, GPT-2 XL more consistently adheres to attributes (Attr. μF1)
  • Table4: Correlations between generated RoT attributes for headlines and the news source’s political leaning (left: neg., right: pos.) and reliability (controlled for political leaning). Results shown are significant after Holm-correction for multiple comparisons (p < 0.001: ∗∗∗, p < 0.01: ∗∗, p < 0.05: ∗, p > 0.05: n.s.). Takeaway: We see evidence that a model trained on the SOCIAL-CHEM-101 Dataset can naturally uncover moral and topical leanings in news sources, mirroring results found in previous news studies
  • Table5: Correlations between worker demographics and categorical RoT annotations, Bonferroni corrected for multiple comparisons (p < 0.0001: ∗∗∗, p < 0.001: ∗∗, p < 0.01: ∗)
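Tables 4 and 5 report significance only after Holm (respectively Bonferroni) correction for multiple comparisons. The Holm step-down procedure they rely on can be sketched as follows (a standard textbook implementation, not code from the paper):

```python
def holm_correct(pvals, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    pvals: list of raw p-values, one per hypothesis.
    Returns a list of booleans: whether each hypothesis is rejected
    while controlling the family-wise error rate at alpha.
    """
    m = len(pvals)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # k-th smallest p-value is compared against alpha / (m - k).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return reject
```

Compared with plain Bonferroni (which uses alpha/m for every test), Holm is uniformly more powerful while giving the same family-wise error guarantee.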
Related work
  • Our formalism heavily draws from works in descriptive ethics and social psychology, but is also inspired by studies in social implicatures and cooperative principles in pragmatics (Kallia, 2004; Grice, 1975) and the theories of situationally-rooted evocation of frames (Fillmore and Baker, 2001).

    Our work adds to the growing literature concerned with distilling reactions to situations (Vu et al., 2014; Ding and Riloff, 2016) as well as social and moral dynamics in language (Van Hee et al., 2015). Graham et al. (2009) introduce the Moral Foundations lexicon, a dictionary of morality-evoking words (later extended by Rezapour et al., 2019) that is commonly used for coarse-grained analyses of morality in text (Fulgoni et al., 2016; Volkova et al., 2017; Weber et al., 2018).

    Footnote 10: We use the MediaBias/FactCheck ratings: https://mediabiasfactcheck.com.

    A recent line of work focused on representing social implications of everyday situations in freeform text in a knowledge graph (Rashkin et al, 2018; Sap et al, 2019). Relatedly, Sap et al (2020) introduce Social Bias Frames, a hybrid free-text and categorical formalism to reason about biased implications in language. In contrast, our work formalizes a new type of reasoning around expectations of social norms evoked by situations.
Funding
  • This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1256082, and in part by NSF (IIS-1714566), DARPA CwC through ARO (W911NF15-1-0543), the DARPA MCS program through NIWC Pacific (N66001-19-2-4031), and the Allen Institute for AI.
Study subjects and analysis
theoretically-motivated dimensions per RoT: 12
Each RoT (for example, “It’s rude to run the blender at 5am.”) is further broken down along 12 theoretically-motivated dimensions of people’s judgments, such as social judgments of good and bad and theoretical categories of moral foundations. Context matters: for example, if the neighbors are African American, it might be worse to call the cops due to racial profiling (Eberhardt, 2020).

workers per annotation: 3
For example, in a situation like “My brother chased after the Uber driver,” workers mark the underlined spans. We collect three workers’ spans, calling each span a character. All characters identified become candidates for grounding RoTs and actions in the structured annotation.
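Pooling the three workers' marked spans into a deduplicated set of candidate characters could look like the following sketch (the function name, input shape, and normalization rule are assumptions; the paper does not specify this step in code):

```python
# Illustrative sketch: merge character spans marked by several workers into
# a unique, order-preserving list of candidates for grounding RoTs/actions.
def collect_characters(worker_spans):
    """worker_spans: one list of span strings per worker.
    Returns unique candidate characters in first-seen order."""
    seen, candidates = set(), []
    for spans in worker_spans:
        for span in spans:
            key = span.strip().lower()  # case-insensitive deduplication
            if key not in seen:
                seen.add(key)
                candidates.append(span.strip())
    return candidates
```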

Reference
  • Anurag Acharya, Kartik Talamadupula, and Mark A. Finlayson. 2020. An atlas of cultural commonsense for machine reasoning.
  • George J. Bowdery. 1941. Conventions and norms. Philosophy of Science, 8(4):493–505.
  • Haibo Ding and Ellen Riloff. 2016. Acquiring knowledge of affective events from blogs using label propagation. In AAAI.
  • Jennifer L. Eberhardt. 2020. Biased: Uncovering the hidden prejudice that shapes what we see, think, and do. Penguin Books.
  • Jon Elster. 2006. Fairness and norms. Social Research, pages 365–376.
  • Charles J. Fillmore and Collin F. Baker. 2001. Frame semantics for text understanding. In Proceedings of the WordNet and Other Lexical Resources Workshop, NAACL, volume 6.
  • James A. Kitts and Yen-Sheng Chiang. 2008. Encyclopedia of Social Problems.
  • Lawrence Kohlberg. 1976. Moral stages and moralization. Moral Development and Behavior, pages 31–53.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv, abs/1907.11692.
  • Nicholas Lourie, Ronan Le Bras, and Yejin Choi. 2020. Scruples: A corpus of community ethical judgments on 32,000 real-life anecdotes. arXiv e-prints.
  • Dean Fulgoni, Jordan Carpenter, Lyle Ungar, and Daniel Preotiuc-Pietro. 2016. An empirical exploration of moral foundations theory in partisan news sources. In LREC, pages 3730–3736.
  • Jesse Graham, Jonathan Haidt, and Brian A. Nosek. 2009. Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5):1029–1046.
  • Herbert P. Grice. 1975. Logic and conversation. In Speech Acts, pages 41–58. Brill.
  • Jonathan Haidt. 2012. The righteous mind: Why good people are divided by politics and religion. Vintage.
  • Jonathan Haidt, Silvia Helena Koller, and Maria G. Dias. 1993. Affect, culture, and morality, or is it wrong to eat your dog? Journal of Personality and Social Psychology, 65(4):613.
  • Richard Mervyn Hare. 1981. Moral thinking: Its levels, method, and point. Oxford: Clarendon Press; New York: Oxford University Press.
  • Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. 2020. Aligning AI with shared human values.
  • Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration. In International Conference on Learning Representations.
  • Bertram F. Malle, Steve Guglielmo, and Andrew E. Monroe. 2014. A theory of blame. Psychological Inquiry, 25(2):147–186.
  • George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.
  • Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. In NAACL-HLT, pages 839–849, San Diego, California. Association for Computational Linguistics.
  • J. Nørregaard, B. D. Horne, and S. Adalı. 2019. NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles. In AAAI.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.
  • Goncalo Pereira, Rui Prada, and Pedro A. Santos. 2016. Integrating social power into the decision-making of cognitive agents. Artificial Intelligence, 241:1–44.
  • Jerome Kagan. 1984. The nature of the child. Basic Books.
  • Alexandra Kallia. 2004. Linguistic politeness: The implicature approach. Multilingua, 23(1/2):145–170.
  • H. Wesley Perkins and Alan D. Berkowitz. 1986. Perceiving the community norms of alcohol use among students: Some research implications for campus alcohol education programming. International Journal of the Addictions, 21(9-10):961–976.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, and Yejin Choi. 2018. Event2Mind: Commonsense inference on events, intents, and reactions. In ACL (Volume 1: Long Papers), pages 463–473.
  • Rezvaneh Rezapour, Saumil H. Shah, and Jana Diesner. 2019. Enhancing the measurement of social effects by capturing morality. In Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 35–45. Association for Computational Linguistics.
  • Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about social and power implications of language. In ACL.
  • Maarten Sap, Ronan Le Bras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2019. ATOMIC: An atlas of machine commonsense for if-then reasoning. In AAAI.
  • Richard A. Shweder. 1990. In defense of moral realism: Reply to Gabennesch. Child Development, 61(6):2060–2067.
  • Yi Tay, Donovan Ong, Jie Fu, Alvin Chan, Nancy Chen, Anh Tuan Luu, and Christopher Pal. 2020. Would you rather? A new benchmark for learning machine alignment with cultural values and social preferences. In ACL, pages 5369–5373.
  • Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL-HLT, volume 1, pages 173–180.
  • Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Veronique Hoste. 2015. Detection and fine-grained classification of cyberbullying events. In RANLP, pages 672–680, Hissar, Bulgaria.
  • Svitlana Volkova, Kyle Shaffer, Jin Yea Jang, and Nathan Hodas. 2017. Separating facts from fiction: Linguistic models to classify suspicious and trusted news posts on Twitter. In ACL, pages 647–653.
  • Hoa Trong Vu, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2014. Acquiring a dictionary of emotion-provoking events. In EACL, volume 2: Short Papers, pages 128–132.
  • Su Wang, Greg Durrett, and Katrin Erk. 2018. Modeling semantic plausibility by injecting world knowledge. In NAACL-HLT.
  • Rene Weber, J. Michael Mangus, Richard Huskey, Frederic R. Hopp, Ori Amir, Reid Swanson, Andrew Gordon, Peter Khooshabeh, Lindsay Hahn, and Ron Tamborini. 2018. Extracting latent moral information from text narratives: Relevance, challenges, and solutions. Communication Methods and Measures, 12(2-3):119–139.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv, abs/1910.03771.
  • Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In NeurIPS.
  • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.
  • 1. r/amitheasshole (30k) — The Am I the Asshole? (AITA) subreddit. The posts of this subreddit pose moral quandaries, such as “AITA for wanting to uninvite an (ex?)-friend from my wedding for shit-talking our marriage?” We use the data from Lourie et al. (2020), who scrape the titles of posts, omitting the preamble (e.g., “AITA for”), normalizing to present tense, and filtering out administrative posts. We do not use any annotations provided by that community (where other posters vote on who had the moral high ground).