A Prioritization Model for Suicidality Risk Assessment

ACL, pp. 8124-8137, 2020.

Keywords: hierarchical Time-Biased Gain, risk assessment, Automated Readability Index, social media, Expected Reciprocal Rank

Abstract:

We reframe suicide risk assessment from social media as a ranking problem whose goal is maximizing detection of severely at-risk individuals given the time available. Building on measures developed for resource-bounded document retrieval, we introduce a well-founded evaluation paradigm, and demonstrate using an expert-annotated test collection that jointly ranking individuals and their posts yields meaningful improvements over plausible baselines.

Introduction
  • Mental illness is one of the most significant problems in healthcare: in economic terms alone, by 2030 mental illness worldwide is projected to cost more than cardiovascular disease, and more than cancer, chronic respiratory diseases, and diabetes combined (Bloom et al., 2012).
  • Traditional methods for predicting suicidal thoughts and behaviors have failed to make progress for fifty years (Franklin et al., 2017), but with the advent of machine learning approaches (Linthicum et al., 2019), including text analysis methods for psychology (Chung and Pennebaker, 2007) and the rise of research on mental health using social media (De Choudhury, 2013), algorithmic classification has reached the point where it can dramatically outstrip the performance of prior, more traditional prediction methods (Linthicum et al., 2019; Coppersmith et al., 2018).
  • Further progress is on the way as the community shows increasing awareness of and enthusiasm for this problem space (e.g., Milne et al., 2016; Losada et al., 2020; Zirikly et al., 2019).
Highlights
  • Mental illness is one of the most significant problems in healthcare: in economic terms alone, by 2030 mental illness worldwide is projected to cost more than cardiovascular disease, and more than cancer, chronic respiratory diseases, and diabetes combined (Bloom et al., 2012).
  • The good news is that NLP and machine learning are showing strong promise for impact in mental health, just as they are having large impacts everywhere else.
  • Rather than evaluating categorical accuracy, ranked retrieval systems are typically evaluated by some measure of search quality that rewards placing desired items closer to the top (Voorhees, 2001). Most such measures use only item position, but we find it important to model the time it takes to recognize desired items, since in our setting the time of qualified users is the most limited resource.
  • We introduce hierarchical Time-Biased Gain (hTBG), a variant of TBG in which individuals are the top-level ranked items, and expected reading time is modeled for the ranked list of documents that provides evidence for each individual's assessment.
  • After introducing TBG, in Section 3.2 we develop hierarchical Time-Biased Gain, an extension of TBG, to account for specific properties of risk assessment using social media posts; a minimal sketch of both metrics follows below.
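To make the metric concrete, here is a minimal Python sketch of TBG and the hierarchical variant. It is an illustration under simplifying assumptions, not the paper's exact formulation (which has additional parameters, listed in Table 1): the gains, reading times, and per-post stopping probabilities `p_stop` are hypothetical inputs, and the stopping model is a simple scan down each individual's evidence list.

```python
import math

def decay(t, half_life):
    """Exponential time-decay discount with the given half-life:
    gain found at elapsed time t is worth decay(t) of its full value."""
    return math.exp(-t * math.log(2) / half_life)

def tbg(gains, read_times, half_life):
    """Plain Time-Biased Gain (after Smucker and Clarke, 2012), simplified:
    the reader scans the ranking top-down, spending read_times[k] on the
    item at rank k; each item's gain is discounted by total elapsed time."""
    total, elapsed = 0.0, 0.0
    for gain, t in zip(gains, read_times):
        elapsed += t
        total += gain * decay(elapsed, half_life)
    return total

def htbg(individuals, half_life):
    """Hierarchical TBG sketch: individuals are the top-level ranked items.
    Each individual is (gain, post_times, p_stop), where post_times are
    expected reading times of that individual's ranked posts and p_stop[j]
    is the (assumed) probability the assessor is convinced after post j.
    Expected assessment time is a stopping-time expectation over the posts;
    everyone ranked below is discounted by that accumulated time."""
    total, elapsed = 0.0, 0.0
    for gain, post_times, p_stop in individuals:
        expected, cum, p_cont = 0.0, 0.0, 1.0
        for t, p in zip(post_times, p_stop):
            cum += t                      # time spent if reading stops after this post
            expected += p_cont * p * cum  # stop here with probability p_cont * p
            p_cont *= 1.0 - p
        expected += p_cont * cum          # never convinced: read the whole list
        elapsed += expected
        total += gain * decay(elapsed, half_life)
    return total

# Toy usage: two individuals, 30- and 60-second posts, 3-hour half-life.
print(htbg([(1.0, [30, 60], [0.8, 0.5]),
            (1.0, [60, 60], [0.5, 0.5])], half_life=3 * 3600))
```

With a 3-hour half-life, gain found after three hours of expected reading counts only half as much, which is what makes wasted reading time costly under the metric.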
Results
  • Results and Discussion: in the 3HAN AV variant, the representation is averaged instead of using attention, which can be achieved by fixing a_{i,j} = 1/m in Equation 10. This is similar to the HN-AVE baseline in Yang et al. (2016); a minimal sketch of the contrast appears after this list.
  • The authors concatenate four feature sets: (1) bag-of-words for vocabulary with count larger than three, (2) GloVe embeddings summed over words, (3) 194 features representing emotional topics from Empath (Fast et al., 2016), … In a footnote, the authors note that they experimented with trainable GloVe embeddings as well as BERT, but saw little to no improvement in performance using cross-validation; a feature-concatenation sketch appears after this list.
  • The authors plan to explore fine-tuning BERT on Reddit in future work.
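The paper's Equation 10 is not reproduced in this summary, so the following numpy sketch only illustrates the contrast being described: attention pooling over m lower-level representations versus fixing every weight to 1/m. The scoring function (a single learned vector `w`) is a placeholder, not necessarily the paper's exact parameterization.

```python
import numpy as np

def attention_pool(H, w):
    """Attention pooling: score each of the m rows of H (m x d) against a
    learned vector w, softmax the scores into weights a_j, and return the
    weighted sum of the rows."""
    scores = H @ w
    a = np.exp(scores - scores.max())   # numerically stable softmax
    a /= a.sum()
    return a @ H

def average_pool(H):
    """The averaging ablation: fix every weight a_j to 1/m, i.e. a plain
    mean over rows, as in the HN-AVE baseline of Yang et al. (2016)."""
    m = H.shape[0]
    return np.full(m, 1.0 / m) @ H

H = np.random.randn(5, 8)               # five sentence vectors, d = 8
w = np.random.randn(8)
print(attention_pool(H, w), average_pool(H))
```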
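For the feature-based logistic regression, a hedged scikit-learn sketch of the concatenation step follows. The texts, labels, GloVe vectors, and Empath scores below are hypothetical stand-ins, not the paper's data or pipeline; the SGD learning rate 0.003 comes from the experimental details at the end of this page.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-ins for the real inputs (not from the paper).
train_texts = ["the cat sat down", "the dog ran off", "the cat ran down"] * 4
train_labels = [0, 1, 0] * 4
glove = {w: np.random.randn(50) for w in "the cat dog sat ran down off".split()}
empath_features = np.random.rand(len(train_texts), 194)  # 194 Empath topic scores

def glove_sum(text, vectors, dim=50):
    """(2) GloVe embedding summed over the words of a text."""
    vecs = [vectors[w] for w in text.split() if w in vectors]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

# (1) bag-of-words; "vocabulary count larger than three" approximated via min_df
vectorizer = CountVectorizer(min_df=4)
X_bow = vectorizer.fit_transform(train_texts)

X_glove = csr_matrix(np.vstack([glove_sum(t, glove) for t in train_texts]))
X_empath = csr_matrix(empath_features)      # (3) Empath emotional-topic features

X = hstack([X_bow, X_glove, X_empath])      # concatenate the feature sets
clf = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.003)
clf.fit(X, train_labels)                    # logistic regression trained by SGD
```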
Conclusion
  • Conclusions and Future Work: the paper reports hTBG alongside TBG and NDCG@20; hTBG can be read against the optimal number of relevant items found in a ranking, given a time budget.
  • Measured at an expected reading time budget of about half a day (4 hr 20 min, with a 3-hour half-life), the joint ranking approach achieved an hTBG of 12.49, compared with 11.70 for a plausible baseline from prior art: using logistic regression to rank individuals and reading each individual's posts in reverse chronological order.
  • That increase falls just short of identifying one more person in need of immediate help in the experiment's population of 242 individuals; a quick arithmetic check follows this list.
  • There are certainly limitations in the study and miles to go before validating the approach in the real world, but the framework should make it easy to integrate and explore other individual rankers, document rankers and explanation mechanisms, and to build user interfaces like the schematic in Figure 1
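A quick check of the "just short of one more person" statement, under the assumption (implied by the phrasing, not stated here) that each severely at-risk individual found contributes a unit gain to hTBG:

```latex
\Delta\,\mathrm{hTBG} = 12.49 - 11.70 = 0.79 < 1.0 = \text{gain of one more individual found}
```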
Tables
  • Table1: Parameters used for TBG and hierarchical TBG
  • Table2: Number of individuals with the number (range) of posts, by dataset and risk category
  • Table3: hTBG scores with three different time budgets, all combinations of individual and document rankers
  • Table 4: TBG and NDCG@20, listed for comparison with hTBG. Both hTBG's and TBG's half-lives are set at 3 hours, and the maximum document cutoff is set at 50
Related work
  • NLP for Risk Assessment. Calvo et al. (2017) survey NLP for mental health applications using non-clinical texts such as social media. Several recent studies and shared tasks focus on risk assessment of individuals in social media using a multi-level scale (Milne et al., 2016; Yates et al., 2017; Losada et al., 2020). Shing et al. (2018) introduce the dataset we use, and Zirikly et al. (2019) describe a shared task in which 11 teams tackled the individual-level classification that feeds into our prioritization model (their Task B). Our work contributes by modeling the downstream users' prioritization task, taking a key step closer to the real-world problem.

    Hierarchical Attention. Attention, especially in the context of NLP, has two main advantages: it allows the network to attend to likely-relevant parts of the input (either words or sentences), often leading to improved performance, and it provides insight into which parts of the input are being used to make the prediction. These characteristics have made attention mechanisms a popular choice for deep learning that requires human investigation, such as automatic clinical coding (Baumel et al., 2018; Mullenbach et al., 2018; Shing et al., 2019). Although concerns about using attention for interpretation exist (Jain and Wallace, 2019; Wiegreffe and Pinter, 2019; Wallace, 2019), Shing et al. (2019) show that hierarchical document attention can align well with human-provided ground truth.
Funding
  • This work has been supported in part by a University of Maryland Strategic Partnership (MPower) seed grant, an AWS Machine Learning Research Award, and an AI + Medicine for High Impact (AIM-HI) Challenge Award.
References
  • Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM '09, pages 5–14, New York, NY, USA. Association for Computing Machinery.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • Krisztian Balog. 2018. Entity-Oriented Search, volume 39 of The Information Retrieval Series. Springer.
  • Krisztian Balog, Yi Fang, Maarten de Rijke, Pavel Serdyukov, and Luo Si. 2012. Expertise retrieval. Foundations and Trends in Information Retrieval, 6(2–3):127–256.
  • Tal Baumel, Jumana Nassour-Kassis, Raphael Cohen, Michael Elhadad, and Noemie Elhadad. 2018. Multi-label classification of patient notes: Case study on ICD code assignment. In The Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 409–416. AAAI Press.
  • Adrian Benton, Glen Coppersmith, and Mark Dredze. 2017. Ethical research protocols for social media health research. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, EthNLP@EACL, pages 94–102. Association for Computational Linguistics.
  • David E. Bloom, Elizabeth Cafiero, Eva Jane-Llopis, Shafika Abrahams-Gessel, Lakshmi Reddy Bloom, Sana Fathima, Andrea B. Feigl, Tom Gaziano, Ali Hamandi, Mona Mowafi, Danny O'Farrell, and Emre. 2012. The Global Economic Burden of Noncommunicable Diseases. PGDA Working Papers 8712, Program on the Global Demography of Aging.
  • Bureau of Health Workforce. 2020. Designated health professional shortage areas: Statistics, second quarter of fiscal year 2020, designated HPSA quarterly summary.
  • Rafael A. Calvo, David N. Milne, M. Sazzad Hussain, and Helen Christensen. 2017. Natural language processing in mental health applications using non-clinical texts. Natural Language Engineering, 23(5):649–685.
  • Stevie Chancellor, Michael L. Birnbaum, Eric D. Caine, Vincent M. B. Silenzio, and Munmun De Choudhury. 2019. A taxonomy of ethical tensions in inferring mental health states from social media. In Proceedings of the Conference on Fairness, Accountability, and Transparency.
  • Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, pages 621–630. ACM.
  • Munmun De Choudhury. 2013. Role of social media in tackling challenges in mental health. In Proceedings of the 2nd International Workshop on Socially-Aware Multimedia, SAM@ACM Multimedia 2013, pages 49–52. ACM.
  • Cindy Chung and James W. Pennebaker. 2007. The psychological functions of function words. Social Communication, 1:343–359.
  • Glen Coppersmith, Ryan Leary, Patrick Crutchley, and Alex Fine. 2018. Natural language processing of social media as screening for suicide risk. Biomedical Informatics Insights, 10:117822261879286.
  • Ethan Fast, Binbin Chen, and Michael S. Bernstein. 2016. Empath: Understanding topic signals in large-scale text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 4647–4657. ACM.
  • Joseph C. Franklin, Jessica D. Ribeiro, Kathryn R. Fox, Kate H. Bentley, Evan M. Kleiman, Xieyining Huang, Katherine M. Musacchio, Adam C. Jaroszewski, Bernard P. Chang, and Matthew K. Nock. 2017. Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychological Bulletin, 143(2):187–232.
  • Devin Gaffney and J. Nathan Matias. 2018. Caveat emptor, computational social science: Large-scale missing data in a widely-published Reddit corpus. PLOS ONE, 13(7):1–13.
  • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. AllenNLP: A deep semantic natural language processing platform. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 1–6. Association for Computational Linguistics.
  • Holly Hedegaard, Sally C. Curtin, and Margaret Warner. 2018. Suicide rates in the United States continue to increase. National Center for Health Statistics.
  • Matthew Honnibal and Mark Johnson. 2015. An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 1373–1378. The Association for Computational Linguistics.
  • Sarthak Jain and Byron C. Wallace. 2019. Attention is not explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, pages 3543–3556. Association for Computational Linguistics.
  • Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4):422–446.
  • Kathryn P. Linthicum, Katherine Musacchio Schafer, and Jessica D. Ribeiro. 2019. Machine learning in suicide science: Applications and ethics. Behavioral Sciences & the Law, 37(3):214–222.
  • David E. Losada, Fabio Crestani, and Javier Parapar. 2020. eRisk 2020: Self-harm and depression challenges. In Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part II, volume 12036 of Lecture Notes in Computer Science, pages 557–563. Springer.
  • David N. Milne, Glen Pink, Ben Hachey, and Rafael A. Calvo. 2016. CLPsych 2016 shared task: Triaging content in online peer-support forums. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, CLPsych@NAACL-HLT 2016, pages 118–127. The Association for Computational Linguistics.
  • James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. 2018. Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, pages 1101–1111. Association for Computational Linguistics.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, pages 1532–1543. ACL.
  • Tetsuya Sakai. 2019. Graded relevance assessments and graded relevance measures of NTCIR: A survey of the first twenty years. CoRR, abs/1903.11272.
  • SAMHSA. 2019. National Survey on Drug Use and Health, 2017 and 2018. Center for Behavioral Health Statistics and Quality. Table 8.58B.
  • Allison Schuck, Raffaella Calati, Shira Barzilay, Sarah Bloch-Elkouby, and Igor Galynker. 2019. Suicide Crisis Syndrome: A review of supporting evidence for a new suicide-specific diagnosis. Behavioral Sciences & the Law, 37(3):223–239.
  • Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, and Philip Resnik. 2018. Expert, crowdsourced, and machine assessment of suicide risk via online postings. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, CLPsych@NAACL-HLT, pages 25–36. Association for Computational Linguistics.
  • Han-Chin Shing, Guoli Wang, and Philip Resnik. 2019. Assigning medical codes at the encounter level by paying attention to documents. In ML4H: Machine Learning for Health Workshop at NeurIPS.
  • Mark D. Smucker and Charles L. A. Clarke. 2012. Time-based calibration of effectiveness measures. In The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12, pages 95–104. ACM.
  • Ellen M. Voorhees. 2001. The philosophy of information retrieval evaluation. In Evaluation of Cross-Language Information Retrieval Systems, Second Workshop of the Cross-Language Evaluation Forum, CLEF 2001, volume 2406 of Lecture Notes in Computer Science, pages 355–370. Springer.
  • Byron C. Wallace. 2019. Thoughts on "Attention is not not explanation". Medium. Accessed: December 2019.
  • Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, pages 11–20. Association for Computational Linguistics.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, and Eduard H. Hovy. 2016. Hierarchical attention networks for document classification. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489. The Association for Computational Linguistics.
  • Andrew Yates, Arman Cohan, and Nazli Goharian. 2017. Depression and self-harm risk assessment in online forums. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2968–2978.
  • Michael Zimmer. 2010. "But the data is already public": On the ethics of research in Facebook. Ethics and Information Technology, 12(4):313–325.
  • Ayah Zirikly, Philip Resnik, Özlem Uzuner, and Kristy Hollingshead. 2019. CLPsych 2019 shared task: Predicting the degree of suicide risk in Reddit posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, pages 24–33. Association for Computational Linguistics.
Ethical Considerations
  • Our research involving the University of Maryland Reddit Suicidality Dataset has undergone review by the University of Maryland Institutional Review Board, with a determination of Category 4 Exempt status under U.S. federal regulations. For this dataset, (a) the original data are publicly available, and (b) the originating site (Reddit) is intended for anonymous posting. In addition, since Reddit is officially anonymous but anonymity is not enforced on the site, the dataset has undergone automatic de-identification, using named entity recognition aggressively to identify and mask potential personally identifiable information such as personal names and organizations, in order to create an additional layer of protection (Zirikly et al., 2019). In an assessment of de-identification quality, we manually reviewed a sample of 200 randomly selected posts (100 from the SuicideWatch subreddit and 100 from other subreddits), finding zero instances of personally identifiable information.
  • Following Benton et al. (2017), we treat the data (even though de-identified) as sensitive and restrict access to it, we use obfuscated and minimal examples in papers and presentations, and we do not engage in linkage with other datasets.
  • The dataset is available to other researchers via an application process put in place with the American Association of Suicidology that requires IRB or equivalent ethical review, a commitment to appropriate data management, and, since ethical research practice is not just a matter of publicly available data or even IRB approval (Zimmer, 2010; Benton et al., 2017; Chancellor et al., 2019), a commitment to following additional ethical guidelines. Interested researchers can find information at http://umiacs.umd.edu/~resnik/umd_reddit_suicidality_dataset.html.
Experimental Details
  • All models are built using AllenNLP (Gardner et al., 2018). Tokenization and sentence splitting are done using spaCy (Honnibal and Johnson, 2015).
  • The CROWDSOURCE dataset is split into a training set (80%) and a validation set (20%) during model development. We did not test on the EXPERT dataset until all parameters of the models were fixed. Cross-validation on the training set is used for hyperparameter tuning.
  • For 3HAN, we used ADAM with learning rate 0.003, trained for 100 epochs with early stopping on the validation dataset, with patience set to 30. For 3HAN AV, the same hyperparameters are used. For LR, we used SGD with learning rate 0.003, trained for 100 epochs with early stopping on the validation dataset, with patience set to 30.