Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis
CoRR (2024)
Abstract
In the current environment, psychological issues are widespread, and social
media serves as a key outlet for individuals to share their feelings. This
generates vast quantities of data daily, in which negative emotions have the
potential to precipitate crisis situations, so there is a recognized need for
models capable of analyzing such text efficiently. While pre-trained language
models have demonstrated broad effectiveness, there is a noticeable gap in
pre-trained models tailored to specialized domains such as psychology. To
address this, we collected a large dataset from Chinese social media platforms
and enriched it with publicly available datasets to create a comprehensive
corpus of 3.36 million text entries. To enhance the model's applicability to
psychological text analysis, we integrated psychological lexicons into the
pre-training masking mechanism. Building on an existing Chinese language
model, we performed domain-adaptive training to develop a model specialized
for the psychological domain.
We assessed our model's effectiveness on four public benchmarks, where it not
only surpassed the performance of standard pre-trained models but also showed
an inclination toward making psychologically relevant predictions. Due to
data privacy concerns, the dataset will not be made publicly available;
however, the pre-trained models and code are publicly accessible to the
community at https://github.com/zwzzzQAQ/Chinese-MentalBERT.
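
As a rough illustration of the lexicon-guided masking described in the
abstract, the following Python sketch biases the selection of masked
positions toward words found in a psychological lexicon before falling back
to random words to reach the target masking ratio. This is a minimal sketch,
not the authors' implementation: PSYCH_LEXICON, lexicon_guided_mask, and the
sample sentence are hypothetical placeholders, and the input is assumed to be
already word-segmented.

import random

# Hypothetical mini lexicon; the paper's actual psychological lexicons
# are not reproduced here.
PSYCH_LEXICON = {"焦虑", "抑郁", "孤独", "绝望", "压力"}

def lexicon_guided_mask(tokens, mask_ratio=0.15, mask_token="[MASK]"):
    """Choose mask positions, preferring words in the psychological
    lexicon and falling back to random words to reach the target ratio."""
    n_to_mask = max(1, int(len(tokens) * mask_ratio))
    lexicon_idx = [i for i, t in enumerate(tokens) if t in PSYCH_LEXICON]
    other_idx = [i for i, t in enumerate(tokens) if t not in PSYCH_LEXICON]
    random.shuffle(lexicon_idx)
    random.shuffle(other_idx)
    chosen = set((lexicon_idx + other_idx)[:n_to_mask])
    masked = [mask_token if i in chosen else t for i, t in enumerate(tokens)]
    # As in standard masked language modeling, labels keep the original
    # word only at masked positions; None marks positions ignored by the loss.
    labels = [t if i in chosen else None for i, t in enumerate(tokens)]
    return masked, labels

# Usage: with a 30% ratio on this segmented sentence, the two lexicon
# words ("焦虑", "孤独") are masked before any other word is considered.
tokens = ["我", "最近", "很", "焦虑", "也", "很", "孤独"]
masked, labels = lexicon_guided_mask(tokens, mask_ratio=0.3)
print(masked)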