Using Deep Learning for Obscene Language Detection in Vietnamese Social Media

Dai Tho Dang, Xuan Thang Tran, Cong Phap Huynh, Ngoc Thanh Nguyen

The 12th Conference on Information Technology and Its Applications, Lecture Notes in Networks and Systems (2023)

Abstract

Nowadays, Vietnamese users generate a vast volume of text on social media platforms every day. Alongside its enormous benefits, this creates many challenges, one of which is that a large amount of this text contains obscene language. Such content negatively affects readers, especially young people, so detecting it is an important problem. In this paper, we investigate this problem using Deep Learning (DL) models: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM). We also combine LSTM and CNN in both sequential (sequential LSTM-CNN) and parallel (parallel LSTM-CNN) forms, and use a sequential BiLSTM-CNN, to solve this task. For the word embedding phase, we use Word2vec and PhoBERT. Experimental results show that the BiLSTM model with PhoBERT achieves the best results on the obscene-language discrimination task, with an accuracy of 81.4% and an F1-score of 81.5%.
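
As a reading aid for the two reported metrics, the following is a minimal sketch of how accuracy and F1-score can be computed for a two-class (obscene vs. non-obscene) task. The abstract does not say which F1 variant the authors report, so this sketch assumes a binary F1 with the obscene class as the positive label; the function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    Assumption: `positive` marks the obscene class, and F1 is the
    binary F1 of that class (the paper may use a different averaging).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Accuracy: fraction of predictions that match the gold label.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy example with made-up labels (1 = obscene, 0 = clean).
gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Because F1 ignores true negatives while accuracy counts them, the two numbers can differ on imbalanced data; the close values reported in the abstract (81.4% and 81.5%) are the authors' results and are not reproduced by this toy example.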
