A Feasible Chinese Text Data Preprocessing Strategy

2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON)(2020)

引用 1|浏览3
暂无评分
摘要
With the rapid rise of artificial intelligence technologies such as machine learning and the rapid development of the big data industry, more and more attention is paid to the use of data itself, especially the Chinese text data, which is more complex in expression and richer in the information. It is a necessary step to process the raw Chinese text data before it is used for specific application tasks. However, the current strategies for processing data are generally to deal with data in different fields and specific application tasks. In this paper, to further improve the quality of Chinese data processing and give play to the application value of Chinese data, we propose a general and feasible Chinese text preprocessing strategy, named the multi-level data preprocessing strategy (MLDPS). This strategy uses four effective links to process raw Chinese text data systematically. We believe that the proposed MLDPS has relatively strong practical significance, and provides a better idea for preprocessing Chinese text data.
更多
查看译文
关键词
big data,preprocessing,data cleaning,strategy
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要