Web table data integration based on smart campus scenarios to resolve name disambiguation of scientific research personnel

Junfan Jin, Junxiang Chen, Jilin Zhang, Tao Li, Ruixiang Qian, Feng Liu, Li Zhou, Yongjian Ren

2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC 2022), 2022

Abstract
Name ambiguity arises from the similarity of many common Chinese names. With the development of artificial intelligence, disambiguation models based on machine learning have achieved better results and are widely used in universities. However, continually improving the disambiguation effect remains a major challenge. Smart campuses based on the Internet of Things are developing rapidly, and they contain a large number of discretely distributed web tables with missing data values. However, the attributes usable by a disambiguation model are limited. To overcome these challenges, this study proposes a name disambiguation model based on web table data integration (NDWT) in smart campuses. The model first identifies the label mapping between web page tables using four types of label matchers and then designs an instance comparator based on the obtained label mapping. The web tables are integrated according to the instance mapping relationship, yielding two datasets: one before integration (BWT) and one after integration (AWT). Relevant features are then extracted from these two datasets and used for training. Finally, the NDWT model is used for disambiguation experiments. Comparative experiments conducted with seven different types of ML models show that the NDWT model improves significantly after the integration of web tables; in particular, the pairwise F1 of the K-means model increases by 43.23%, and the pairwise F1 of the remaining models increases by approximately 10%. The experimental evaluation demonstrates the feasibility of the proposed NDWT model and confirms that it can achieve a higher distribution quality than conventional name disambiguation methods.
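The abstract reports results as pairwise F1 without defining it. As a rough illustration only, the sketch below uses the definition of pairwise F1 that is common in the name disambiguation literature: every pair of records placed in the same cluster is treated as a positive prediction and compared with the pairs that truly belong to the same person. The function names and the toy labels are hypothetical, and the paper may compute the metric with different details.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose labels are equal."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, pred_labels):
    """Pairwise F1 between a ground-truth and a predicted clustering.

    Assumed definition (not taken from the paper):
      precision = correctly predicted same-cluster pairs / predicted pairs
      recall    = correctly predicted same-cluster pairs / true pairs
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(pred_labels)
    correct = len(true_pairs & pred_pairs)

    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: five records that share one name and belong to two real people.
true_labels = ["person_a", "person_a", "person_a", "person_b", "person_b"]
pred_labels = [0, 0, 1, 1, 1]  # cluster ids from some clustering model
score = pairwise_f1(true_labels, pred_labels)
```

Because the metric counts pairs rather than records, it does not depend on how cluster ids are numbered, which makes it convenient for comparing different clustering models on the same dataset.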
Keywords
Data integration, Web Tables, Machine learning, Name disambiguation