SeLaB: Semantic Labeling with BERT

2021 International Joint Conference on Neural Networks (IJCNN) (2021)

Automatically generating schema labels for the columns of data tables has many data science applications, such as schema matching, data discovery, and data linking. For example, predicted schema labels can fill in the missing headers of automatically extracted tables, which significantly reduces human effort. Furthermore, the predicted labels can reduce the impact of inconsistent names across multiple data tables. In this paper, we propose a context-aware semantic labeling method that uses both the data values and the contextual information of columns. We formulate semantic labeling as a structured prediction problem in which we sequentially predict labels for an input table with missing headers. We incorporate both the values and the context of each data column using the pre-trained contextualized language model BERT. To our knowledge, we are the first to successfully adapt BERT to the semantic labeling task. We evaluate our approach on two real-world datasets from different domains and demonstrate substantial improvements in evaluation metrics over state-of-the-art feature-based methods.
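The sequential, context-aware prediction described in the abstract can be illustrated with a small sketch. The abstract does not specify the input encoding or the inference procedure, so everything here is an assumption made for illustration: the two-segment `[CLS] values [SEP] context [SEP]` layout, the use of previously predicted labels as context, and the helper names `serialize_column`, `toy_scorer`, and `label_table` are all hypothetical. The scorer is a keyword-based stand-in, not BERT; in a real setup it would presumably be replaced by a fine-tuned BERT classifier.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical keyword hints used only by the toy scorer below.
HINTS = {
    "city": {"paris", "tokyo", "lima"},
    "country": {"france", "japan", "peru"},
}


def serialize_column(values: Sequence, context_labels: Sequence[str],
                     max_values: int = 5) -> str:
    """Build a BERT-style two-segment input: column values, then context.

    The layout is an assumption; the paper's actual encoding may differ.
    """
    value_part = " ".join(str(v) for v in values[:max_values])
    context_part = " ".join(context_labels) if context_labels else "none"
    return f"[CLS] {value_part} [SEP] {context_part} [SEP]"


def toy_scorer(text: str, labels: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a fine-tuned classifier (NOT BERT): counts keyword hits
    in the value segment and penalizes labels already present in the context."""
    body = text.replace("[CLS]", "").strip()
    value_part, context_part = [p.strip() for p in body.split("[SEP]")[:2]]
    tokens = value_part.lower().split()
    used = set(context_part.split())
    scores = {}
    for label in labels:
        if label == "year":
            hits = sum(t.isdigit() and len(t) == 4 for t in tokens)
        else:
            hits = sum(t in HINTS.get(label, set()) for t in tokens)
        scores[label] = hits - (1.0 if label in used else 0.0)
    return scores


def label_table(columns: Sequence[Sequence], labels: Sequence[str],
                scorer: Callable[[str, Sequence[str]], Dict[str, float]]
                ) -> List[str]:
    """Greedily label columns left to right, feeding earlier predictions
    back in as context for later columns."""
    predictions: List[str] = []
    for values in columns:
        text = serialize_column(values, predictions)
        scores = scorer(text, labels)
        predictions.append(max(labels, key=lambda lab: scores[lab]))
    return predictions


table = [["Paris", "Tokyo"], ["France", "Japan"], [1999, 2004]]
predicted = label_table(table, ["city", "country", "year"], toy_scorer)
```

Greedy left-to-right decoding is the simplest way to make each prediction depend on earlier ones; other structured-prediction strategies are possible and this sketch makes no claim about which one the paper uses.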
Key words
semantic labeling, pretrained language model, data table