A New Machine-Learning Extracting Approach to Construct a Knowledge Base: A Case Study on Global Stromatolites over Geological Time

Xiaobo Zhang, Hao Li, Qiang Liu, Zhenhua Li, Claire E. Reymond, Min Zhang, Yuangeng Huang, Hongfei Chen, Zhong-Qiang Chen

Journal of Earth Science (2023)

Abstract

In any scientific discipline, large amounts of data are buried in literature repositories and archives, making it difficult to manually extract useful information from these data swamps. Machine-learning extraction of data is therefore necessary for big-data-based studies. Here, we develop a new text-mining technique to reconstruct the global database of Precambrian to Recent stromatolites, providing a better understanding of secular changes in stromatolites through geological time. The step-by-step data extraction process is described below. First, PDF documents of the stromatolite-bearing literature were collected and converted into text format. Second, a glossary and tag-labeling system using NLP (Natural Language Processing) software was employed to search for all possible candidate pairs in each sentence of the collected papers. Third, each candidate pair and its features were represented as a factor graph model, using a series of heuristic procedures to score the weight of each pair feature. Occurrence data of stromatolites versus stratigraphic units (abbreviated as Strata), facies types, locations, and ages worldwide were extracted from the literature, and their respective extraction accuracies are 92
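The second step above (glossary tagging followed by a sentence-level search for candidate pairs) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the glossary contents, the tag names `FOSSIL` and `STRATA`, the placeholder formation names, and the functions `split_sentences`, `tag_mentions`, and `candidate_pairs` are all assumptions made for illustration, and plain regular expressions stand in for the NLP software mentioned in the abstract. The factor-graph scoring of the third step is not attempted here.

```python
import re
from itertools import product

# Hypothetical mini-glossary. The paper's actual glossary is not given in
# the abstract; "Alpha Formation" and "Beta Formation" are placeholders.
GLOSSARY = {
    "FOSSIL": ["stromatolite", "stromatolites"],
    "STRATA": ["Alpha Formation", "Beta Formation"],
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_mentions(sentence, glossary):
    """Return (tag, matched_text, start_offset) for every glossary hit."""
    mentions = []
    for tag, terms in glossary.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                mentions.append((tag, m.group(0), m.start()))
    return mentions


def candidate_pairs(text, glossary, left="FOSSIL", right="STRATA"):
    """Pair every `left` mention with every `right` mention in a sentence."""
    pairs = []
    for idx, sentence in enumerate(split_sentences(text)):
        mentions = tag_mentions(sentence, glossary)
        lefts = [m for m in mentions if m[0] == left]
        rights = [m for m in mentions if m[0] == right]
        for l, r in product(lefts, rights):
            pairs.append({
                "sentence": idx,
                "fossil": l[1],
                "strata": r[1],
                # Character distance: one simple feature that a later
                # scoring step could weigh (an assumption, not the paper's).
                "char_distance": abs(l[2] - r[2]),
            })
    return pairs


if __name__ == "__main__":
    # Made-up sample text, used only to exercise the functions.
    sample = ("Stromatolites are abundant in the Alpha Formation. "
              "The Beta Formation is mostly shale.")
    for pair in candidate_pairs(sample, GLOSSARY):
        print(pair)
```

Only mentions that co-occur in the same sentence are paired, which mirrors the per-sentence search described in the abstract; whether a given pair reflects a true occurrence is left to the later scoring stage.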

Keywords

machine learning, knowledge base construction, stromatolites, Precambrian, knowledge graph
