Extracting Material Property Measurement Data from Scientific Articles

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2021)

Abstract
Machine learning-based prediction of material properties is often hampered by the lack of sufficiently large training datasets. The majority of such measurement data is embedded in scientific literature, and the ability to automatically extract these data is essential to support the development of reliable property prediction methods. In this work, we describe a methodology for an automatic property extraction framework using material solubility as the target property. We create an annotated dataset containing tags for solubility-related entities using a combination of regular expressions and manual tagging. We then compare five entity recognition models leveraging both token-level and span-level architectures on the task of classifying solute names, solubility values, and solubility units. Additionally, we explore a novel pretraining approach that leverages automated chemical name and quantity extraction tools to generate large datasets that do not rely on intensive manual effort. Finally, we perform an analysis to identify the causes of classification errors.
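The abstract mentions pre-annotating solubility entities with regular expressions and comparing token-level with span-level recognizers. The sketch below illustrates the token-level side of that idea under stated assumptions: a few regular expressions locate solubility values and units, and each match is projected onto per-token BIO tags. The patterns, the unit list, the tokenizer, the function name `regex_bio_tags`, and the labels `VALUE`/`UNIT` are all hypothetical choices for illustration, not the expressions or tag set used in the paper. Solute names are deliberately left untagged; this sketch does not attempt them.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; NOT the regular expressions used in the paper.
# A numeric value, optionally followed by an uncertainty such as "± 0.3" or "+/- 0.3".
VALUE_RE = r"\d+(?:\.\d+)?(?:\s*(?:±|\+/-)\s*\d+(?:\.\d+)?)?"
# A small, assumed set of solubility units; a real system would need a far larger list.
UNIT_RE = r"(?:mg/mL|mg/L|g/L|mol/L|mM)"
# A value followed (after optional whitespace) by one of the units above.
SOLUBILITY_RE = re.compile(rf"(?P<value>{VALUE_RE})\s*(?P<unit>{UNIT_RE})")
# Simple tokenizer: words/numbers (keeping "12.5" and "mg/mL" whole) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:[./]\w+)*|[^\w\s]")


def regex_bio_tags(text: str) -> List[Tuple[str, str]]:
    """Return (token, BIO tag) pairs for solubility values and units found by regex."""
    # Collect labelled character spans from every regex match.
    spans = []
    for m in SOLUBILITY_RE.finditer(text):
        spans.append((m.start("value"), m.end("value"), "VALUE"))
        spans.append((m.start("unit"), m.end("unit"), "UNIT"))

    tagged = []
    for tok in TOKEN_RE.finditer(text):
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if it lies entirely inside it;
            # the span's first token gets "B-", later tokens get "I-".
            if start <= tok.start() and tok.end() <= end:
                tag = ("B-" if tok.start() == start else "I-") + label
                break
        tagged.append((tok.group(), tag))
    return tagged


if __name__ == "__main__":
    sentence = "The solubility of compound 1 in water was 12.5 ± 0.3 mg/mL at 25 °C."
    for token, tag in regex_bio_tags(sentence):
        print(f"{token}\t{tag}")
```

A span-level model would instead classify a candidate span such as `12.5 ± 0.3` as a whole rather than predicting one tag per token; the character spans collected inside `regex_bio_tags` are the natural input for that formulation. Labels produced this way are noisy, which is why automatic pre-annotation of this kind is typically followed by manual correction.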