Mining experimental data from Materials Science literature with Large Language Models: an evaluation study
arXiv (2024)
Abstract
This study is dedicated to assessing the capabilities of large language
models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting
structured information from scientific documents in materials science. To this
end, we primarily focus on two critical information extraction tasks: (i)
named entity recognition (NER) of the studied materials and physical properties
and (ii) relation extraction (RE) between these entities. Given the evident lack
of datasets within Materials Informatics (MI), we evaluate on SuperMat, a
dataset based on superconductor research, and on MeasEval, a generic measurement
evaluation corpus. The performance of LLMs on these tasks is
benchmarked against traditional models based on the BERT architecture and
rule-based approaches (baseline). We introduce a novel methodology for the
comparative analysis of intricate material expressions, emphasising the
standardisation of chemical formulas to tackle the complexities inherent in
materials science information assessment. For NER, LLMs fail to outperform the
baseline with zero-shot prompting and exhibit only limited improvement with
few-shot prompting. However, a GPT-3.5-Turbo model fine-tuned with an
appropriate strategy for RE outperforms all models, including the baseline. Without any
fine-tuning, GPT-4 and GPT-4-Turbo display remarkable reasoning and
relation extraction capabilities when provided with only a couple of examples,
surpassing the baseline. Overall, the results suggest that
although LLMs demonstrate relevant reasoning skills in connecting concepts,
specialised models are currently the better choice for tasks that require
extracting complex domain-specific entities such as materials. These insights
provide initial guidance applicable to other materials science sub-domains in
future work.
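
The "standardisation of chemical formulas" underlying the abstract's comparison methodology can be made concrete with a small sketch. The Python snippet below is a minimal illustration, not the paper's actual implementation: the `normalize_formula` helper and its parsing rules are assumptions. It parses a flat formula string into an element-to-amount mapping so that notational variants of the same material compare as equal.

```python
import re
from collections import defaultdict

# An element symbol ("Mg", "B", ...) followed by an optional
# stoichiometric coefficient ("2", "0.15", ...).
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")

def normalize_formula(formula: str) -> dict[str, float]:
    """Parse a flat chemical formula into an element -> amount map.

    Hypothetical helper: it ignores parentheses, charges, and
    variable doping indices, which a production normaliser for
    materials science text would also need to handle.
    """
    composition: dict[str, float] = defaultdict(float)
    for element, amount in TOKEN.findall(formula):
        composition[element] += float(amount) if amount else 1.0
    return dict(composition)

# Notational variants of the same material compare as equal:
assert normalize_formula("MgB2") == normalize_formula("B2Mg")

# Fractional stoichiometries are preserved:
print(normalize_formula("La1.85Sr0.15CuO4"))
# -> {'La': 1.85, 'Sr': 0.15, 'Cu': 1.0, 'O': 4.0}
```

This kind of canonicalisation is what allows a predicted entity such as "B2Mg" to be scored as a match for the annotated "MgB2" rather than as an error.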
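The few-shot behaviour reported for GPT-4 and GPT-4-Turbo on relation extraction can be sketched with the OpenAI chat completions API. Everything below other than the API calls themselves is an assumption: the system instruction, the single worked example, and the JSON output convention are illustrative stand-ins for the study's actual prompts.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One worked example (a "shot") pairing a passage with the
# material/property relations the model should extract.
EXAMPLE_TEXT = "MgB2 becomes superconducting at Tc = 39 K."
EXAMPLE_ANSWER = '[{"material": "MgB2", "property": "Tc", "value": "39 K"}]'

def extract_relations(passage: str, model: str = "gpt-4-turbo") -> str:
    """Ask the model to link materials to their measured properties."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Extract (material, property, value) relations "
                        "from materials science text as a JSON list."},
            # The few-shot example, presented as a prior exchange:
            {"role": "user", "content": EXAMPLE_TEXT},
            {"role": "assistant", "content": EXAMPLE_ANSWER},
            {"role": "user", "content": passage},
        ],
    )
    return response.choices[0].message.content

print(extract_relations(
    "The critical temperature of La1.85Sr0.15CuO4 is about 38 K."))
```

Presenting the example as a prior user/assistant exchange, rather than inlining it in the instruction, is one common way to supply the "couple of examples" the abstract refers to.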
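The fine-tuned GPT-3.5-Turbo result likewise presupposes a prepared training set and a fine-tuning job. A minimal sketch against the OpenAI fine-tuning API follows; the file name train.jsonl and the chat-formatted record layout are assumptions, and the paper's actual fine-tuning strategy (prompt design, data preparation) is not reproduced here.

```python
from openai import OpenAI

client = OpenAI()

# train.jsonl (hypothetical path) holds one chat-formatted record
# per line, mirroring the prompt layout used at inference time, e.g.:
# {"messages": [{"role": "system", "content": "Extract ... relations"},
#               {"role": "user", "content": "<passage>"},
#               {"role": "assistant", "content": "<relations JSON>"}]}
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # poll until the job reaches "succeeded"
```

Once the job completes, the resulting model name (prefixed `ft:gpt-3.5-turbo...`) can be passed wherever a base model name is accepted, such as the `model` argument of the extraction sketch above.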