MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions

LREC (2020)

Abstract
Extending machine reading approaches to extract mathematical concepts and their descriptions is useful for a variety of tasks, ranging from mathematical information retrieval to increasing the accessibility of scientific documents for the visually impaired. This entails segmenting mathematical formulae into identifiers and linking them to their natural language descriptions. We propose a rule-based approach for this task, which extracts LaTeX representations of formula identifiers and links them to their in-text descriptions, given only the original PDF and the location of the formula of interest. We also present a novel evaluation dataset for this task, as well as the tool used to create it. The data and the source code are open source and are available at https://osf.io/bdxmr/ and https://github.com/ml4ai/automates, respectively.
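As a rough illustration of the two sub-tasks named above (segmenting a LaTeX formula into identifiers, then linking each identifier to an in-text description), the Python sketch below uses a regex tokenizer and a single hypothetical "$X$ is/denotes/represents the Y" surface pattern. It is not the MathAlign rule set and skips the PDF-processing step entirely; the function names `segment_identifiers` and `link_descriptions`, the excluded-command list, the pattern, and the example sentence are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical tokenizer (not MathAlign's): an identifier is a single Latin
# letter or a backslash command such as \alpha, optionally followed by a
# subscript like _0 or _{max}.
IDENT_RE = re.compile(r"(?:\\[A-Za-z]+|[A-Za-z])(?:_(?:\{[^}]*\}|[A-Za-z0-9]))?")

# Illustrative list of LaTeX commands that are operators/layout, not identifiers.
NON_IDENTIFIERS = {r"\frac", r"\sqrt", r"\sum", r"\int", r"\cdot", r"\left", r"\right"}


def segment_identifiers(latex: str) -> List[str]:
    """Return distinct identifiers in a LaTeX formula, in order of first appearance."""
    found: List[str] = []
    for tok in IDENT_RE.findall(latex):
        if tok not in NON_IDENTIFIERS and tok not in found:
            found.append(tok)
    return found


def link_descriptions(identifiers: List[str], text: str) -> Dict[str, str]:
    """Link each identifier to the first description matching a hypothetical
    '$X$ is/denotes/represents [the] Y' pattern, where Y ends at , ; . or 'and'."""
    links: Dict[str, str] = {}
    for ident in identifiers:
        pattern = re.compile(
            r"\$" + re.escape(ident) + r"\$\s+(?:is|denotes|represents)\s+(?:the\s+)?"
            r"(?P<desc>.+?)(?=,|;|\.|\s+and\s|$)"
        )
        match = pattern.search(text)
        if match:
            links[ident] = match.group("desc").strip()
    return links


if __name__ == "__main__":
    ids = segment_identifiers("E = mc^2")
    sentence = "where $E$ is the energy, $m$ is the mass, and $c$ is the speed of light."
    print(ids, link_descriptions(ids, sentence))
```

On the toy input, `segment_identifiers("E = mc^2")` yields `["E", "m", "c"]` and the linker maps them to `"energy"`, `"mass"` and `"speed of light"`. Real scientific text is far less regular (descriptions precede identifiers, span clauses, or appear pages away), which is why a practical system needs a much richer rule set than this single pattern.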
Keywords
machine reading, relation extraction, math information retrieval, corpus creation, tool creation