DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem
CoRR (2024)
Abstract
This paper proposes a novel named entity recognition (NER) technique
specifically tailored to open-source software systems. Our approach aims
to address the scarcity of annotated software data by employing a comprehensive
two-step distantly supervised annotation process. This process strategically
leverages language heuristics, unique lookup tables, external knowledge
sources, and an active learning approach. By combining these
techniques, we not only enhance model performance but also mitigate
the limitations associated with cost and the scarcity of expert annotators.
Our framework outperforms state-of-the-art
LLMs by a substantial margin. We also show the effectiveness of NER in the
downstream task of relation extraction.