Extracting section structure from resumes in Brazilian Portuguese

Expert Systems with Applications (2024)

Abstract
This paper presents a novel resume parser designed to reorganize the textual content of any resume into its original section structure. Our work addresses two practical challenges overlooked by the existing literature: (i) ensuring the correct reading order of text retrieved from resume files and (ii) individually extracting all sections, as well as work experience and education subsections. Building on the observation that most resumes adhere to basic document templates, we reframe the reading order problem as a template identification task. Our experiments suggest that even a widely used small model like EfficientNet-B0 can accurately identify common templates. Additionally, we propose a sequence tagging approach that simultaneously identifies all resume sections and some subsections. We implement and compare two solutions based on the well-known CRF and BERT models. Our evaluation provides strong evidence that the CRF can serve as a practical alternative to BERT, depending on hardware and budget constraints. The two models yield comparable results when identifying resume sections, while BERT displays a substantial advantage when identifying education and work experience subsections. An interesting direction for future work is to expand our approach to ensure the correct ordering of a large family of templates.
Keywords
Resume parsing, Natural language processing, Information extraction, Image classification, Text segmentation, Human resources
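
The sequence tagging formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the tagging unit or the label scheme, so the sketch assumes one tag per resume line and a BIO-style scheme with hypothetical labels such as `EXPERIENCE` and `EDUCATION`. The tags would come from a trained tagger (a CRF or BERT in the paper); here they are written by hand, and the hypothetical helper `group_sections` only shows how a tag sequence maps back to a section structure.

```python
from typing import List, Tuple

Section = Tuple[str, List[str]]


def group_sections(lines: List[str], tags: List[str]) -> List[Section]:
    """Group tagged lines into (label, lines) spans.

    Assumes BIO-style tags ("B-X" opens section X, "I-X" continues it,
    "O" marks a line outside any section). This scheme is an assumption
    made for illustration, not taken from the paper.
    """
    if len(lines) != len(tags):
        raise ValueError("lines and tags must have the same length")

    sections: List[Section] = []
    current = None  # the span currently being extended, if any
    for line, tag in zip(lines, tags):
        if tag == "O":
            current = None  # an untagged line closes the open span
            continue
        prefix, _, label = tag.partition("-")
        # Open a new span on "B-", or when an "I-" tag cannot continue
        # the open span (no open span, or a different label).
        if prefix == "B" or current is None or current[0] != label:
            current = (label, [line])
            sections.append(current)
        else:
            current[1].append(line)
    return sections


# Hand-written example input; a real tagger would predict the tags.
example_lines = [
    "Maria Silva",
    "Experiência Profissional",
    "Analista de Dados, Empresa X",
    "2019 - 2023",
    "Formação Acadêmica",
    "Bacharelado em Ciência da Computação",
]
example_tags = [
    "B-PERSONAL",
    "B-EXPERIENCE",
    "I-EXPERIENCE",
    "I-EXPERIENCE",
    "B-EDUCATION",
    "I-EDUCATION",
]

example_sections = group_sections(example_lines, example_tags)
for label, content in example_sections:
    print(label, content)
```

Tagging every line in one pass is what lets a single model recover all sections at once, and the same grouping step could be applied to subsection tags under the same assumptions.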