Bootstrapping a Lexicon of Multiword Adverbs for Brazilian Portuguese.

International Conference on Computational and Corpus-Based Phraseology (Europhras) (2022)

Abstract
This paper presents the process for bootstrapping a computational lexicon of multiword adverbs for Brazilian Portuguese (PT-BR) from an existing lexicon built for the European variety of the language (PT-PT). This ongoing work aims to identify, collect, and provide a syntactic description of multiword adverbs in PT-BR, in order to produce a comprehensive lexicon of multiword adverbs in Portuguese. First, existing resources for this part of speech are presented, followed by the methods adopted for building this novel resource. To date, approximately 700 new PT-BR multiword adverbs have entered the lexicon, for a total of nearly 2,300 entries. We assessed this new lexical resource against a sample of 1,000 sentences taken from a publicly available corpus of Brazilian Portuguese journalistic texts. Results are promising, although there is still room for improvement, given that the F-measure reached only 0.66. We estimate that another 2,100 PT-BR adverbs will enter the lexicon, bringing the total to more than 4,000 multiword adverbs in Portuguese.
Keywords
Multiword adverbs, Computational lexicon, Portuguese