AIA-BDE - A Corpus of FAQs in Portuguese and their Variations.
LREC (2020)
Abstract
We present AIA-BDE, a corpus of 380 domain-oriented FAQs in Portuguese and their variations, i.e., paraphrases or entailed questions, created either manually, by humans, or automatically, with Google Translate. It aims to be used as a benchmark for FAQ retrieval and automatic question answering, but may be useful in other contexts, such as the development of task-oriented dialogue systems, or models for natural language inference in an interrogative context. We also report on two experiments. Matching variations with their original questions was not trivial with a set of unsupervised baselines, especially for manually created variations. Besides the high performance obtained with ELMo and BERT embeddings, an Information Retrieval system was surprisingly competitive when considering only the first hit. In the second experiment, text classifiers were trained with the original questions and tested on assigning each variation to one of three possible sources, or classifying it as out-of-domain. Here, the difference between manual and automatic variations was not so significant.
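The first experiment can be framed as retrieval: each variation is a query, the original FAQs are the candidates, and a match counts only if the top-ranked candidate (the first hit) is the variation's source. The sketch below is a minimal illustration under stated assumptions. It uses a plain bag-of-words cosine similarity as a stand-in unsupervised baseline, not the paper's IR system or its ELMo/BERT models. The questions and the helper names `best_match` and `accuracy_at_1` are hypothetical, invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on word characters; Python's \w is Unicode-aware,
    # so accented Portuguese letters (ã, é, ç) stay inside tokens.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(variation: str, faqs: list[str]) -> int:
    # Hypothetical helper: index of the FAQ with the highest similarity (the first hit).
    query = Counter(tokenize(variation))
    scores = [cosine(query, Counter(tokenize(q))) for q in faqs]
    return max(range(len(faqs)), key=lambda i: scores[i])


def accuracy_at_1(pairs: list[tuple[str, int]], faqs: list[str]) -> float:
    # Hypothetical helper: fraction of variations whose first hit is their original question.
    hits = sum(1 for variation, gold in pairs if best_match(variation, faqs) == gold)
    return hits / len(pairs)


# Hypothetical toy data, invented for illustration (not taken from AIA-BDE).
faqs = [
    "Como posso renovar o cartão de cidadão?",
    "Qual é o horário de atendimento?",
    "Onde posso pagar o imposto?",
]
pairs = [
    ("Como renovo o meu cartão de cidadão?", 0),
    ("A que horas abre o atendimento?", 1),
    ("Onde se paga o imposto?", 2),
    # A paraphrase with little word overlap: it shares more words with FAQ 1.
    ("Qual é o local de pagamento do IMI?", 2),
]

print(accuracy_at_1(pairs, faqs))  # 0.75
```

On this toy set, the fourth variation is a paraphrase that shares few words with its source and is matched to the wrong FAQ, so accuracy at the first hit is 0.75. This only loosely echoes the abstract's observation that manually created variations were harder to match; it is not a reproduction of the paper's results.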
Keywords
FAQ retrieval, corpora creation, paraphrase detection, textual entailment, dialogue systems, Portuguese Language Processing