
Few-Shot Generative Conversational Query Rewriting

SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Re..., pp.1933-1936, (2020)


Abstract

Conversational query rewriting aims to reformulate a concise conversational query into a fully specified, context-independent query that can be effectively handled by existing information retrieval systems. This paper presents a few-shot generative approach to conversational query rewriting. We develop two methods, based on rules and self-s...

Introduction
  • Recent advances in deep learning and text understanding facilitate the transition of information retrieval systems from keyword-based queries and "ten blue links" to more conversational experiences.
  • Q1: Tell me about the Bronze Age collapse.
  • Q2: What is the evidence for the Bronze Age collapse?
  • Q3: The possible causes of the Bronze Age collapse?
  • A signature of conversational IR is its multi-round interaction with the user, which is both an opportunity to understand and assist with more complex tasks and a challenge for query understanding.
  • The user begins with a fully specified query (Q1) but quickly starts to use references (Q2) and omissions (Q3), which is very different from typical keyword-based search sessions. Q2 and Q3 are shown above in their fully specified form.
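The rewriting task described above can be pictured as a sequence-to-sequence problem: the conversation so far goes in, a self-contained query comes out. The sketch below only illustrates that data layout. The `Turn` class, the `[SEP]` separator, and the raw wording of Q2 are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical separator between turns; the summary does not say how turns are joined.
SEP = " [SEP] "


@dataclass
class Turn:
    raw: str      # the query as the user typed it (may contain references or omissions)
    rewrite: str  # the fully specified, context-independent form


def build_rewriter_input(history: List[str], current: str) -> str:
    """Join all earlier raw queries and the current one into a single source string."""
    return SEP.join(history + [current])


def make_training_pairs(session: List[Turn]) -> List[Tuple[str, str]]:
    """For each turn, pair (history + current raw query) with its rewrite."""
    pairs = []
    for i, turn in enumerate(session):
        history = [t.raw for t in session[:i]]
        pairs.append((build_rewriter_input(history, turn.raw), turn.rewrite))
    return pairs


# The raw form of Q2 is an illustrative guess; only its rewrite appears in the list above.
session = [
    Turn("Tell me about the Bronze Age collapse.",
         "Tell me about the Bronze Age collapse."),
    Turn("What is the evidence for it?",
         "What is the evidence for the Bronze Age collapse?"),
]

for source, target in make_training_pairs(session):
    print(source, "->", target)
```

Each pair is a (source, target) example of the kind a generative rewriter could be fine-tuned on. Because the rewritten query is context-independent, it can be handed to an existing retrieval system unchanged.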
Methods
  • Table 2 compares methods on BLEU-2, NDCG@3, and QA-ROUGE, in four groups:

    TREC CAsT Auto Runs: clacBase*, pgbert*, CFDA_CLIP_RUN7*

    CAsT Queries: Original, AllenNLP Coref w/o sw, AllenNLP Coref w/ sw, Oracle

    Zero-Shot Rewriter: GPT-2 Raw, MARCO Raw, Rule-Based

    Few-Shot Rewriter: Rule-Based + CV w/o PLM, Self-Learn, CV, Rule-Based + CV, Self-Learn + CV

    Surviving score fragment, with its row and column labels lost: 0.151 0.263 0.280 0.291 0.291
  • [...] around ten conversational queries.
  • The task is to retrieve and rank relevant passages for each query in S from the MS MARCO passage collection and TREC Complex Answer corpora.
  • Standard TREC relevance judgments are provided.
  • CAsT provides official manually rewritten queries for 50 conversational topics [1].
  • The authors manually label answer text for the TREC CAsT questions and evaluate the question answering results.
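Passage ranking on this task is scored with NDCG@3 against the TREC relevance judgments. As a rough illustration of what that metric measures, here is a minimal sketch. It assumes linear gain (the relevance grade itself) and a log2 rank discount; some tools use an exponential gain of 2^rel - 1 instead, and the function names here are illustrative rather than taken from any evaluation toolkit.

```python
import math
from typing import Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k relevance grades (linear gain assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))


def ndcg_at_k(ranked_rels: Sequence[float], judged_rels: Sequence[float], k: int = 3) -> float:
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_rels: relevance grades of the returned passages, in ranked order.
    judged_rels: relevance grades of all judged passages for the query.
    """
    ideal = dcg_at_k(sorted(judged_rels, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant passage exists, so the score is defined as 0 here
    return dcg_at_k(ranked_rels, k) / ideal


# One relevant passage exists, but the system ranks it third instead of first.
print(ndcg_at_k([0, 0, 1], [1, 0, 0]))
```

A score of 1.0 means the top three passages are in the best possible order. Scores are averaged over queries to produce figures like those reported in Table 2.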
Results
  • The authors evaluate the effectiveness of the query rewriter in conversational search and analyze the behavior of GPT-2.
  • Conversational search accuracy: the overall results on the TREC Conversational Assistance Track (CAsT) are presented in Table 2.
  • In the few-shot setting, GPT-2 trained with CV already outperforms the best CAsT auto runs, pgbert and CFDA.
  • The improvement is mainly attributed to better query rewriting: with Oracle queries, the authors' simple BERT ranker scores 0.544 NDCG@3, below the 0.57+ NDCG@3 of the pgbert and CFDA teams' manual runs [1].
  • The authors' query rewriter maintains stable accuracy in later turns, as shown in Fig. 1b, which indicates that the rewriter effectively captures the multi-turn context as the conversation proceeds.
Conclusion
  • This work demonstrates the effectiveness of GPT-2 for conversational query rewriting.
  • Fine-tuned on weak supervision data generated by rules, or on a handful of manual rewriting labels, the GPT-2 query rewriter sets a new state of the art on the TREC CAsT conversational search benchmark. It outperforms previous methods including query expansion, contextual ranking, and coreference resolution, many of which use large-scale pre-trained models and deep neural networks.
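This summary does not spell out the rules used to generate weak supervision data, so the following is not the authors' rule set. It is a hypothetical sketch of the general idea: start from a session of fully specified queries and mechanically shorten the later ones, here by replacing a repeated topic mention with a pronoun, so that each (shortened, original) pair can serve as a training example without manual labeling. The function names and the `topic` argument are assumptions.

```python
import re
from typing import List, Tuple


def simulate_conversational_query(full_query: str, topic: str, pronoun: str = "it") -> str:
    """Replace the first case-insensitive mention of `topic` with a pronoun."""
    pattern = re.compile(re.escape(topic), flags=re.IGNORECASE)
    return pattern.sub(pronoun, full_query, count=1)


def make_weak_pairs(session: List[str], topic: str) -> List[Tuple[str, str]]:
    """Turn fully specified queries into (conversational, fully specified) pairs.

    The first query is left untouched, since a conversation normally opens
    with a fully specified query.
    """
    pairs = []
    for i, full in enumerate(session):
        short = full if i == 0 else simulate_conversational_query(full, topic)
        pairs.append((short, full))
    return pairs


session = [
    "Tell me about the Bronze Age collapse.",
    "What is the evidence for the Bronze Age collapse?",
]
for short, full in make_weak_pairs(session, topic="the Bronze Age collapse"):
    print(short, "=>", full)
```

Pairs produced this way are noisy, which is why the result is treated as weak supervision rather than gold labels.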
Tables
  • Table 1: A Conversational Search Example in TREC CAsT
  • Table 2: Overall Results on TREC CAsT 2019 Conversational Search Task. * marks scores from [1]. All of the authors' runs use the same ranking model. BLEU-2 is computed against the Oracle queries. QA-ROUGE evaluates the answer quality.
  • Table 3: GPT-2 Query Rewrites on CAsT Topics 31 and 64
Funding
  • This work is supported by the National Key Research and Development Program of China (No. 2018YFB1004503) and the National Natural Science Foundation of China (NSFC Nos. 61732008 and 61532010).
References
  • Jeff Dalton, Chenyan Xiong, and Jamie Callan. 2019. CAsT 2019: The Conversational Assistance Track Overview. In TREC 2019. NIST.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL 2019.
  • R. Nogueira and K. Cho. 2019. Passage Re-ranking with BERT. ArXiv abs/1901.04085 (2019).
  • Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019).
  • Svitlana Vakulenko, Shayne Longpre, Zhucheng Tu, and Raviteja Anantha. 2020. Question Rewriting for Conversational Question Answering. ArXiv abs/2004.14652 (2020).