Relevance-Promoting Language Model for Short-Text Conversation

AAAI Conference on Artificial Intelligence (2020)


Abstract

Despite the effectiveness of the sequence-to-sequence framework on the task of Short-Text Conversation (STC), the issue of under-exploitation of training data (i.e., the supervision signals from the query text are ignored) still remains unresolved. Also, the adopted maximization-based decoding strategies, inclined to generating …

Introduction
  • Short Text Conversation (STC) (Shang, Lu, and Li 2015), known as single-turn chit-chat conversation, is a popular research topic in the field of natural language processing
  • It is usually formulated as a sequence translation problem (Ritter, Cherry, and Dolan 2011; Shang, Lu, and Li 2015), and the sequence-to-sequence encoder-decoder (SEQ2SEQ) framework (Cho et al. 2014; Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2015) is applied to solve it.
  • The maximization-based decoding strategies adopted in existing models, such as beam search and greedy search, restrict the search space to the most frequent phrases and tend to generate generic or repetitive responses with unnaturally high likelihood, degrading the conversational experience (a small decoding sketch follows this list)
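To make the contrast concrete, the following is a minimal, self-contained sketch of maximization-based picking versus the top-k sampling explored in the paper; the toy distribution, token strings, and helper names are invented purely for illustration and are not taken from the paper.

```python
import random

def greedy_pick(probs):
    """Maximization-based choice: always take the single most probable token."""
    return max(probs, key=probs.get)

def top_k_sample(probs, k=3):
    """Randomization over maximization: keep only the k most probable tokens,
    renormalize, and sample among them, so infrequent but plausible words
    still have a chance of being generated."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    r, acc = random.uniform(0, total), 0.0
    for token, p in top:
        acc += p
        if r <= acc:
            return token
    return top[-1][0]  # numerical fallback

# Toy next-token distribution (made-up values, for illustration only).
probs = {"ok": 0.40, "haha": 0.25, "interesting": 0.15,
         "movie": 0.12, "documentary": 0.08}
print(greedy_pick(probs))        # always "ok" -> risk of generic responses
print(top_k_sample(probs, k=3))  # sometimes "haha" or "interesting"
```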
Highlights
  • Short Text Conversation (STC) (Shang, Lu, and Li 2015), known as single-turn chit-chat conversation, is a popular research topic in the field of natural language processing
  • In order to exploit the relevance clues hidden behind the responses, we propose a topic inference component that learns a compact source representation encoding the information relevant to the query and feeds this representation into each generation step, encouraging the language model to generate topic words potentially related to the query
  • OURS-bm outperforms all compared models on the keyword-overlapping-based HIT metrics, suggesting that our model, armed with the Supervised Source Attention (SSA) and Topic Inference (TI) components, is beneficial for generating informative topical words related to the query
  • At the same time, such inconsistency between automatic and human evaluations demonstrates the effectiveness of top-k sampling, a randomization-over-maximization decoding strategy, in discovering infrequent but meaningful patterns for the Short Text Conversation task
  • We present a language-model-based solution, instead of the traditional SEQ2SEQ paradigm, for handling Short-Text Conversation (STC); a minimal input-formatting sketch follows this list
  • We propose a relevance-promoting Transformer language model to distill the relevance clues from the query as well as the topics inferred from the references, and incorporate them into the generation
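This summary does not spell out how query-response pairs are fed to the language model, so the sketch below only shows one common way to cast such a pair as a single left-to-right LM training sequence; the special tokens [BOS], [SEP], [EOS] and the helper names are assumptions for illustration, not the paper's exact scheme.

```python
def build_lm_sequence(query_tokens, response_tokens,
                      bos="[BOS]", sep="[SEP]", eos="[EOS]"):
    """Concatenate a query and its response into one token sequence so that a
    single left-to-right language model is trained over both sides, letting
    the query text also provide supervision signals (tokens are hypothetical)."""
    return [bos] + query_tokens + [sep] + response_tokens + [eos]

def lm_targets(sequence):
    """Standard next-token prediction pairs: each token predicts its successor."""
    return list(zip(sequence[:-1], sequence[1:]))

seq = build_lm_sequence(["what", "movie", "did", "you", "watch", "?"],
                        ["a", "documentary", "about", "whales"])
for inp, target in lm_targets(seq):
    pass  # each (input, next-token) pair would contribute to the LM loss
```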
Results
  • Main results: Tables 1 and 2 list the automatic evaluation results and the human evaluation results, respectively.
  • In terms of BLEU, the proposed model with beam search decoding, namely OURS-bm, consistently achieves the best scores.
  • OURS-bm, the best model on the automatic relevance metrics, still yields competitive results on the Relevance metric in the human evaluation.
  • This is reasonable because some words not appearing in the query/references, especially infrequently used ones, can still be related to the topic under discussion in the conversations (see the keyword-overlap sketch after this list).
  • At the same time, such inconsistency between automatic and human evaluations demonstrates the effectiveness of top-k sampling, a randomization-over-maximization decoding strategy, in discovering infrequent but meaningful patterns for the STC task
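As a rough illustration of the keyword-overlapping idea behind HIT-style metrics, and of why topical words absent from the query/references earn no credit from such metrics even when a human would judge them relevant, consider the hypothetical `hit_rate` function below; it is a generic overlap score written for this summary, not the paper's exact HIT definition.

```python
def hit_rate(response_tokens, keyword_sets):
    """Fraction of reference keyword sets sharing at least one word with the
    generated response (a generic keyword-overlap notion, not necessarily
    the paper's exact HIT definition)."""
    if not keyword_sets:
        return 0.0
    hits = sum(1 for kws in keyword_sets if kws & set(response_tokens))
    return hits / len(keyword_sets)

# "whale" and "documentary" overlap the reference keywords; "ocean" is
# topical but unseen in the references, so it receives no credit here.
refs = [{"movie", "documentary"}, {"whale", "sea"}]
print(hit_rate(["i", "love", "that", "whale", "documentary"], refs))  # 1.0
print(hit_rate(["the", "ocean", "is", "beautiful"], refs))            # 0.0
```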
Conclusion
  • The authors present a language-model-based solution instead of the traditional SEQ2SEQ paradigm for handling Short-Text Conversation (STC).
  • The authors first tailor-make a training strategy to adapt the language model to the STC task.
  • The authors explore the usage of top-k sampling for the STC task to further improve response diversity.
  • Experimental results on a large-scale STC dataset validate that the model is superior to the compared models on both relevance and diversity in automatic and human evaluations
Tables
  • Table 1: Experimental results on the automatic metrics. The best results are in bold
  • Table 2: Human evaluation results with the best ones in bold
  • Table 3: Experimental results on the models adopting top-k sampling. ∆ refers to the improvement over the original model adopting beam search. The best results are in bold
Funding
  • The work described in this paper is substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Code: 14204418)
References
  • [Al-Rfou et al. 2019] Al-Rfou, R.; Choe, D.; Constant, N.; Guo, M.; and Jones, L. 2019. Character-level language modeling with deeper self-attention. In AAAI, 3159–3166.
  • [Bahdanau, Cho, and Bengio 2015] Bahdanau, D.; Cho, K.; and Bengio, Y. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • [Bengio et al. 2003] Bengio, Y.; Ducharme, R.; Vincent, P.; and Jauvin, C. 2003. A neural probabilistic language model. JMLR 3(Feb):1137–1155.
  • [Budzianowski and Vulic 2019] Budzianowski, P., and Vulic, I. 2019.
  • [Cai et al. 2019a] Cai, D.; Wang, Y.; Bi, W.; Tu, Z.; Liu, X.; Lam, W.; and Shi, S. 2019a. Skeleton-to-response: Dialogue generation guided by retrieval memory. In NAACL, 1219–1228.
  • [Cai et al. 2019b] Cai, D.; Wang, Y.; Bi, W.; Tu, Z.; Liu, X.; and Shi, S. 2019b. Retrieval-guided dialogue response generation via a matching-to-generation framework. In EMNLP, 1866–1875.
  • [Chen et al. 2019] Chen, C.; Peng, J.; Wang, F.; Xu, J.; and Wu, H. 2019. Generating multiple diverse responses with multi-mapping and posterior mapping selection. arXiv preprint arXiv:1906.01781.
  • [Cheng, Dong, and Lapata 2016] Cheng, J.; Dong, L.; and Lapata, M. 2016. Long short-term memory-networks for machine reading. In EMNLP, 551–561.
  • [Cho et al. 2014] Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP, 1724–1734.
  • [Dai et al. 2019] Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.; and Salakhutdinov, R. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. In ACL.
  • [Dauphin et al. 2017] Dauphin, Y. N.; Fan, A.; Auli, M.; and Grangier, D. 2017. Language modeling with gated convolutional networks. In ICML.
  • [Devlin et al. 2019] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 4171–4186.
  • [Dong et al. 2019] Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Gao, J.; Zhou, M.; and Hon, H.-W. 2019. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197.
  • [Du et al. 2018] Du, J.; Li, W.; He, Y.; Xu, R.; Bing, L.; and Wang, X. 2018. Variational autoregressive decoder for neural response generation. In EMNLP, 3154–3163.
  • [Fan, Lewis, and Dauphin 2018] Fan, A.; Lewis, M.; and Dauphin, Y. 2018. Hierarchical neural story generation. In ACL, 889–898.
  • [Gao et al. 2019a] Gao, J.; Bi, W.; Liu, X.; Li, J.; and Shi, S. 2019a. Generating multiple diverse responses for short-text conversation. In AAAI.
  • [Gao et al. 2019b] Gao, J.; Bi, W.; Liu, X.; Li, J.; Zhou, G.; and Shi, S. 2019b. A discrete CVAE for response generation on short-text conversation. In EMNLP, 1898–1908.
  • [Holtzman et al. 2019] Holtzman, A.; Buys, J.; Forbes, M.; and Choi, Y. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  • [Ippolito et al. 2019] Ippolito, D.; Kriz, R.; Sedoc, J.; Kustikova, M.; and Callison-Burch, C. 2019. Comparison of diverse decoding methods from conditional language models. In ACL, 3752–3762.
  • [Khandelwal et al. 2018] Khandelwal, U.; He, H.; Qi, P.; and Jurafsky, D. 2018. Sharp nearby, fuzzy far away: How neural language models use context. In ACL, 284–294.
  • [Kingma and Ba 2015] Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. In ICLR.
  • [Li et al. 2016a] Li, J.; Galley, M.; Brockett, C.; Gao, J.; and Dolan, B. 2016a. A diversity-promoting objective function for neural conversation models. In NAACL, 110–119.
  • [Li et al. 2016b] Li, J.; Monroe, W.; Ritter, A.; Jurafsky, D.; Galley, M.; and Gao, J. 2016b. Deep reinforcement learning for dialogue generation. In EMNLP, 1192–1202.
  • [Lin et al. 2017] Lin, Z.; Feng, M.; Santos, C. N. d.; Yu, M.; Xiang, B.; Zhou, B.; and Bengio, Y. 2017. A structured self-attentive sentence embedding. In ICLR.
  • [Liu et al. 2016] Liu, L.; Utiyama, M.; Finch, A.; and Sumita, E. 2016. Neural machine translation with supervised attention. In COLING.
  • [Liu et al. 2018] Liu, Y.; Bi, W.; Gao, J.; Liu, X.; Yao, J.; and Shi, S. 2018. Towards less generic responses in neural conversation models: A statistical re-weighting method. In EMNLP, 2769–2774.
  • [Mei, Bansal, and Walter 2017] Mei, H.; Bansal, M.; and Walter, M. R. 2017. Coherent dialogue with attention-based language models. In AAAI.
  • [Melis, Dyer, and Blunsom 2018] Melis, G.; Dyer, C.; and Blunsom, P. 2018. On the state of the art of evaluation in neural language models. In ICLR.
  • [Merity, Keskar, and Socher 2018] Merity, S.; Keskar, N. S.; and Socher, R. 2018. Regularizing and optimizing LSTM language models. In ICLR.
  • [Mi, Wang, and Ittycheriah 2016] Mi, H.; Wang, Z.; and Ittycheriah, A. 2016. Supervised attentions for neural machine translation. In EMNLP, 2283–2288.
  • [Mou et al. 2016] Mou, L.; Song, Y.; Yan, R.; Li, G.; Zhang, L.; and Jin, Z. 2016. Sequence to backward and forward sequences: A content-introducing approach to generative short-text conversation. In COLING.
  • [Nallapati et al. 2016] Nallapati, R.; Zhou, B.; dos Santos, C.; Gulcehre, C.; and Xiang, B. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In CoNLL, 280–290.
  • [Olabiyi and Mueller 2019] Olabiyi, O., and Mueller, E. T. 2019. DLGNet: A transformer-based model for dialogue response generation. arXiv preprint arXiv:1908.01841.
  • [Papineni et al. 2002] Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL.
  • [Peters et al. 2018] Peters, M.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. In NAACL, 2227–2237.
  • [Radford et al. 2018] Radford, A.; Narasimhan, K.; Salimans, T.; and Sutskever, I. 2018. Improving language understanding by generative pre-training.
  • [Radford et al. 2019] Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1(8).
  • [Ritter, Cherry, and Dolan 2011] Ritter, A.; Cherry, C.; and Dolan, W. B. 2011. Data-driven response generation in social media. In EMNLP, 583–593.
  • [Serban et al. 2016] Serban, I. V.; Sordoni, A.; Bengio, Y.; Courville, A.; and Pineau, J. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In AAAI.
  • [Shang, Lu, and Li 2015] Shang, L.; Lu, Z.; and Li, H. 2015. Neural responding machine for short-text conversation. In ACL, 1577–1586.
  • [Sutskever, Vinyals, and Le 2014] Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In NeurIPS, 3104–3112.
  • [Tian et al. 2019] Tian, Z.; Bi, W.; Li, X.; and Zhang, N. L. 2019. Learning to abstract for memory-augmented conversational response generation. In ACL, 3816–3825.
  • [Vaswani et al. 2017] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In NeurIPS, 5998–6008.
  • [Wu et al. 2019] Wu, Y.; Wei, F.; Huang, S.; Wang, Y.; Li, Z.; and Zhou, M. 2019. Response generation by context-aware prototype editing. In AAAI.
  • [Xing et al. 2017] Xing, C.; Wu, W.; Wu, Y.; Liu, J.; Huang, Y.; Zhou, M.; and Ma, W.-Y. 2017. Topic aware neural response generation. In AAAI.
  • [Xu et al. 2017] Xu, Z.; Liu, B.; Wang, B.; Sun, C.; Wang, X.; Wang, Z.; and Qi, C. 2017. Neural response generation via GAN with an approximate embedding layer. In EMNLP, 617–626.
  • [Yao et al. 2017] Yao, L.; Zhang, Y.; Feng, Y.; Zhao, D.; and Yan, R. 2017. Towards implicit content-introducing for generative short-text conversation systems. In EMNLP, 2190–2199.
  • [Yu et al. 2017] Yu, L.; Blunsom, P.; Dyer, C.; Grefenstette, E.; and Kocisky, T. 2017. The neural noisy channel. In ICLR.
  • [Zhang et al. 2019] Zhang, Y.; Sun, S.; Galley, M.; Chen, Y.-C.; Brockett, C.; Gao, X.; Gao, J.; Liu, J.; and Dolan, B. 2019. DialoGPT: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536.
  • [Zhao, Zhao, and Eskenazi 2017] Zhao, T.; Zhao, R.; and Eskenazi, M. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In ACL, 654–664.
  • [Zhou et al. 2017] Zhou, G.; Luo, P.; Cao, R.; Lin, F.; Chen, B.; and He, Q. 2017. Mechanism-aware neural machine for dialogue response generation. In AAAI.