The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

ACL, pp. 2453-2470, 2020.


Abstract:

We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, ...
Introduction
  • One of the goals of AI is to build a seeing, talking agent that can discuss, reason, empathize, and provide advice – in short, a system that can perform natural communication, displaying many of the properties expected when speaking to a human partner.
  • While no single task exists that can train an agent or measure its ability on all of these axes at once, a number of distinct large-scale datasets targeting subsets of these skills have recently become available.
  • The authors assemble these disparate tasks to form a single challenge: dodecaDialogue, consisting of 12 subtasks.
  • As some of the subtasks have very large datasets (e.g., 2.2 billion utterances), they can possibly help the agent with other skills too.
Highlights
  • One of the goals of AI is to build a seeing, talking agent that can discuss, reason, empathize, and provide advice – in short, a system that can perform natural communication, displaying many of the properties expected when speaking to a human partner.
  • We show the effect of these decoding choices in Table 6 for ConvAI2 and Wizard of Wikipedia (WoW).
  • The results, given in Figure 1, show our method outperforming existing state-of-the-art generative models on all three comparisons: Image Chat, Wizard of Wikipedia seen topics, and Wizard of Wikipedia unseen topics.
  • We have introduced the dodecaDialogue task, and provide strong baseline results leveraging multimodal Image+Seq2Seq transformers trained across all tasks.
Methods
  • Comparison systems include Human Performance (Dinan et al., 2019) and the Image+Seq2Seq (All Tasks MT) model.
  • Example contexts and outputs from the All Tasks Multi-Task model illustrate how persona, chosen topic, and knowledge grounding enter the input (a sketch of flattening such a context for the encoder follows this list).
  • Persona (ConvAI2): "i am very strong for my age. i like wine and dancing too!"
  • Chosen Topic: The Rolling Stones. Knowledge: no passages used.
  • Chosen Topic: Dog. Knowledge: "The dog was the first species to be domesticated and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes."
  • Sample dialogue. Speaker A: "hi, how are you doing today?" Speaker B: "good! you! celebrating with fellow centenarians." Speaker A: "nice. i'm playing some card games with my family." Speaker B: "that sounds like fun. i like wine and dancing too!" Speaker A: "same. i try to get a small workout in; a three mile walk for me is key."
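The examples above show heterogeneous grounding (persona lines, a chosen topic, a knowledge passage, dialogue history) presented to a single model. Below is a minimal sketch of how such context might be flattened into one encoder input sequence; the marker strings and the function name are illustrative assumptions, not the paper's exact preprocessing.

```python
def flatten_context(persona=None, topic=None, knowledge=None, history=None):
    """Flatten heterogeneous dialogue context into one input string.

    Each grounding source is prefixed with a marker and concatenated
    with the dialogue history (oldest utterance first). The marker
    strings here are illustrative, not the paper's exact tokens.
    """
    parts = []
    for line in (persona or []):
        parts.append(f"your persona: {line}")
    if topic:
        parts.append(f"topic: {topic}")
    if knowledge:
        parts.append(f"knowledge: {knowledge}")
    parts.extend(history or [])
    return "\n".join(parts)

print(flatten_context(
    persona=["i am very strong for my age.", "i like wine and dancing too!"],
    history=["hi , how are you doing today ?"],
))
```

For image-grounded subtasks, the Image+Seq2Seq model additionally feeds features from a pre-trained image encoder into the same sequence-to-sequence transformer alongside the flattened text context.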
Results
  • Metrics: for all tasks, the authors use perplexity (PPL), BLEU, ROUGE-1, -2, and -L, and F1, and pick the metric most used in the literature as that subtask’s ‘Score’ to compare to existing work (minimal sketches of F1 and perplexity follow this list).

    Multi-tasking As the authors are interested in building a single conversational agent, the authors measure the ability of multi-tasked models that can perform all twelve tasks at once.

    Single-Task Fine-tuning The authors can still compare such multi-tasked models to single-task fine-tuned baselines to assess if the authors have gained or lost performance.
  • Like other works (Liu et al., 2015; Raffel et al., 2019), the authors consider a multi-task followed by fine-tune setup in order to see if this produces better models (a weighted task-sampling sketch also follows this list).
  • The latter tests if multi-tasking still proves useful in the single-task setting.
  • This evaluates performance on truly new, unseen tasks, an important behavior given that new tasks continually appear.
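To make the scoring concrete, here is a minimal sketch of two of the metrics above: bag-of-words F1, the word-overlap measure commonly used for dialogue tasks such as ConvAI2, and perplexity computed from per-token log-likelihoods. This is a generic illustration, not the paper's evaluation code.

```python
import math
from collections import Counter

def unigram_f1(prediction: str, reference: str) -> float:
    """Bag-of-words F1 between a generated response and the gold response."""
    pred, ref = prediction.split(), reference.split()
    common = Counter(pred) & Counter(ref)  # per-token minimum counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def perplexity(token_log_probs) -> float:
    """Exponentiated mean negative log-likelihood of the gold tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

print(unigram_f1("i like wine and dancing", "i love wine and dancing too"))
print(perplexity([-1.2, -0.7, -2.1]))
```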
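The multi-task-then-fine-tune setup can be read as a two-stage recipe: first train on batches drawn from all twelve tasks, with each task's relative weight controlling how often its examples are presented (cf. Table 5), then continue training on the single target task. The sketch below illustrates the weighted-sampling stage; the task names and the exact scheduling scheme are assumptions for illustration, not the paper's scheduler.

```python
import random

def multitask_stream(task_data, task_weights, n_steps):
    """Yield (task, example) pairs, sampling tasks in proportion to their
    relative weights -- the 'ratio of examples from that task compared to
    others' described in Table 5. A sketch; the actual scheduler may differ.
    """
    tasks = list(task_data)
    weights = [task_weights[t] for t in tasks]
    for _ in range(n_steps):
        task = random.choices(tasks, weights=weights, k=1)[0]
        yield task, random.choice(task_data[task])

# Hypothetical: present ConvAI2 examples twice as often as Wizard of Wikipedia.
data = {"convai2": list(range(100)), "wow": list(range(100))}
for task, ex in multitask_stream(data, {"convai2": 2.0, "wow": 1.0}, 5):
    print(task, ex)
```

Single-task fine-tuning then amounts to continuing from the multi-task weights with one task's relative weight effectively set to ∞, the single-task case in Table 5.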
Conclusion
  • The authors have introduced the dodecaDialogue task, and provide strong baseline results leveraging multimodal Image+Seq2Seq transformers trained across all tasks.
  • The goal of introducing this task is not just as another challenge dataset, but to further motivate building and evaluating conversational agents capable of multiple skills – one of the core goals of AI.
  • Reported results show systems can be reasonably competitive compared to humans in particular domains for short conversations (Li et al, 2019b; Shuster et al, 2018).
  • This work tries to bridge that gap, moving away from agents with only niche skills and towards evaluating an open-domain set of skills.
  • Future work should consider adding these tasks to the ones used here, while continuing the quest for improved models.
Tables
  • Table1: The 12 dodecaDialogue subtasks, their sizes (number of train, valid, test utterances), and average number of turns and response length (words)
  • Table2: Validation perplexity for the dodecaDialogue tasks in various settings
  • Table3: Transfer performance of various multi-task models (validation perplexity)
  • Table4: The impact of knowledge and image grounding in dodecaDialogue (validation perplexity)
  • Table5: Validation perplexity on select dodecaDialogue tasks comparing relative weights of tasks during multi-tasking, followed by fine-tuning (row below). The relative task weight is the ratio of examples from that task compared to others presented during multi-tasking. ∞ indicates single-task training
  • Table6: Impact of the decoding strategy on select tasks, reporting validation F1 score for the All Tasks MT model. N-gram block results are for the best beam size (an n-gram blocking sketch follows this list)
  • Table7: Test performance for various metrics on the dodecaDialogue tasks comparing our multi-task and multi-task + fine-tuned methods to existing approaches (cited). Dashes mean the metric was not provided. ∗ indicates results reported on validation only. Score is defined on a per-task basis in the metric column
  • Table8: Test performance for various metrics on the dodecaDialogue tasks comparing our multi-task and multi-task + fine-tuned methods
  • Table9: Validation performance for various metrics on the dodecaDialogue tasks comparing our multi-task and multi-task + fine-tuned methods
  • Table10: All Tasks Multi-Tasking (MT) validation performance for various metrics on the dodecaDialogue tasks with one set of decoding parameters: a beam size of 3, minimum response length of 10, and blocking repeated tri-grams
  • Table11: Best decoding parameters for each task, based on metric. Scores are from the best performing task-specific multi-task + fine-tuned model on validation sets. "Min L" and "Max L" refer to the minimum and maximum decoding length, where "L" is the number of tokens
  • Table12: Human evaluations on Image Chat, comparing various decoding schemes for our Image+Seq2Seq model trained on all tasks MT, as well as comparisons with human outputs. Scores with ∗ are statistically significant
  • Table13: Human evaluations on Wizard of Wikipedia (seen) test set, comparing various decoding schemes for our Image+Seq2Seq model trained on all tasks MT, as well as comparisons with human outputs, using ACUTE-Eval. All scores are statistically significant (binomial test, p < .05)
  • Table14: Human evaluations on Wizard of Wikipedia (unseen) test set, comparing various decoding schemes for our Image+Seq2Seq model trained on all tasks MT, as well as comparisons with human outputs, using ACUTE-Eval. All scores are statistically significant (binomial test, p < .05)
  • Table15: Human evaluations on Wizard of Wikipedia, comparing various decoding schemes for our Image+Seq2Seq model trained on all tasks MT, as well as comparisons with human outputs, in terms of Likert Scores. Ratings are reported as mean (stddev)
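Tables 6, 10, and 11 vary the beam size, the minimum/maximum response length, and repeated n-gram blocking. The core of n-gram blocking is a constraint applied at each decoding step: a candidate token is disallowed if appending it would repeat an n-gram already present in the hypothesis. A minimal standalone sketch of that check, not tied to any model, is shown below.

```python
def violates_ngram_block(tokens, next_token, n=3):
    """Return True if appending next_token would repeat an n-gram already
    present in the hypothesis -- the tri-gram blocking of Tables 6 and 10
    corresponds to n=3."""
    candidate = tokens + [next_token]
    if len(candidate) < n:
        return False
    new_ngram = tuple(candidate[-n:])
    seen = {tuple(candidate[i:i + n]) for i in range(len(candidate) - n)}
    return new_ngram in seen

hyp = "i like wine and i like".split()
print(violates_ngram_block(hyp, "wine"))    # True: repeats "i like wine"
print(violates_ngram_block(hyp, "cheese"))  # False
```

A minimum response length is typically enforced the same way during beam search: the end-of-sequence token is simply disallowed until the hypothesis reaches the minimum length.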
Related work
  • 3.1 Existing Models and Results

    Where possible, we have tried to track the best existing results for each task and provided a comparison in our final results table.

    As ConvAI2 was a competition, a number of competitors built strong models on it. The best results were obtained by large pre-trained transformers (Dinan et al., 2020). In particular, Wolf et al. (2019b) pre-trained via the method of Radford et al. (2018) using the BooksCorpus dataset, resulting in the best perplexities and F1 scores. Since then, results have improved further with the advent of better and larger pre-training (Lewis et al., 2019), which we compare to here; the same work also reports strong results on ELI5.

    He et al. (2019) recently obtained strong results on the DailyDialog and Cornell Movie tasks in terms of perplexity by pre-training on 10% of CCNEWS (Bakhtin et al., 2019), thus using 100 million sentences (2.7 billion words), and then fine-tuning a transformer-based model with a multi-task strategy.
References
  • Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc’Aurelio Ranzato, and Arthur Szlam. 2019. Real or fake? learning to discriminate machine from human generated text. arXiv preprint arXiv:1906.03351.
  • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question answering in context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2174–2184, Brussels, Belgium. Association for Computational Linguistics.
  • Cristian Danescu-Niculescu-Mizil and Lillian Lee. 2011. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, Portland, Oregon, USA. Association for Computational Linguistics.
  • Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, Jose MF Moura, Devi Parikh, and Dhruv Batra. 2017. Visual dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 326–335.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W. Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, and Jason Weston. 2020. The second conversational intelligence challenge (ConvAI2). In The NeurIPS ’18 Competition, pages 187–208, Cham. Springer International Publishing.
  • Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019. Wizard of Wikipedia: Knowledge-powered conversational agents. In Proceedings of the International Conference on Learning Representations.
  • Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. 2019. ELI5: Long form question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3558–3567, Florence, Italy. Association for Computational Linguistics.
  • Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, and Dilek Hakkani-Tür. 2019. Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations. In Proc. Interspeech 2019, pages 1891–1895.
  • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, and Fuchun Peng. 2019. Mix-review: Alleviate forgetting in the pretrain-finetune framework for neural language generation models. arXiv preprint arXiv:1910.07117.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  • Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2019. Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring. arXiv preprint arXiv:1905.01969.
  • Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431, Valencia, Spain. Association for Computational Linguistics.
  • Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang. 2019a. VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557.
  • Margaret Li, Jason Weston, and Stephen Roller. 2019b. ACUTE-EVAL: Improved dialogue evaluation with optimized questions and multi-turn comparisons. In Proceedings of the NeurIPS Workshop on Conversational AI.
  • Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. 2017. DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 986–995, Taipei, Taiwan. Asian Federation of Natural Language Processing.
  • Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-yi Wang. 2015. Representation learning using multi-task deep neural networks for semantic classification and information retrieval. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 912–921, Denver, Colorado. Association for Computational Linguistics.
  • Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 285–294, Prague, Czech Republic. Association for Computational Linguistics.
  • Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. arXiv preprint arXiv:1908.02265.
  • Yi Luan, Yangfeng Ji, and Mari Ostendorf. 2016. LSTM based conversation models. arXiv preprint arXiv:1603.09457.
  • Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, and Laurens van der Maaten. 2018. Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision, pages 185–201, Cham. Springer International Publishing.
  • Pierre-Emmanuel Mazaré, Samuel Humeau, Martin Raison, and Antoine Bordes. 2018. Training millions of personalized dialogue agents. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2775–2779, Brussels, Belgium. Association for Computational Linguistics.
  • Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.
  • Alexander Miller, Will Feng, Dhruv Batra, Antoine Bordes, Adam Fisch, Jiasen Lu, Devi Parikh, and Jason Weston. 2017. ParlAI: A dialog research software platform. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 79–84, Copenhagen, Denmark. Association for Computational Linguistics.
  • Nikita Moghe, Siddhartha Arora, Suman Banerjee, and Mitesh M. Khapra. 2018. Towards exploiting background knowledge for building conversation systems. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2322–2332, Brussels, Belgium. Association for Computational Linguistics.
  • Seungwhan Moon, Pararth Shah, Rajen Subba, and Anuj Kumar. 2019. Memory grounded conversational reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pages 145–150, Hong Kong, China. Association for Computational Linguistics.
  • Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios Spithourakis, and Lucy Vanderwende. 2017. Image-grounded conversations: Multimodal context for natural question and response generation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 462–472, Taipei, Taiwan. Asian Federation of Natural Language Processing.
  • Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In Proceedings of the International Conference on Learning Representations.
  • Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, and Jianfeng Gao. 2019. Conversing by reading: Contentful neural conversation with on-demand machine reading. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5427–5436, Florence, Italy. Association for Computational Linguistics.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
  • Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. 2019. Towards empathetic opendomain conversation models: A new benchmark and dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5370–5381, Florence, Italy. Association for Computational Linguistics.
  • Abigail See, Stephen Roller, Douwe Kiela, and Jason Weston. 2019. What makes a good conversation? how controllable attributes affect human judgments. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1702–1723, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Kurt Shuster, Samuel Humeau, Antoine Bordes, and Jason Weston. 2018. Engaging image chat: Modeling personality in grounded dialogue. arXiv preprint arXiv:1811.00945.
  • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 196–205, Denver, Colorado. Association for Computational Linguistics.
  • Alon Talmor and Jonathan Berant. 2019. MultiQA: An empirical investigation of generalization and transfer in reading comprehension. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4911–4921, Florence, Italy. Association for Computational Linguistics.
  • Hao Tan and Mohit Bansal. 2019. LXMERT: Learning cross-modality encoder representations from transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5099–5110, Hong Kong, China. Association for Computational Linguistics.
  • Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, and Jason Weston. 2019. Learning to speak and act in a fantasy text adventure game. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 673–683, Hong Kong, China. Association for Computational Linguistics.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019a. HuggingFace’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019b. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149.
  • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In Computer Vision and Pattern Recognition (CVPR).
  • Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Learning semantic textual similarity from conversations. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 164–174, Melbourne, Australia. Association for Computational Linguistics.
  • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2204–2213, Melbourne, Australia. Association for Computational Linguistics.