A Survey of Available Corpora For Building Data-Driven Dialogue Systems: The Journal Version.
Dialogue & Discourse (D&D), no. 1 (2018): 1–49
During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results…
- The use of machine learning methods, such as neural networks, requires an understanding of the availability, requirements, and uses of available dialogue corpora.
- To this end, this paper presents a broad survey of available dialogue corpora
- Dialogue systems, also known as interactive conversational agents, virtual agents or sometimes chatterbots, are useful in a wide range of applications ranging from technical support services to language learning tools and entertainment (Young et al, 2013; Shawar and Atwell, 2007b)
- A wide range of data-driven machine learning methods have been shown to be effective for natural language processing, including tasks relevant to dialogue, such as dialogue act classification (Reithinger and Klesen, 1997; Stolcke et al, 2000), dialogue state tracking (Thomson and Young, 2010; Wang and Lemon, 2013; Ren et al, 2013; Henderson et al, 2013; Williams et al, 2013; Henderson et al, 2014c; Kim et al, 2015), natural language generation (Langkilde and Knight, 1998; Oh and Rudnicky, 2000; Walker et al, 2002; Ratnaparkhi, 2002; Stent et al, 2004; Rieser and Lemon, 2010; Mairesse et al, 2010; Mairesse and Young, 2014; Wen et al, 2015a; Sharma et al, 2016), and dialogue policy learning (Young et al, 2013)
- This paper provides an extensive survey of currently available datasets suitable for research, development, and evaluation of data-driven dialogue systems
- Neural networks can be applied to narrow domains, such as restaurant recommendation, with relatively little data (Wen et al, 2017)
- To obtain reasonable results in such a setting, neural network practitioners have resorted to training neural network models on datasets with hundreds of thousands to millions of dialogues: the Twitter Corpus (Ritter et al, 2010; Sordoni et al, 2015), Reddit, the Ubuntu Dialogue Corpus (Lowe et al, 2015a), and various movie subtitle datasets such as SubTle, OpenSubtitles, Movie-DiC, and the Movie Dialogue Dataset (Ameixa and Coheur, 2013; Tiedemann, 2012; Banchs, 2012; Dodge et al, 2015)
- While the conversation topics in these datasets often vary considerably, the nature of the datasets themselves is fairly fixed in the form of informal written dialogues between humans. This is the case for movie scripts, forum posts, and micro-blogging platforms. Learning only from these sources will bias dialogue systems towards certain kinds of interactions and behaviours; for example, written corpora usually have a specific turn-taking structure that is different from spoken conversation, and they may encode biases against certain groups or populations (Henderson et al, 2017)
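As a toy illustration of the first task named above, dialogue act classification can be framed as supervised text classification. The sketch below trains a minimal smoothed-unigram (naive-Bayes-style) model; all utterances and act labels are invented for illustration, not drawn from any corpus in this survey.

```python
from collections import Counter, defaultdict

# Toy labeled utterances; real corpora use richer act inventories
# (e.g. question, statement, backchannel in Switchboard-style annotation).
train = [
    ("what time is it", "question"),
    ("where is the station", "question"),
    ("the station is north", "statement"),
    ("it is noon", "statement"),
]

# Per-act unigram counts: the whole "model" is one Counter per act.
counts = defaultdict(Counter)
for text, act in train:
    counts[act].update(text.split())

def classify(text):
    """Score each act with add-one-smoothed word likelihoods, pick the best."""
    def score(act):
        total = sum(counts[act].values())
        s = 1.0
        for w in text.split():
            s *= (counts[act][w] + 1) / (total + 1)
        return s
    return max(counts, key=score)

print(classify("what is the time"))  # → question
```

Even this crude model illustrates why corpus size and diversity matter: with only four training utterances, any word outside the training vocabulary contributes nothing to the decision.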
- S. Rosenthal and K. McKeown. I couldn't agree more: The role of conversational structure in agreement and disagreement detection in online discussions. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2015.
- S. Rosset and S. Petel. The Ritel corpus: an annotated human-machine open-domain question answering spoken dialog corpus. In The International Conference on Language Resources and Evaluation (LREC), 2006.
- A. Roy, C. Guinaudeau, H. Bredin, et al. TVD: a reproducible and multiply aligned TV series dataset. In The International Conference on Language Resources and Evaluation (LREC), 2014.
- Evaluation metrics
One of the most challenging aspects of constructing dialogue systems lies in their evaluation.
- Often it is necessary to optimize performance on a pseudo-performance metric prior to release
- This is true if a dialogue model has many hyper-parameters to be optimized — it is infeasible to run user experiments for every parameter setting in a grid search.
- Ideally, one would have automated metrics for calculating a score for each model, and only involve human evaluators once the best model has been chosen with reasonable confidence
- The authors discuss a number of challenges and general methods related to the development and evaluation of data-driven dialogue systems.
- Evaluating a data-driven dialogue system properly is critical for real-world deployments as well as for advancing state-of-the-art research, in which case reproducibility of methods and results is crucial. This paper provides an extensive survey of currently available datasets suitable for research, development, and evaluation of data-driven dialogue systems
- The authors categorize these corpora along several dimensions depending on whether the dataset is written or spoken, between human interlocutors or human-machine conversations, and constrained in topic or more free-form.
- There is a lack of large-scale multi-modal datasets, which may be crucial towards grounding the language learned by the dialogue agents in human-like experience
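The model-selection workflow described above, scoring every configuration with a cheap automated metric and reserving human evaluation for the best candidate, can be sketched with a toy unigram-overlap metric. The metric and all names below are illustrative stand-ins for real metrics such as BLEU or METEOR, not the evaluation protocol of any system in this survey.

```python
from collections import Counter

def overlap_f1(response, reference):
    """Unigram-overlap F1 between a generated response and a reference:
    a crude stand-in for automated metrics like BLEU or METEOR."""
    r, g = Counter(reference.lower().split()), Counter(response.lower().split())
    common = sum((r & g).values())
    if common == 0:
        return 0.0
    precision = common / sum(g.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def rank_models(model_outputs, references):
    """Score each model by its mean metric over a held-out set and sort,
    so human evaluators only need to inspect the top-ranked model."""
    scores = {
        name: sum(overlap_f1(out, ref) for out, ref in zip(outs, references))
              / len(references)
        for name, outs in model_outputs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

references = ["the next bus leaves at noon"]
model_outputs = {
    "model_a": ["the next bus leaves at noon"],
    "model_b": ["i do not know"],
}
print(rank_models(model_outputs, references))
```

In a grid search, `rank_models` would be called once per hyper-parameter setting; the survey's point is precisely that such word-overlap proxies correlate only weakly with human judgments of dialogue quality.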
- Table1: Human-machine dialogue datasets. Starred (*) numbers are approximated based on the average number of words per utterance. Datasets marked with (†) indicate Wizard-of-Oz dialogues, where the machine is secretly operated by a human
- Table2: Human-human spontaneous spoken dialogue datasets. Starred (*) numbers are estimates based on the average rate of English speech from the National Center for Voice and Speech (www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/quality.html)
- Table3: Human-human constrained spoken dialogue datasets. Starred (*) numbers are estimates based on the average rate of English speech from the National Center for Voice and Speech (www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/quality.html)
- Table4: Human-human scripted dialogue datasets. Quantities denoted with (†) indicate estimates based on average number of dialogues per movie (Banchs, 2012) and the number of scripts or works in the corpus. Dialogues may not be explicitly separated in these datasets. TV show datasets were adjusted based on the ratio of average film runtime (112 minutes) to average TV show runtime (36 minutes). This data was scraped from the IMDb database (http://www.imdb.com/interfaces). Starred (*) quantities are estimated based on the average number of words and utterances per film, and the average lengths of films and TV shows. Estimates derived from the Tameri Guide for Writers (http://www.tameri.com/format/wordcounts.html)
- Table5: Human-human written dialogue datasets. Starred (*) quantities are computed using word counts based on spaces. Triangle (△) indicates lower and upper bounds computed using average words per utterance estimated on a similar Reddit corpus (Schrading, 2015). Square (□) indicates estimates based only on the English part of the corpus. Dialogues indicated by (†) are contiguous blocks of recorded conversation in a multiparticipant chat. For UseNet, the average number of turns is calculated as the average number of posts collected per newsgroup. (‡) indicates an estimate based on a Twitter dataset of similar size and refers to tokens as well as words
- The authors gratefully acknowledge financial support by the Samsung Advanced Institute of Technology (SAIT), the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs, the Canadian Institute for Advanced Research (CIFAR) and Compute Canada
- The second author is funded by a Vanier Graduate Scholarship
Human-Machine Corpora. Another salient distinction between dialogue datasets resides in the types of interlocutors: notably, whether a dataset involves interactions between two humans, or between a human and a computer. The distinction is important because current artificial dialogue systems are significantly constrained
The second comes from a statistical natural language processing perspective: since the statistical complexity of a corpus grows with the linguistic diversity and number of topics, the number of examples required by a machine learning algorithm to model the patterns in it will also grow with the linguistic diversity and number of topics. Consider two small datasets with the same number of dialogues in the domain of bus schedule information: in one dataset the conversations between the users and operator are natural, and the operator can improvise and chitchat; in the other dataset, the operator reads from a script to provide the bus information. Despite having the same size, the second dataset will have less linguistic diversity and not include chitchat topics.
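The diversity contrast in this example can be made concrete with a type-token ratio, a standard (if crude) measure of lexical diversity. The two toy corpora below are invented for illustration only.

```python
def type_token_ratio(dialogues):
    """Distinct words / total words over a list of utterances:
    a rough proxy for the linguistic diversity of a corpus."""
    tokens = [w for d in dialogues for w in d.lower().split()]
    return len(set(tokens)) / len(tokens)

# Scripted operator: fixed template, small vocabulary.
scripted = ["the next bus to downtown leaves at ten",
            "the next bus to airport leaves at nine"]
# Improvising operator: same information, far more varied wording.
natural = ["hmm let me check, looks like ten for downtown",
           "airport? you just missed one, next is around nine i think"]

print(type_token_ratio(scripted), type_token_ratio(natural))
```

The scripted corpus repeats most of its words across dialogues, so its ratio is markedly lower; a model of the natural corpus must account for many more word types per observed token.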
4.1 Human-Machine Corpora. As discussed in Subsection 3.2, an important distinction between dialogue datasets is whether they consist of dialogues between two humans or between a human and a machine. Thus, we begin by outlining some of the existing human-machine corpora in several categories based on the types of systems the humans interact with: Restaurant and Travel Information, Open-Domain Knowledge Retrieval, and Other Specialized systems
That is, to form a coherent dialogue, previous contexts must be accounted for – either explicitly or in an end-to-end manner. As such, the first three datasets in the DSTC — referred to as DSTC1, DSTC2, and DSTC3 respectively — are medium-sized spoken datasets obtained from human-machine interactions with restaurant and travel information systems. All datasets provide labels specifying the current goal and desired action of the system
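A minimal sketch of the kind of label such corpora carry follows. The slot names loosely follow DSTC2's restaurant-information domain, but the rule-based update is a deliberately simplified stand-in for a real statistical tracker, which would maintain a distribution over values rather than a single one.

```python
# Hypothetical informable slots for illustration (DSTC2's restaurant
# domain uses slots such as food, area, and pricerange).
def update_state(state, turn_slots):
    """Naive rule-based tracking: the most recently informed value
    for a slot overwrites the previous one."""
    new_state = dict(state)
    new_state.update(turn_slots)
    return new_state

# Three user turns: inform(food=chinese), inform(area=north),
# then a goal change to italian food.
state = {}
for turn in [{"food": "chinese"}, {"area": "north"}, {"food": "italian"}]:
    state = update_state(state, turn)
print(state)  # latest goal per slot
```

The DSTC annotations make exactly this target explicit at every turn, which is what allows trackers to be trained and compared on a common footing.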
The latter was transcribed and annotated with simple speech acts such as “signaling emotions” or “self-addressing”. The MATCH corpus (Georgila et al, 2010) is a small corpus of 447 dialogues based on a Wizard-of-Oz experiment, which contains conversations from 50 young and old adults interacting with spoken dialogue systems. These conversations were annotated semi-automatically with dialogue acts and “Information State Update” (ISU) representations of dialogue context
This dataset consists of approximately 2,500 dialogues from phone calls, along with word-by-word transcriptions, with about 500 different speakers. A computer-driven robot operator system introduced a topic for discussion between two participants, and recorded the resulting conversation. About 70 casual topics were provided, of which about 50 were frequently used
While the original dataset featured 2D visual feeds, an updated version with 3D video has also been derived, called the 4D Cardiff Conversation Database (4D CCDb) (Vandeventer et al, 2015). This version contains 17 one-minute conversations from 4 participants on similarly unconstrained topics. The Diachronic Corpus of Present-Day Spoken English (DCPSE) (Aarts and Wallis, 2006) is a parsed corpus of spoken English made up of two separate datasets. It contains more than 400,000 words from the ICE-GB corpus (collected in the early 1990s) and 400,000 words from the London-Lund Corpus (collected from the late 1960s to the early 1980s)
By adding these controls, the dataset attempts to focus on solely the dialogue and human speech involved in the planning process. The Walking Around Corpus (Brennan et al, 2013) consists of 36 dialogues between people communicating over mobile telephone. The dialogues have two parts: first, a ‘stationary partner’ is asked to direct a ‘mobile partner’ to find 18 destinations on a medium-sized university campus
It contains 8 long dialogues, totalling about 30 minutes each. Since the persuadees often either disagree or agree strongly with the persuaders' points, this would be a good corpus for studying social signs of (dis)agreement between two people. The MAHNOB Mimicry Database (Sun et al, 2011) contains 11 hours of recordings, split over 54 sessions between 60 people engaged either in a socio-political discussion or negotiating a tenancy agreement. This dataset consists of a set of fully synchronized audio-visual recordings of natural dyadic (one-on-one) interactions
The IDIAP Wolf Corpus (Hung and Chittaranjan, 2010) is an audio-visual corpus containing natural conversational data of volunteers who took part in an adversarial role-playing game called ‘Werewolf’. Four groups of 8 to 12 people were recorded using headset microphones and synchronized video cameras, resulting in over 7 hours of conversational data. The novelty of this dataset is that the roles of other players are unknown to game participants, and some of the roles are deceptive in nature
The latter consists of a mix of British and American film scripts, while the former consists of solely American films. The majority of these datasets consist of raw scripts, which are not guaranteed to portray conversations between only two people. The dataset collected by Nio et al (2014), which we refer to as the Filtered Movie Script Corpus, takes over 1 million utterance-response pairs from web-based script resources and filters them down to 86,000 such pairs. The filtering method limits the extracted utterances to X-Y-X triples, where X is spoken by one actor and Y by another, and each of the utterances share some semantic similarity
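The X-Y-X filtering step can be sketched as follows. Word-level Jaccard overlap stands in here for whatever semantic-similarity measure Nio et al actually used, and the threshold and sample script are illustrative assumptions.

```python
def jaccard(a, b):
    """Word-set overlap between two utterances (0.0 to 1.0)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def extract_xyx_pairs(script, threshold=0.2):
    """From (speaker, utterance) turns, keep the utterance-response pairs
    that sit inside an X-Y-X triple: same speaker on both X turns, a
    different speaker on Y, and the two X utterances lexically similar."""
    pairs = []
    for (s1, u1), (s2, u2), (s3, u3) in zip(script, script[1:], script[2:]):
        if s1 == s3 and s1 != s2 and jaccard(u1, u3) >= threshold:
            pairs.append((u1, u2))
    return pairs

script = [("A", "where is the station"),
          ("B", "two blocks north"),
          ("A", "so the station is two blocks north"),
          ("B", "yes exactly")]
pairs = extract_xyx_pairs(script)
print(pairs)
```

The triple constraint is what prunes 1 million raw pairs down to 86,000: a response only survives when the first speaker's follow-up confirms the exchange was a coherent two-party turn sequence.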
Thus, this dataset could be useful for building dialogue personalization models. There are two primary movie subtitle datasets: the OpenSubtitles Corpus (Tiedemann, 2012) and the SubTle Corpus (Ameixa and Coheur, 2013). Both corpora are based on the OpenSubtitles website.
The NPS Internet Chatroom Conversations Corpus was one of the first corpora of computer-mediated communication (CMC), and it was intended for various NLP applications such as conversation thread topic detection, author profiling, entity identification, and social network analysis. Several corpora of spontaneous micro-blogging conversations have been collected, such as the Twitter Corpus from Ritter et al (2010), which contains 1.3 million post-reply pairs extracted from Twitter. The corpus was originally constructed to aid in the production of unsupervised approaches to modeling dialogue acts
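The basic construction behind such micro-blogging corpora, pairing each post with the post it replies to, can be sketched as below. The field names are illustrative, not Twitter's actual API schema.

```python
def post_reply_pairs(posts):
    """posts: dicts with 'id', 'reply_to' (an id or None), and 'text'.
    Returns (post_text, reply_text) pairs by following reply_to links,
    the basic step in building a post-reply dialogue corpus."""
    by_id = {p["id"]: p for p in posts}
    return [
        (by_id[p["reply_to"]]["text"], p["text"])
        for p in posts
        if p["reply_to"] in by_id  # drop replies to posts we never collected
    ]

posts = [
    {"id": 1, "reply_to": None, "text": "just landed in montreal"},
    {"id": 2, "reply_to": 1, "text": "welcome! how long are you staying"},
    {"id": 3, "reply_to": 2, "text": "a week, any food tips"},
]
print(post_reply_pairs(posts))
```

Chaining such pairs along the reply links is also how longer context-message-response triples (as in the Twitter Triples Corpus discussed next) are assembled.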
The Twitter Triples Corpus (Sordoni et al, 2015) is one such example, with a described original dataset of 127 million context-message-response triples, but only a small labeled subset of this corpus has been released. Specifically, the released labeled subset contains 4,232 pairs that scored an average of greater than 4 on the Likert-type scale by crowdsourced evaluators for quality of the response to the context-message pair. Similarly, a large micro-blogging dataset, the Sina Weibo Corpus (Shang et al, 2015), which contains 4.5 million post-reply pairs, has been collected and used in the literature, but this resource has not yet been made publicly available. We do not include the Sina Weibo Corpus (and its derivatives) in the tables in this section, as they are not primarily in English
10. http://www.reddit.com 11. http://www.twitter.com 12. http://www.usenet.net

The corpus derived from these posts has been used for research in collaborative filtering (Konstan et al, 1997) and role detection (Fisher et al, 2006). The NUS SMS Corpus (Chen and Kan, 2013) consists of conversations carried out over mobile phone SMS messages between two users. While the original purpose of the dataset was to improve predictive text entry when mobile phones still mapped multiple letters to a single number, aided by video and timing analysis of users entering their messages, it could equally be used for analysis of informal dialogue
The difference between the two corpora is the source: the former is collected from Create Debate forums and the latter from a mix of Wikipedia Discussion pages and LiveJournal postings. The Internet Argument Corpus (IAC) (Walker et al, 2012b) is a forum-based corpus with 390,000 posts on 11,000 discussion topics. Each topic is controversial in nature, including subjects such as evolution, gay marriage and climate change; users participate by sharing their opinions on one of these topics
- A. Ram, R. Prasad, C. Khatri, A. Venkatesh, R. Gabriel, Q. Li, J. Nunn, B. Hedayatnia, M. Heng, A. Nagar, E. King, K. Bland, A. Wartick, Y. Pan, H. Song, S. Jayadevan, G. Hwang, and A. Pettigrue. Conversational AI: The science behind the Alexa Prize. In Alexa Prize Proceedings, 2017.
- B. Aarts and S. A. Wallis. The diachronic corpus of present-day spoken english (DCPSE), 2006.
- R. Abbott, B. Ecker, P. Anand, and M. Walker. Internet argument corpus 2.0: An SQL schema for dialogic social media and the corpora to go with it. In Language Resources and Evaluation Conference (LREC), 2016.
- S. Afantenos, N. Asher, F. Benamara, A. Cadilhac, C. Degremont, P. Denis, M. Guhe, S. Keizer, A. Lascarides, O. Lemon, et al. Developing a corpus of strategic conversation in the settlers of catan. In SeineDial 2012, The 16th Workshop on the Semantics and Pragmatics of Dialogue, 2012.
- H. Ai, A. Raux, D. Bohus, M. Eskenazi, and D. J. Litman. Comparing spoken dialog corpora collected with recruited subjects versus real users. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2007.
- R. Akker and D. Traum. A comparison of addressee detection methods for multiparty conversations. In Workshop on the Semantics and Pragmatics of Dialogue, 2009.
- Y. Al-Onaizan, U. Germann, U. Hermjakob, K. Knight, P. Koehn, D. M., and K. Yamada. Translating with scarce resources. In AAAI, 2000.
- D. Ameixa and L. Coheur. From subtitles to human interactions: introducing the SubTle corpus. Technical report, 2013.
- A. H. Anderson, M. Bader, E. G. Bard, E. Boyle, G. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller, et al. The HCRC map task corpus. Language and speech, 34(4):351–366, 1991.
- J. Andreas, S. Rosenthal, and K. McKeown. Annotating agreement and disagreement in threaded discussion. In LREC, pages 818–822, 2012.
- L. E. Asri, J. He, and K. Suleman. A sequence-to-sequence model for user simulation in spoken dialogue systems. arXiv preprint arXiv:1607.00070, 2016.
- A. J. Aubrey, D. Marshall, P. L. Rosin, J. Vandeventer, D. W. Cunningham, and C. Wallraven. Cardiff conversation database (CCDb): A database of natural dyadic conversations. In Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Conference on, pages 277–282, 2013.
- H. Aust, M. Oerder, F. Seide, and V. Steinbiss. The philips automatic train timetable information system. Speech Communication, 17(3):249–262, 1995.
- R. E. Banchs. Movie-DiC: a movie dialogue corpus for research and development. In Association for Computational Linguistics: Short Papers, 2012.
- R. E. Banchs and H. Li. IRIS: a chat-oriented dialogue system based on the vector space model. In Association for Computational Linguistics, System Demonstrations, 2012.
- S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Association for Computational Linguistics, Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005.
- M. Barlow. Corpus of spoken, professional American-English, 2000.
- J. Beare and B. Scott. The spoken corpus of the survey of English dialects: language variation and oral history. In Proceedings of ALLC/ACH, 1999.
- Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003.
- Y. Bengio, I. Goodfellow, and A. Courville. Deep learning. An MIT Press book in preparation. Draft chapters available at http://www.iro.umontreal.ca/bengioy/dlbook, 2014.
- C. Bennett and A. I. Rudnicky. The Carnegie Mellon Communicator corpus, 2002.
- D. Biber and E. Finegan. An initial typology of english text types. Corpus linguistics II: New studies in the analysis and exploitation of computer corpora, pages 19–46, 1986.
- D. Biber and E. Finegan. Diachronic relations among speech-based and written registers in english. Variation in English: multi-dimensional studies, pages 66–83, 2001.
- D. Bohus and A. I. Rudnicky. Sorry, I didn't catch that! In Recent Trends in Discourse and Dialogue, pages 123–154. Springer, 2000.
- S. E. Brennan, K. S. Schuhmann, and K. M. Batres. Entrainment on the move and in the lab: The walking around corpus. In Conference of the Cognitive Science Society, 2013.
- G. Brown, A. Anderson, R. Shillcock, and G. Yule. Teaching talk. Cambridge: CUP, 1984.
- J. E. Cahn and S. E. Brennan. A psychological model of grounding and repair in dialog. In AAAI Symposium on Psychological Models of Communication in Collaborative Systems, 1999.
- A. Canavan and G. Zipperlen. CallFriend American English, non-southern dialect. Linguistic Data Consortium, 10:1, 1996.
- A. Canavan, D. Graff, and G. Zipperlen. CallHome American English speech. Linguistic Data Consortium, 1997.
- S. K. Card, T. P. Moran, and A. Newell. The Psychology of Human-Computer Interaction. L. Erlbaum Associates Inc., Hillsdale, NJ, USA, 1983. ISBN 0898592437.
- R. Carter. Orders of reality: Cancode, communication, and culture. ELT Journal, 52(1):43–56, 1998.
- R. Carter and M. McCarthy. Cambridge grammar of English: a comprehensive guide; spoken and written English grammar and usage. Ernst Klett Sprachen, 2006.
- T. L. Chartrand and J. A. Bargh. The chameleon effect: the perception–behavior link and social interaction. Journal of Personality and Social Psychology, 76(6):893, 1999.
- T. Chen and M. Kan. Creating a live, public short message service corpus: the NUS SMS corpus. Language Resources and Evaluation, 47(2):299–335, 2013.
- Y.-N. Chen, D. Hakkani-Tur, and X. He. Zero-shot learning of intent embeddings for expansion by convolutional deep structured semantic models. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 6045–6049. IEEE, 2016.
- G. Churcher, E. S. Atwell, and C. Souter. Dialogue management systems: a survey and overview. University of Leeds, School of Computing Research Report 1997.06, 1997.
- P. R. Cohen. If not Turing's test, then what? AI Magazine, 26(4):61, 2005.
- K. M. Colby. Modeling a paranoid mind. Behavioral and Brain Sciences, 4:515–534, 1981.
- R. M. Cooper. The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6(1):84–107, 1974.
- N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods. Cambridge University Press, 2000.
- H. Cuayahuitl, S. Keizer, and O. Lemon. Strategic dialogue management via deep reinforcement learning. arXiv preprint arXiv:1511.08099, 2015.
- C. Danescu-Niculescu-Mizil and L. Lee. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Association for Computational Linguistics, Workshop on Cognitive Modeling and Computational Linguistics, 2011.
- L. Daubigney, M. Geist, S. Chandramohan, and O. Pietquin. A comprehensive reinforcement learning framework for dialogue management optimization. IEEE Journal of Selected Topics in Signal Processing, 6(8):891–902, 2012.
- M. Davies. Comparing the corpus of american soap operas, COCA, and the BNC, 2012a.
- M. Davies. Corpus of american soap operas, 2012b.
- I. de Kok, D. Heylen, and L. Morency. Speaker-adaptive multimodal prediction model for listener responses. In Proceedings of the 15th ACM International Conference on Multimodal Interaction, 2013.
- L. Deng and X. Li. Machine learning paradigms for speech recognition: An overview. Audio, Speech, and Language Processing, IEEE Transactions on, 21(5):1060–1089, 2013.
- L. Denoyer and P. Gallinari. The wikipedia XML corpus. In Comparative Evaluation of XML Information Retrieval. Springer, 2007.
- B. Dhingra, Z. Zhou, D. Fitzpatrick, M. Muehl, and W. Cohen. Tweet2vec: Character-based distributed representations for social media. arXiv preprint arXiv:1605.03481, 2016.
- J. Dodge, A. Gane, X. Zhang, A. Bordes, S. Chopra, A. Miller, A. Szlam, and J. Weston. Evaluating prerequisite qualities for learning end-to-end dialog systems. arXiv preprint arXiv:1511.06931, 2015.
- S. Dose. Flipping the script: A corpus of american television series (cats) for corpus-based language learning and teaching. Corpus Linguistics and Variation in English: Focus on Non-native Englishes, 2013.
- E. Douglas-Cowie, R. Cowie, I. Sneddon, C. Cox, O. Lowry, M. Mcrorie, J. Martin, L. Devillers, S. Abrilian, A. Batliner, et al. The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data. In Affective Computing and Intelligent Interaction, pages 488–500. Springer, 2007.
- W. Eckert, E. Levin, and R. Pieraccini. User modeling for spoken dialogue system evaluation. In Automatic Speech Recognition and Understanding, 1997 IEEE Workshop on, pages 80–87, 1997.
- L. El Asri, H. Schulz, S. Sharma, J. Zumer, J. Harris, E. Fine, R. Mehrotra, and K. Suleman. Frames: A corpus for adding memory to goal-oriented dialogue systems. Preprint at http://www.maluuba.com/publications/, 2017.
- M. Elsner and E. Charniak. You talking to me? A corpus and algorithm for conversation disentanglement. In Association for Computational Linguistics (ACL), 2008.
- M. Elsner and E. Charniak. Disentangling chat. Computational Linguistics, 36(3):389–409, 2010.
- D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, and P. Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 2010.
- G. Di Fabbrizio, G. Tur, and D. Hakkani-Tur. Bootstrapping spoken dialog systems with data reuse. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2004.
- M. Fatemi, L. E. Asri, H. Schulz, J. He, and K. Suleman. Policy networks with two-stage training for dialogue systems. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2016.
- M. Fazel-Zarandi, S.-W. Li, J. Cao, J. Casale, P. Henderson, D. Whitney, and A. Geramifard. Learning robust dialog policies in noisy environments. Neural Information Processing Systems (NIPS), Conversational AI Workshop, 2017.
- D. Fisher, M. Smith, and H. T. Welser. You are who you talk to: Detecting roles in usenet newsgroups. In Hawaii International Conference on System Sciences (HICSS'06), volume 3, pages 59b–59b, 2006.
- P. Forchini. Spontaneity reloaded: American face-to-face and movie conversation compared. In Corpus Linguistics, 2009.
- P. Forchini. Movie language revisited. Evidence from multi-dimensional analysis and corpora. Peter Lang, 2012.
- G. Forgues, J. Pineau, J. Larcheveque, and R. Tremblay. Bootstrapping dialog systems with word embeddings. In Workshop on Modern Machine Learning and Natural Language Processing, Neural Information Processing Systems (NIPS), 2014.
- E. N. Forsyth and C. H. Martell. Lexical and discourse analysis of online chat dialog. In International Conference on Semantic Computing (ICSC), pages 19–26, 2007.
- M. Galley, C. Brockett, A. Sordoni, Y. Ji, M. Auli, C. Quirk, M. Mitchell, J. Gao, and B. Dolan. deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. In Association for Computational Linguistics and International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pages 445–450, 2015.
- M. Gasic, F. Jurcicek, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young. Gaussian processes for fast policy optimisation of POMDP-based dialogue managers. In Special Interest Group on Discourse and Dialogue (SIGDIAL). ACL, 2010.
- M. Gasic, F. Jurcicek, B. Thomson, K. Yu, and S. Young. On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 312–317. IEEE, 2011.
- M. Gasic, M. Henderson, B. Thomson, P. Tsiakoulis, and S. Young. Policy optimisation of pomdp-based dialogue systems without state space compression. In Spoken Language Technology Workshop (SLT), 2012 IEEE, pages 31–36. IEEE, 2012.
- M. Gasic, C. Breslin, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis, and S. Young. On-line policy optimisation of Bayesian spoken dialogue systems via human interaction. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8367–8371, 2013.
- M. Gasic, N. Mrksic, L. M. Rojas-Barahona, P.-H. Su, S. Ultes, D. Vandyke, T.-H. Wen, and S. Young. Dialogue manager domain adaptation using gaussian process reinforcement learning. Computer Speech & Language, 2016.
- A. Genevay and R. Laroche. Transfer learning for user adaptation in spoken dialogue systems. In International Conference on Autonomous Agents & Multiagent Systems, pages 975–983. International Foundation for Autonomous Agents and Multiagent Systems, 2016.
- K. Georgila, J. Henderson, and O. Lemon. User simulation for spoken dialogue systems: learning and evaluation. In INTERSPEECH, 2006.
- K. Georgila, M. Wolters, J. D. Moore, and R. H. Logie. The MATCH corpus: A corpus of older and younger users interactions with spoken dialogue systems. Language Resources and Evaluation, 44(3):221–261, 2010.
- J. Gibson and A. D. Pick. Perception of another person’s looking behavior. The American journal of psychology, 76(3): 386–394, 1963.
- J. J. Godfrey, E. C. Holliman, and J. McDaniel. SWITCHBOARD: Telephone speech corpus for research and development. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), 1992.
- I. Goodfellow, A. Courville, and Y. Bengio. Deep learning. Book in preparation for MIT Press, 2015. URL http://goodfeli.github.io/dlbook/.
- C. Goodwin. Conversational Organization: Interaction Between Speakers and Hearers. New York: Academic Press, 1981.
- A. L. Gorin, G. Riccardi, and J. H. Wright. How may I help you? Speech Communication, 23(1):113–127, 1997.
- A. Graves. Sequence transduction with recurrent neural networks. In International Conference on Machine Learning (ICML), Representation Learning Workshop, 2012.
- S. Greenbaum. Comparing English worldwide: The international corpus of English. Clarendon Press, 1996.
- S. Greenbaum and G. Nelson. The international corpus of English (ICE) project. World Englishes, 15(1):3–15, 1996.
- C. Gulcehre, O. Firat, K. Xu, K. Cho, L. Barrault, H. Lin, F. Bougares, H. Schwenk, and Y. Bengio. On using monolingual corpora in neural machine translation. CoRR, abs/1503.03535, 2015.
- I. Gurevych and M. Strube. Semantic similarity applied to spoken dialogue summarization. In International Conference on Computational Linguistics (COLING), 2004.
- V. Haslerud and A. Stenstrom. The Bergen corpus of London teenager language (COLT). In Spoken English on Computer. Transcription, Mark-up and Application, pages 235–242. London: Longman, 1995.
- P. A. Heeman and J. F. Allen. The TRAINS 93 Dialogues. Technical report, DTIC Document, 1995.
- C. T. Hemphill, J. J. Godfrey, and G. R. Doddington. The ATIS spoken language systems pilot corpus. In DARPA Speech and Natural Language Workshop, pages 96–101, 1990.
- M. Henderson, B. Thomson, and S. Young. Deep neural network approach for the dialog state tracking challenge. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2013.
- M. Henderson, B. Thomson, and J. Williams. Dialog state tracking challenge 2 & 3, 2014a.
- M. Henderson, B. Thomson, and J. Williams. The second dialog state tracking challenge. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014b.
- M. Henderson, B. Thomson, and S. Young. Word-based dialog state tracking with recurrent neural networks. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014c.
- P. Henderson, K. Sinha, N. Angelard-Gontier, N. R. Ke, G. Fried, R. Lowe, and J. Pineau. Ethical challenges in data-driven dialogue systems. arXiv preprint arXiv:1711.09050, 2017.
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Processing Magazine, IEEE, 29(6):82–97, 2012.
- T. Hiraoka, G. Neubig, K. Yoshino, T. Toda, and S. Nakamura. Active learning for example-based dialog systems. In Proc Intl Workshop on Spoken Dialog Systems, Saariselka, Finland, 2016.
- H. Hung and G. Chittaranjan. The IDIAP wolf corpus: exploring group behaviour in a competitive role-playing game. In International Conference on Multimedia, pages 879–882, 2010.
- J. L. Hutchens and M. D. Alder. Introducing MegaHAL. In Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning, 1998.
- I. Papaioannou, A. C. Curry, J. L. Part, I. Shalyminov, X. Xu, Y. Yu, O. Dusek, V. Rieser, and O. Lemon. Alana: Social dialogue using an ensemble model and a ranker trained on user feedback. In Alexa Prize Proceedings, 2017.
- A. Jonsson and N. Dahlback. Talking to a computer is not like talking to your best friend. In Scandinavian Conference on Artificial Intelligence, 1988.
- S. Jung, C. Lee, K. Kim, M. Jeong, and G. G. Lee. Data-driven user simulation for automated evaluation of spoken dialog systems. Computer Speech & Language, 23(4):479–509, 2009.
- D. Jurafsky and J. H. Martin. Speech and language processing, 2nd Edition. Prentice Hall, 2008.
- F. Jurcıcek, S. Keizer, M. Gasic, F. Mairesse, B. Thomson, K. Yu, and S. Young. Real user evaluation of spoken dialogue systems using amazon mechanical turk. In INTERSPEECH, volume 11, 2011.
- R. Kadlec, M. Schmid, and J. Kleindienst. Improved deep learning baselines for ubuntu corpus dialogs. Neural Information Processing Systems Workshop on Machine Learning for Spoken Language Understanding, 2015.
- S. Kim, L. F. D'Haro, R. E. Banchs, J. Williams, and M. Henderson. Dialog state tracking challenge 4, 2015.
- S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, M. Henderson, and K. Yoshino. The fifth dialog state tracking challenge. In IEEE Spoken Language Technology Workshop (SLT), 2016.
- V. Konovalov, O. Melamud, R. Artstein, and I. Dagan. Collecting better training data using biased agent policies in negotiation dialogues. In WOCHAT: Workshop on Chatbots and Conversational Agent Technologies, 2016.
- J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: applying collaborative filtering to usenet news. Communications of the ACM, 40(3):77–87, 1997.
- E. Levin, R. Pieraccini, and W. Eckert. Learning dialogue strategies within the markov decision process framework. In Automatic Speech Recognition and Understanding, 1997. Proceedings., 1997 IEEE Workshop on, pages 72–79. IEEE, 1997.
- J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan. A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055, 2015.
- J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan. A persona-based neural conversation model. In Association for Computational Linguistics, pages 994–1003, 2016.
- R. Lowe, N. Pow, I. Serban, and J. Pineau. The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2015a.
- R. Lowe, N. Pow, I. V. Serban, L. Charlin, and J. Pineau. Incorporating unstructured textual knowledge sources into neural dialogue systems. Neural Information Processing Systems Workshop on Machine Learning for Spoken Language Understanding, 2015b.
- R. Lowe, I. V. Serban, M. Noseworthy, L. Charlin, and J. Pineau. On the evaluation of dialogue systems with next utterance classification. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2016.
- J. M. Lucas, F. Fernandez, J. Salazar, J. Ferreiros, and R. San Segundo. Managing speaker identity and user profiles in a spoken dialogue system. In Procesamiento del Lenguaje Natural, number 43 in 1, pages 77–84, 2009.
- B. MacWhinney and C. Snow. The child language data exchange system. Journal of child language, 12(02):271–295, 1985.
- F. Mairesse and S. Young. Stochastic language generation in dialogue using factored language models. Computational Linguistics, 2014.
- F. Mairesse, M. Gasic, F. Jurcıcek, S. Keizer, B. Thomson, K. Yu, and S. Young. Phrase-based statistical language generation using graphical models and active learning. In the Association for Computational Linguistics, pages 1552– 1561. ACL, 2010.
- C. D. Manning and H. Schutze. Foundations of statistical natural language processing. MIT Press, 1999.
- M. McCarthy. Spoken language and applied linguistics. Ernst Klett Sprachen, 1998.
- S. McGlashan, N. Fraser, N. Gilbert, E. Bilange, P. Heisterkamp, and N. Youd. Dialogue management for telephone information systems. In Conference on Applied Natural Language Processing, pages 245–246. ACL, 1992.
- G. McKeown, M. F. Valstar, R. Cowie, and M. Pantic. The SEMAINE corpus of emotionally coloured character interactions. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 1079–1084, 2010.
- G. Mesnil, X. He, L. Deng, and Y. Bengio. Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In INTERSPEECH, pages 3771–3775, 2013.
- T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and Sanjeev Khudanpur. Recurrent neural network based language model. In INTERSPEECH, pages 1045–1048, 2010.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Neural Information Processing Systems, pages 3111–3119, 2013.
- G. A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
- Association for Computational Linguistics, pages 55–61, 1996.
- K. Mo, S. Li, Y. Zhang, J. Li, and Q. Yang. Personalizing a dialogue system with transfer learning. arXiv preprint arXiv:1610.02891, 2016.
- S. Mohan and J. Laird. Learning goal-oriented hierarchical tasks from situated interactive instruction. In AAAI, 2014.
- T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng. MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268, 2016.
- L. Nio, S. Sakti, G. Neubig, T. Toda, and S. Nakamura. Conversation dialog corpora from television and movie scripts. In 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), pages 1–4, 2014.
- E. Noth, A. Horndasch, F. Gallwitz, and J. Haas. Experiences with commercial telephone-based dialogue systems. it - Information Technology (formerly it + ti), 46(6/2004):315–321, 2004.
- C. Oertel, F. Cummins, J. Edlund, P. Wagner, and N. Campbell. D64: A corpus of richly recorded conversational interaction. Journal on Multimodal User Interfaces, 7(1-2):19–28, 2013.
- A. H. Oh and A. I. Rudnicky. Stochastic language generation for spoken dialogue systems. In ANLP/NAACL Workshop on Conversational Systems, volume 3, pages 27–32. ACL, 2000.
- T. Paek. Reinforcement learning for spoken dialogue systems: Comparing strengths and weaknesses for practical deployment. In Proc. Dialog-on-Dialog Workshop, INTERSPEECH, 2006.
- K. Papineni, S. Roukos, T. Ward, and W. Zhu. BLEU: a method for automatic evaluation of machine translation. In Association for Computational Linguistics, 2002.
- G. Parent and M. Eskenazi. Toward better crowdsourced transcription: Transcription of a year of the Let's Go bus information system data. In Spoken Language Technology Workshop (SLT), 2010 IEEE, pages 312–317. IEEE, 2010.
- A. N. Pargellis, H.-K. J. Kuo, and C. Lee. An automatic dialogue generation platform for personalized dialogue applications. Speech Communication, 42(3-4):329–351, 2004. doi: 10.1016/j.specom.2003.10.003.
- R. Passonneau and E. Sachar. Loqui human-human dialogue corpus (transcriptions and annotations), 2014.
- B. Peng, X. Li, L. Li, J. Gao, A. Celikyilmaz, S. Lee, and K.-F. Wong. Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2221–2230, 2017.
- D. Perez-Marin and I. Pascual-Nieto. Conversational Agents and Natural Language Interaction: Techniques and Effective Practices. IGI Global, 2011.
- S. Petrik. Wizard of Oz Experiments on Speech Dialogue Systems. PhD thesis, Technische Universitat Graz, 2004.
- R. Pieraccini, D. Suendermann, K. Dayanidhi, and J. Liscombe. Are we there yet? Research in commercial spoken dialog systems. In Text, Speech and Dialogue, pages 3–13, 2009.
- O. Pietquin and H. Hastie. A survey on metrics for the evaluation of user simulations. The Knowledge Engineering Review, 28(01):59–73, 2013.
- O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet. Sample-efficient batch reinforcement learning for dialogue management optimization. ACM Transactions on Speech and Language Processing (TSLP), 7(3):7, 2011.
- Machine Learning for Interactive Systems (MLIS 2015), volume 43, 2015.
- C. Potts. Goal-driven answers in the cards dialogue corpus. In West Coast Conference on Formal Linguistics, pages 1–20, 2012.
- A. Ratnaparkhi. Trainable approaches to surface natural language generation and their application to conversational dialog systems. Computer Speech & Language, 16(3):435–455, 2002.
- A. Raux, B. Langner, D. Bohus, A. W. Black, and M. Eskenazi. Let's go public! Taking a spoken dialog system to the real world. In INTERSPEECH, 2005.
- N. Reithinger and M. Klesen. Dialogue act classification using language models. In EuroSpeech, 1997.
- H. Ren, W. Xu, Y. Zhang, and Y. Yan. Dialog state tracking using conditional random fields. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2013.
- Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007.
- R. Reppen and N. Ide. The American National Corpus: overall goals and the first release. Journal of English Linguistics, 32(2):105–113, 2004.
- J. Rickel and W. L. Johnson. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence, 13(4-5):343–382, 1999.
- V. Rieser and O. Lemon. Natural language generation as planning under uncertainty for spoken dialogue systems. In Empirical Methods in Natural Language Generation. Springer, 2010.
- V. Rieser and O. Lemon. Reinforcement learning for adaptive dialogue systems: a data-driven methodology for dialogue management and natural language generation. Springer Science & Business Media, 2011.
- A. Ritter, C. Cherry, and B. Dolan. Unsupervised modeling of twitter conversations. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2010), 2010.
- A. Ritter, C. Cherry, and W. B. Dolan. Data-driven response generation in social media. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
- S. Rosenthal and K. McKeown. I couldn't agree more: The role of conversational structure in agreement and disagreement detection in online discussions. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2015.
- S. Rosset and S. Petel. The RITEL corpus: an annotated human-machine open-domain question answering spoken dialog corpus. In The International Conference on Language Resources and Evaluation (LREC), 2006.
- International Conference on Language Resources and Evaluation (LREC), volume 2, 2014.
- J. Ruppenhofer, M. Ellsworth, M. R. L. Petruck, C. R. Johnson, and J. Scheffczyk. FrameNet II: Extended Theory and Practice. International Computer Science Institute, 2006. Distributed with the FrameNet data.
- J. Schatzmann and S. Young. The hidden agenda user simulation model. IEEE Transactions on Audio, Speech, and Language Processing, 17(4):733–747, 2009.
- J. Schatzmann, K. Georgila, and S. Young. Quantitative evaluation of user simulation techniques for spoken dialogue systems. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2005.
- J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young. Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, pages 149–152, 2007.
- K. Scheffler and S. Young. Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning. In International Conference on Human Language Technology Research, pages 12–19. Morgan Kaufmann Publishers Inc., 2002.
- J. N. Schrading. Analyzing domestic abuse using natural language processing on social media data. Master's thesis, Rochester Institute of Technology, 2015. http://scholarworks.rit.edu/theses.
- N. Schrading, C. O. Alm, R. Ptucha, and C. M. Homan. An analysis of domestic abuse discourse on reddit. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
- K. K. Schuler. VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, University of Pennsylvania, 2005. Paper AAI3179808.
- I. V. Serban and J. Pineau. Text-based speaker identification for multi-participant open-domain dialogue systems. Neural Information Processing Systems Workshop on Machine Learning for Spoken Language Understanding, 2015.
- I. V. Serban, R. Lowe, L. Charlin, and J. Pineau. Generative deep neural networks for dialogue: A short review. In Neural Information Processing Systems (NIPS), Let’s Discuss: Learning Methods for Dialogue Workshop, 2016.
- I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Networks. In AAAI, 2016. In press.
- I. V. Serban, T. Klinger, G. Tesauro, K. Talamadupula, B. Zhou, Y. Bengio, and A. Courville. Multiresolution recurrent neural networks: An application to dialogue response generation. In AAAI Conference, 2017a.
- I. V. Serban, C. Sankar, M. Germain, S. Zhang, Z. Lin, S. Subramanian, T. Kim, M. Pieper, S. Chandar, N. R. Ke, et al. A Deep Reinforcement Learning Chatbot. arXiv preprint arXiv:1709.02349, 2017b.
- I. V. Serban, C. Sankar, S. Zhang, Z. Lin, S. Subramanian, T. Kim, S. Chandar, N. R. Ke, et al. The octopus approach to the alexa competition: A deep ensemble-based socialbot. In Alexa Prize Proceedings, 2017c.
- I. V. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. In AAAI Conference, 2017d.
- S. Shaikh, T. Strzalkowski, G. A. Broadwell, J. Stromer-Galley, S. M. Taylor, and N. Webb. Mpc: A multi-party chat corpus for modeling social phenomena in discourse. In The International Conference on Language Resources and Evaluation (LREC), 2010.
- L. Shang, Z. Lu, and H. Li. Neural responding machine for short-text conversation. arXiv preprint arXiv:1503.02364, 2015.
- C. Shaoul and C. Westbury. A Usenet Corpus (2005-2009), 2009.
- S. Sharma, J. He, K. Suleman, H. Schulz, and P. Bachman. Natural language generation in dialogue using lexicalized and delexicalized data. arXiv preprint arXiv:1606.03632, 2016.
- B. A. Shawar and E. Atwell. Different measurements metrics to evaluate a chatbot system. In Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies, pages 89–96, 2007a.
- B. A. Shawar and E. Atwell. Chatbots: are they really useful? In LDV Forum, volume 22, pages 29–49, 2007b.
- E. Shriberg, R. Dhillon, S. Bhagat, J. Ang, and H. Carvey. The ICSI meeting recorder dialog act (MRDA) corpus. Technical report, DTIC Document, 2004.
- A. Simpson and N. M. Fraser. Black box and glass box evaluation of the SUNDIAL system. In European Conference on Speech Communication and Technology, 1993.
- S. Singh, D. Litman, M. Kearns, and M. Walker. Optimizing dialogue management with reinforcement learning: Experiments with the njfun system. Journal of Artificial Intelligence Research, pages 105–133, 2002.
- S. Singh, M. Kearns, D. Litman, and M. Walker. Reinforcement learning for spoken dialogue systems. In Neural Information Processing Systems, 1999.
- A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell, J. Nie, J. Gao, and B. Dolan. A neural network approach to context-sensitive generation of conversational responses. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2015), 2015.
- A. Stenstrom, G. Andersen, and I. K. Hasund. Trends in Teenage Talk: Corpus Compilation, Analysis and Findings. J. Benjamins, 2002.
- A. Stent, R. Prasad, and M. Walker. Trainable sentence planning for complex information presentation in spoken dialog systems. In Association for Computational Linguistics, page 79. ACL, 2004.
- A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, and M. Meteer. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373, 2000.
- P.-H. Su, Y.-B. Wang, T.-H. Yu, and L.-S. Lee. A dialogue game framework with personalized training using reinforcement learning for computer-assisted language learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8213–8217. IEEE, 2013.
- P.-H. Su, D. Vandyke, M. Gasic, D. Kim, N. Mrksic, T.-H. Wen, and S. Young. Learning from real users: Rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems. In INTERSPEECH, 2015.
- P.-H. Su, M. Gasic, N. Mrksic, L. Rojas-Barahona, S. Ultes, D. Vandyke, T.-H. Wen, and S. Young. Continuously learning neural dialogue management. arXiv preprint arXiv:1606.02689, 2016.
- J. Svartvik. The London-Lund corpus of spoken English: Description and research. Number 82 in 1. Lund University Press, 1990.
- B. Thomson and S. Young. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech & Language, 24(4):562–588, 2010.
- J. Tiedemann. Parallel data, tools and interfaces in opus. In The International Conference on Language Resources and Evaluation (LREC), 2012.
- D. Traum and J. Rickel. Embodied agents for multi-party dialogue in immersive virtual worlds. In International Joint Conference on Autonomous Agents and Multiagent Systems: Part 2, pages 766–773. ACM, 2002.
- G. Tur and R. De Mori. Spoken Language Understanding: Systems for Extracting Semantic Information from Speech. Wiley, January 2011.
- A. M. Turing. Computing machinery and intelligence. Mind, pages 433–460, 1950.
- D. C. Uthus and D. W. Aha. The Ubuntu chat corpus for multiparticipant chat analysis. In AAAI Spring Symposium: Analyzing Microtext, 2013.
- J. Vandeventer, A. J. Aubrey, P. L. Rosin, and D. Marshall. 4d cardiff conversation database (4D CCDb): A 4D database of natural, dyadic conversations. In Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing (FAAVSP 2015), 2015.
- D. Vandyke, P.-H. Su, M. Gasic, N. Mrksic, T.-H. Wen, and S. Young. Multi-domain dialogue success classifiers for policy training. In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on, pages 763–770. IEEE, 2015.
- O. Vinyals and Q. Le. A neural conversational model. arXiv preprint arXiv:1506.05869, 2015.
- M. A. Walker, D. J. Litman, C. A. Kamm, and A. Abella. Paradise: A framework for evaluating spoken dialogue agents. In European Chapter of the Association for Computational Linguistics (EACL), pages 271–280, 1997.
- M. A. Walker, O. C. Rambow, and M. Rogati. Training a sentence planner for spoken dialogue using boosting. Computer Speech & Language, 16(3):409–433, 2002.
- J. Williams, A. Raux, D. Ramachandran, and A. Black. The dialog state tracking challenge. In Special Interest Group on Discourse and Dialogue (SIGDIAL), 2013.
- J. D. Williams and S. Young. Partially observable markov decision processes for spoken dialog systems. Computer Speech & Language, 21(2):393–422, 2007.
- J. D. Williams and G. Zweig. End-to-end lstm-based dialog control optimized with supervised and reinforcement learning. arXiv preprint arXiv:1606.01269, 2016.
- M. Wolska, Q. B. Vo, D. Tsovaltzi, I. Kruijff-Korbayova, E. Karagjosova, H. Horacek, A. Fiedler, and C. Benzmuller. An annotated corpus of tutorial dialogs on mathematical theorem proving. In The International Conference on Language Resources and Evaluation (LREC), 2004.
- B. Wrede and E. Shriberg. Relationship between dialogue acts and hot spots in meetings. In Automatic Speech Recognition and Understanding, 2003. ASRU’03. 2003 IEEE Workshop on, pages 180–185. IEEE, 2003.
- X. Yang, Y.-N. Chen, D. Hakkani-Tur, P. Crook, X. Li, J. Gao, and L. Deng. End-to-end joint learning of natural language understanding and dialogue manager. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 5690–5694. IEEE, 2017.
- Y. Yang, W. Yih, and C. Meek. WikiQA: A challenge dataset for open-domain question answering. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2013–2018, 2015.
- Z. Yang, B. Li, Y. Zhu, I. King, G. Levow, and H. Meng. Collection of user judgments on spoken dialog system with crowdsourcing. In Spoken Language Technology Workshop (SLT), 2010 IEEE, pages 277–282, 2010.
- S. Young, M. Gasic, B. Thomson, and J. D. Williams. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179, 2013.
- S. J. Young. Probabilistic methods in spoken–dialogue systems. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769), 2000.
- J. Zhang, R. Kumar, S. Ravi, and C. Danescu-Niculescu-Mizil. Conversational flow in oxford-style debates. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2016), 2016.