The Dialog State Tracking Challenge Series: A Review.

D&D, no. 3 (2016): 4-33


Abstract

In a spoken dialog system, dialog state tracking deduces information about the user’s goal as the dialog progresses, synthesizing evidence such as dialog acts over multiple turns with external data sources. Recent approaches have been shown to overcome ASR and SLU errors in some applications. However, there are currently no common testbed...

Introduction
  • Speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user’s needs.
  • Most commercial systems use hand-crafted heuristics for state tracking, selecting the SLU result with the highest confidence score, and discarding alternatives.
  • Statistical approaches compute scores for many hypotheses for the dialog state (Figure 1).
  • By exploiting correlations between turns and information from external data sources – such as maps, bus timetables, or models of past dialogs – statistical approaches can overcome some SLU errors.
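The contrast between the two styles of tracker can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each turn is represented as an N-best list of `(value, confidence)` pairs for a single slot, the function names are hypothetical, and the noisy-OR style accumulation stands in for "a statistical approach" in general. It is not the method of any tracker entered in the challenge.

```python
from collections import defaultdict

def max_confidence_baseline(turns):
    """Heuristic tracker: keep the one SLU result with the highest
    confidence seen so far and discard every alternative."""
    best_value, best_score = None, 0.0
    for nbest in turns:
        for value, score in nbest:
            if score > best_score:
                best_value, best_score = value, score
    return best_value

def accumulate_evidence(turns):
    """Toy statistical tracker (an assumption, not from the paper):
    score(v) = 1 - prod over turns of (1 - confidence of v in that turn)."""
    remaining = defaultdict(lambda: 1.0)
    for nbest in turns:
        for value, score in nbest:
            remaining[value] *= 1.0 - score
    return {value: 1.0 - r for value, r in remaining.items()}

# Hypothetical bus-route slot: "61c" is never the top result with the
# highest confidence overall, but it is observed in both turns.
turns = [
    [("61a", 0.50), ("61c", 0.40)],
    [("61c", 0.45), ("61b", 0.30)],
]
scores = accumulate_evidence(turns)
baseline_choice = max_confidence_baseline(turns)
statistical_choice = max(scores, key=scores.get)
```

In this made-up example the baseline settles on "61a", while the accumulating tracker prefers "61c" because the evidence for it recurs across turns.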
Highlights
  • Speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user’s needs.
  • Teams were asked to process the test dialogs online – i.e., to make a single pass over the data, as if the tracker were being run in deployment.
  • The data, evaluation tools, and baselines will continue to be freely available to the research community (DST, 2013).
  • The results of the challenge show that the suite of performance metrics clusters into 4 natural groups.
  • We find that larger gains over conventional rule-based baselines are present in dialog systems where the speech recognition confidence score has poor discrimination.
  • We observe substantial limitations on generalization: in mismatched conditions, around half of the trackers entered did not exceed the performance of two simple baselines.
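The online constraint mentioned above (a single pass, with output produced at each turn before later turns are seen) might look like the following sketch. The `RunningMaxTracker` class and the `run_online` helper are hypothetical names introduced for illustration; the challenge's actual tracker interface is not described in this summary.

```python
class RunningMaxTracker:
    """Hypothetical tracker that remembers the best-scoring value so far."""

    def __init__(self):
        self.best_value, self.best_score = None, 0.0

    def update(self, nbest):
        # Incorporate one turn's N-best list of (value, confidence) pairs.
        for value, score in nbest:
            if score > self.best_score:
                self.best_value, self.best_score = value, score

    def output(self):
        return self.best_value

def run_online(make_tracker, dialog):
    """Single forward pass: the output for turn t is fixed before
    turn t + 1 is read, as it would be in deployment."""
    tracker = make_tracker()
    outputs = []
    for nbest in dialog:
        tracker.update(nbest)
        outputs.append(tracker.output())
    return outputs

dialog = [[("61a", 0.5)], [("61c", 0.7)]]
outputs = run_online(RunningMaxTracker, dialog)
```

A useful property of this arrangement is that running the tracker on any prefix of a dialog gives the same outputs as the corresponding prefix of the full run, which is what makes the evaluation comparable to live operation.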
Results
  • Logistically, the training data and labels, bus timetable database, scoring scripts, and baseline system were publicly released in late December 2012.
  • The test data was released on 22 March 2013, and teams were given a week to run their trackers and send results back to the organizers for evaluation.
  • 6. Here the authors see 4 natural clusters emerge: a cluster for correctness with Accuracy, MRR, and the ROC.V1.CA measures; a cluster for probability quality with L2 and Average score; and two clusters for score discrimination – one with ROC.V1.EER and the other with the three ROC.V2 metrics.
  • Results in Figure 4 emphasize that different trackers are tuned for different performance measures, and the optimal tracking algorithm depends crucially on the target performance measure.
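To make the correctness and probability-quality measures named above more concrete, here is a minimal sketch of Accuracy, MRR, L2, and Average score. It assumes each turn supplies a dict of hypothesis scores plus the correct hypothesis, and that the correct hypothesis (possibly REST) always appears in the dict. The function name `evaluate` and the data layout are hypothetical, L2 is written as a plain Euclidean distance to the one-hot vector for the correct item, and the ROC-based measures are omitted. The challenge's own scoring scripts may differ in detail.

```python
import math

def evaluate(turns):
    """turns: list of (scores, correct), where scores maps each
    hypothesis to the tracker's score and correct is the true hypothesis."""
    n = len(turns)
    accuracy = mrr = l2 = avg_score = 0.0
    for scores, correct in turns:
        ranked = sorted(scores, key=scores.get, reverse=True)
        accuracy += 1.0 if ranked[0] == correct else 0.0
        mrr += 1.0 / (ranked.index(correct) + 1)   # reciprocal rank
        l2 += math.sqrt(sum(
            (s - (1.0 if h == correct else 0.0)) ** 2
            for h, s in scores.items()))
        avg_score += scores[correct]
    return {"accuracy": accuracy / n, "mrr": mrr / n,
            "l2": l2 / n, "avg_score": avg_score / n}

# Made-up tracker output for two turns of one slot.
example_turns = [
    ({"61a": 0.6, "61c": 0.3, "REST": 0.1}, "61a"),
    ({"61a": 0.5, "61c": 0.4, "REST": 0.1}, "61c"),
]
metrics = evaluate(example_turns)
```

Even in this tiny example the measures reward different things: Accuracy only looks at the top hypothesis, while L2 and Average score depend on how much score mass is placed on the correct item, which is one way to see why a tracker tuned for one measure need not be the best on another.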
Conclusion
  • The dialog state tracking challenge has provided the first common testbed for this task.
  • The details of the trackers themselves will be published at SIGDIAL 2013.
  • The results of the challenge show that the suite of performance metrics clusters into 4 natural groups.
  • The authors find that larger gains over conventional rule-based baselines are present in dialog systems where the speech recognition confidence score has poor discrimination.
  • The authors observe substantial limitations on generalization: in mismatched conditions, around half of the trackers entered did not exceed the performance of two simple baselines.
Tables
  • Table 1: Summary of the datasets. One turn includes a system output and a user response. Slots are named entity types such as bus route, origin neighborhood, date, time, etc. N-best SLU Recall indicates the fraction of concepts which appear anywhere on the SLU N-best list.
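The N-best SLU Recall column described in the caption can be illustrated with a short sketch. It assumes concepts are represented as `(slot, value)` tuples, that each turn pairs the set of true concepts with an N-best list of concept sets, and that the fraction is pooled over all turns. The function name and the example data are hypothetical.

```python
def nbest_slu_recall(turns):
    """Fraction of true concepts that appear anywhere on the SLU
    N-best list, pooled over all turns."""
    found = total = 0
    for true_concepts, nbest in turns:
        seen = set().union(*nbest)      # every concept on the N-best list
        found += len(true_concepts & seen)
        total += len(true_concepts)
    return found / total if total else 0.0

# Made-up turns: the second turn's N-best list misses the destination.
example = [
    ({("route", "61c")},
     [{("route", "61a")}, {("route", "61c")}]),
    ({("from", "downtown"), ("to", "airport")},
     [{("from", "downtown")}]),
]
recall = nbest_slu_recall(example)
```

This quantity bounds what a tracker that only re-ranks SLU hypotheses can achieve, since a concept missing from every N-best entry cannot be recovered from the list alone.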
Funding
  • The organizers also thank Ian Lane for his support for transcription, and Microsoft and Honda Research Institute USA for funding the challenge.
Subjects and analysis
workers: 3
When a transcription exactly and unambiguously matched a recognized slot value, such as the bus route “sixty one c”, labels were assigned automatically. The remainder were assigned using crowdsourcing, where three workers were shown the true words spoken and the recognized concept, and asked to indicate if the recognized concept was correct – even if it did not match the recognized words exactly. Workers were also shown dialog history, which helps decipher the user’s meaning when their speech was ambiguous. If the 3 workers were not unanimous in their labels (about 4% of all turns), the item was labeled manually by the organizers. The REST meta-hypothesis was not explicitly labeled; rather, it was deemed to be correct if none of the prior SLU results were labeled as correct.
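The labeling procedure described above can be summarized as a small decision rule. The sketch below is an interpretation under stated assumptions: the function names, the boolean vote representation, and the source strings are hypothetical, and an automatic match is assumed to yield a "correct" label.

```python
def label_slu_result(exact_match, worker_votes):
    """Return (label, source) for one recognized concept.

    label is True (correct), False (incorrect), or None when the item
    has to be decided manually by the organizers."""
    if exact_match:
        # Transcription matched the recognized slot value unambiguously.
        return True, "automatic"
    if len(worker_votes) == 3 and len(set(worker_votes)) == 1:
        # All three crowd workers agree.
        return worker_votes[0], "crowd"
    return None, "organizers"

def rest_is_correct(prior_slu_labels):
    """REST is deemed correct if no prior SLU result was labeled correct."""
    return not any(prior_slu_labels)

auto = label_slu_result(True, [])
unanimous = label_slu_result(False, [True, True, True])
split = label_slu_result(False, [True, False, True])
```

Deriving the REST label from the other labels, rather than asking workers about it, keeps the crowdsourcing task to simple yes or no judgments about concrete recognized concepts.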

References
  • AW Black, S Burger, B Langner, G Parent, and M Eskenazi. 2010. Spoken dialog challenge 2010. In Proc SLT, Berkeley.
  • D Bohus and AI Rudnicky. 2006. A ‘K hypotheses + other’ belief updating model. In Proc AAAI Workshop on Statistical and Empirical Approaches for Spoken Dialogue Systems, Boston.
  • 201Dialog State Tracking Challenge Homepage. http://research.microsoft.com/events/dstc/.
  • H Higashinaka, M Nakano, and K Aikawa. 2003. Corpus-based discourse understanding in spoken dialogue systems. In Proc ACL, Sapporo.
  • D Huggins-Daines, M Kumar, A Chan, AW Black, M Ravishankar, and AI Rudnicky. 2006. PocketSphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices. In Proc ICASSP, Toulouse.
  • M Kendall. 1938. A new measure of rank correlation. Biometrika, 30(1-2):81–89.
  • Y Ma, A Raux, D Ramachandran, and R Gupta. 2012. Landmark-based location belief tracking in a spoken dialog system. In Proc SigDial, Seoul.
  • N Mehta, R Gupta, A Raux, D Ramachandran, and S Krawczyk. 2010. Probabilistic ontology trees for belief tracking in dialog systems. In Proc SigDial, Tokyo.
  • T Paek and E Horvitz. 2000. Conversation as action under uncertainty. In Proc UAI, Stanford, pages 455–464.
  • G Parent and M Eskenazi. 20Toward Better Crowdsourced Transcription: Transcription of a Year of the Let’s Go Bus Information System Data. In Proc SLT, Berkeley.
  • B Thomson and SJ Young. 2010. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language, 24(4):562–588.
  • JD Williams and SJ Young. 2007. Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2):393–422.
  • JD Williams, A Raux, D Ramachandran, and AW Black. 2012. Dialog state tracking challenge handbook. Technical report, Microsoft Research.
  • JD Williams. 2010. Incremental partition recombination for efficient tracking of multiple dialogue states. In Proc ICASSP.
  • SJ Young, M Gasic, S Keizer, F Mairesse, J Schatzmann, B Thomson, and K Yu. 2010. The hidden information state model: a practical framework for POMDP-based spoken dialogue management. Computer Speech and Language, 24(2):150–174.
Authors
Matthew Henderson