Publisher: Springer, Cham


Evaluation of Statistical POMDP-Based Dialogue Systems in Noisy Environments

Authors: Steve Young, Catherine Breslin, Milica Gašić, Matthew Henderson, Dongho Kim, Martin Szummer, Blaise Thomson, Pirros Tsiakoulis, Eli Tzirkel Hancock

Publish Date: 2016
Pages: 3-14


Compared to conventional hand-crafted rule-based dialogue management systems, statistical POMDP-based dialogue managers offer the promise of increased robustness, reduced development and maintenance costs, and scalability to large open domains. As a consequence, there has been considerable research activity in approaches to statistical spoken dialogue systems over recent years. However, building and deploying a real-time spoken dialogue system is expensive, and even when operational, it is hard to recruit sufficient users to get statistically significant results. Instead, researchers have tended to evaluate using user simulators or by reprocessing existing corpora, both of which are unconvincing predictors of actual real-world performance. This paper describes the deployment of a real-world restaurant information system and its evaluation both in a motor car, using subjects recruited locally, and by remote users recruited using Amazon Mechanical Turk. The paper explores three key questions: are statistical dialogue systems more robust than conventional hand-crafted systems; how does the performance of a system evaluated on a user simulator compare to performance with real users; and can performance of a system tested over the telephone network be used to predict performance in more hostile environments such as a motor car? The results show that the statistical approach is indeed more robust, but results from a simulator significantly over-estimate both absolute and relative performance. Finally, by matching word error rates (WER), performance results obtained over the telephone can provide useful predictors of performance in noisier environments such as the motor car, but again they tend to over-estimate performance.


