Interspeech 2007 Session FrC.O1: Spoken dialogue systems II
Type: oral
Date: Friday, August 31, 2007
Time: 13:30 – 15:30
Room: Elisabeth
Chair: David Traum (USC)
FrC.O1‑1
13:30
Automated Directory Assistance System - from Theory to Practice
Dong Yu, Microsoft Research
Yun-Cheng Ju, Microsoft Research
Ye-Yi Wang, Microsoft Research
Geoffrey Zweig, Microsoft Research
Alex Acero, Microsoft Research
The automated directory assistance system (ADAS) is traditionally formulated as an automatic speech recognition (ASR) problem. Recently, it has been formulated as a voice search problem, where a spoken utterance is first converted into text, which in turn is used to search for the listing. In this paper, we focus on the design and development of the utterance-to-listing component of ADAS. We show that many theoretical and practical issues need to be resolved when applying the basic idea of voice search to the development of ADAS. We share our experiences in addressing these issues, especially in pre-processing the listing database, generating a high-performance language model (LM), and developing efficient, accurate, and robust search algorithms. Field tests of our prototype system indicate that an 81% task completion rate can be achieved.
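The two-stage voice search idea in this abstract (recognized text first, listing lookup second) can be illustrated with a minimal sketch. This is not the authors' search algorithm: it assumes the recognizer output is already available as a string, and it swaps in plain TF-IDF cosine matching. The listing names and the function names `build_index` and `search` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real text normalization)."""
    return text.lower().split()


def build_index(listings):
    """Compute smoothed IDF weights and a TF-IDF vector for every listing."""
    docs = [Counter(tokenize(name)) for name in listings]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    n = len(listings)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(recognized_text, listings, idf, vectors):
    """Return the best-matching listing and its score for the recognized text."""
    query = Counter(tokenize(recognized_text))
    # Words that never occur in any listing carry no evidence and are dropped.
    query_vec = {t: tf * idf[t] for t, tf in query.items() if t in idf}
    scores = [cosine(query_vec, vec) for vec in vectors]
    best = max(range(len(listings)), key=scores.__getitem__)
    return listings[best], scores[best]


# Hypothetical listing database and recognizer output.
LISTINGS = ["Kung Ho Cuisine of China", "Calabria Ristorante Italiano", "Main Street Pizza"]
IDF, VECTORS = build_index(LISTINGS)
print(search("kung ho chinese restaurant", LISTINGS, IDF, VECTORS))
```

Matching on weighted word overlap rather than exact strings lets a caller's paraphrase still reach the intended listing, which is one reason the search stage is treated separately from recognition.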
FrC.O1‑2
13:50
The Voice-Rate Dialog System for Consumer Ratings
Geoffrey Zweig, Microsoft
Patrick Nguyen, Microsoft
Y. C. Ju, Microsoft
Ye-Yi Wang, Microsoft
Dong Yu, Microsoft
Alex Acero, Microsoft
Voice-Rate is an experimental dialog system that makes product and business ratings available to consumers via a toll-free phone number. By calling Voice-Rate, users can access the ratings of more than one million products, a quarter million local businesses (restaurants), and three thousand national businesses. This paper describes the Voice-Rate system and solutions to three key technical challenges: robust name-matching, efficient disambiguation, and review synthesis for telephone playback. Voice-Rate can be accessed by calling 1-877-456-DATA (toll-free) within the U.S.
FrC.O1‑3
14:10
The Influence of User Tailoring and Cognitive Load on User Performance in Spoken Dialogue Systems
Andi Winterboer, University of Edinburgh
Jiang Hu, Stanford University
Johanna Moore, University of Edinburgh
Clifford Nass, Stanford University
This paper presents results of a Wizard-of-Oz (WoZ) experiment carried out to examine the effect of two different information presentation methods on a secondary task, namely driving. The results demonstrate that the user-modeled summarize and refine (UMSR) approach not only enables more efficient information retrieval than the summarize and refine (SR) approach, but also does not negatively affect driving-task performance.
FrC.O1‑4
14:30
Confidence Measures for Voice Search Applications
Ye-Yi Wang, Microsoft Research
Dong Yu, Microsoft Research
Yun-Cheng Ju, Microsoft Research
Geoffrey Zweig, Microsoft Research
Alex Acero, Microsoft Research
Voice search is the technology underlying many spoken dialog applications that enable users to access information using spoken queries. This paper reviews voice search technology, and proposes a new and effective method for computing semantic confidence measures. It explores the use of maximum entropy classifiers as confidence models, and investigates a feature selection algorithm that leads to an effective subset of prominent features for the classifier. The experimental results on a directory assistance application show that the reduced feature set not only makes the model more effective in handling different recognition and search engine combinations, but also results in a very informative confidence measure that is closely correlated with the actual voice search accuracy.
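A binary maximum entropy classifier is equivalent to logistic regression, so the confidence model described above can be sketched as a small logistic regression that maps features of a voice search result to the probability that the result is correct. This is only an illustration under stated assumptions: the two features (an ASR confidence and a search score), the toy training data, and the function names are hypothetical, and the feature selection step from the abstract is not reproduced.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(features, labels, epochs=2000, lr=0.5):
    """Fit a binary MaxEnt (logistic regression) model by batch gradient descent."""
    n_feat = len(features[0])
    n = len(features)
    weights = [0.0] * n_feat
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def confidence(x, weights, bias):
    """Estimated probability that the voice search result is correct."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical training set: [asr_confidence, search_score] -> 1 if the
# returned listing was correct, 0 otherwise.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
Y = [1, 1, 1, 0, 0, 0]
WEIGHTS, BIAS = train_maxent(X, Y)
print(confidence([0.9, 0.9], WEIGHTS, BIAS), confidence([0.1, 0.1], WEIGHTS, BIAS))
```

A dialog manager could compare such a score against thresholds to decide whether to accept, confirm, or reject a result; the thresholds themselves would have to be tuned on held-out data.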
FrC.O1‑5
14:50
Effects of Quiz-style Information Presentation on User Understanding
Ryuichiro Higashinaka, NTT Communication Science Laboratories
Kohji Dohsaka, NTT Communication Science Laboratories
Shigeaki Amano, NTT Communication Science Laboratories
Hideki Isozaki, NTT Communication Science Laboratories
This paper proposes quiz-style information presentation for interactive systems as a means to improve user understanding in educational tasks. Because quizzes can strongly motivate users to stay voluntarily engaged in the interaction and keep their attention on receiving information, it is expected that information presented as quizzes can be better understood by users. To verify the effectiveness of the approach, we implemented read-out and quiz systems and performed comparison experiments using human subjects. In the task of memorizing biographical facts, the results showed that user understanding for the quiz system was significantly better than that for the read-out system, and that the subjects were more willing to use the quiz system despite the long duration of the quizzes. This indicates that quiz-style information presentation promotes engagement in the interaction with the system, leading to improved user understanding.
FrC.O1‑6
15:10
A Data Visualization and Analysis Method for Natural Language Call Routing System Design
Hong-Kwang Jeff Kuo, IBM T.J. Watson Research Center
Vaibhava Goel, IBM T.J. Watson Research Center
We describe a data visualization tool that allows a natural language call routing system designer to browse the data from high-level routing target classes down to individual sentences. For each target class, automatic clustering groups similar requests together. Relabeling data is much more efficient because a cluster of many sentences, instead of individual sentences, can be relabeled in one action. The tool also detects and displays potential confusions between sub-clusters across different classes. The confusability may be caused by erroneous labeling, in which case the entire sub-cluster can be relabeled. If the confusability is inherent, the system designer can design a disambiguation dialogue to clarify the caller's intent.
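The workflow in this abstract (cluster the sentences within each routing class, then look for similar sub-clusters across classes) can be sketched as follows. The abstract does not say which clustering or similarity method the tool uses, so this sketch assumes a greedy "leader" clustering with Jaccard word overlap. The class names, sentences, threshold, and function names are all hypothetical.

```python
def tokens(sentence):
    """Represent a sentence as a set of lowercase words."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sentences(sentences, threshold=0.5):
    """Greedy leader clustering: a sentence joins the first cluster whose
    leader (first member) is similar enough, else it starts a new cluster."""
    clusters = []
    for sentence in sentences:
        for cluster in clusters:
            if jaccard(tokens(sentence), tokens(cluster[0])) >= threshold:
                cluster.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters


def find_confusions(clusters_by_class, threshold=0.5):
    """Flag pairs of sub-clusters from different classes with similar leaders."""
    flagged = []
    classes = sorted(clusters_by_class)
    for i, class_a in enumerate(classes):
        for class_b in classes[i + 1:]:
            for ca in clusters_by_class[class_a]:
                for cb in clusters_by_class[class_b]:
                    if jaccard(tokens(ca[0]), tokens(cb[0])) >= threshold:
                        flagged.append((class_a, ca[0], class_b, cb[0]))
    return flagged


# Hypothetical labeled data; "cancel my service" under "billing" plays the
# role of an erroneous label.
DATA = {
    "billing": ["check my bill", "check my bill please", "cancel my service"],
    "cancellation": ["cancel my service", "cancel my service now"],
}
CLUSTERS = {label: cluster_sentences(sents) for label, sents in DATA.items()}
for hit in find_confusions(CLUSTERS):
    print(hit)
```

A flagged pair only signals that two sub-clusters look alike; deciding whether to relabel the sub-cluster or to add a disambiguation dialogue remains the designer's call, as the abstract describes.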