Interspeech 2007 Session TuC.P1b: Spoken dialogue systems I
Tuesday, August 28, 2007
13:30 – 15:30
Yoshinori Sagisaka (Waseda univ.)
Utilizing online content as domain knowledge in a multi-domain dynamic dialogue system
Craig Wootton, University of Ulster
Michael McTear, University of Ulster
Terry Anderson, University of Ulster
Recent research in dialogue systems has investigated the feasibility of relying on information extracted from the Internet as a source of content and domain knowledge. However, this information needs to be processed and prepared into a form understandable by the dialogue manager. The number of domains and web sites is often restricted, and the dialogue manager usually requires prior knowledge of the site structure itself. We present an architecture which demonstrates that multi-domain dialogue, relying on information extracted from online sources, is possible without the need for human intervention or knowledge of the site structure itself.
Handling speech input in the Ritel QA dialogue system
Boris Van Schooten, Human Media Interaction, University of Twente, Netherlands
Sophie Rosset, Spoken Language Processing Group, LIMSI-CNRS, France
Olivier Galibert, Spoken Language Processing Group, LIMSI-CNRS, France
Aurélien Max, Language, Information and Representations, LIMSI-CNRS and Université Paris-Sud 11, France
Rieks Op den Akker, Human Media Interaction, University of Twente, Netherlands
Gabriel Illouz, Language, Information and Representations, LIMSI-CNRS and Université Paris-Sud 11, France
The Ritel system aims to provide open-domain question answering to casual users by means of a telephone dialogue. Providing a sufficiently natural speech dialogue in a QA system poses some unique challenges, such as very fast overall performance and large-vocabulary speech recognition. Speech QA is an error-prone process, but errors or problems that occur may be resolved with the help of user feedback, as part of a natural dialogue. This paper reports on the latest version of Ritel, which includes search term confirmation and improved follow-up question handling, as well as various improvements to the other system components. We collected a small dialogue corpus with the new system, which we use to evaluate and discuss it.
Online Call Quality Monitoring for Automating Agent-Based Call Centers
Woosung Kim, Convergys Innovation Center
One of the challenges in automating a call center is the tradeoff between customer satisfaction and the cost of human agents: i.e., most callers prefer human agents to automated systems, but adding human agents substantially increases call center operating costs. One possible compromise is to let callers use automation at the beginning of the call and bring in a human agent if they have problems. The key problem here is, obviously, how to detect the problematic calls promptly before it is too late. This paper proposes a novel method for monitoring call quality, aiming to salvage callers having problems with automation by bringing in a human agent in a timely manner. We propose to use finite state machines to automatically label call data for training and use the log likelihood ratio for monitoring calls to detect bad calls. We demonstrate, by experiments, that it is possible to detect bad calls before callers give up the call, which increases customer satisfaction and minimizes costs.
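As a rough illustration of the log likelihood ratio monitoring mentioned in the abstract above, the sketch below accumulates a ratio over the events of a call and raises an alarm once it crosses a threshold. The event names, the two probability tables, and the threshold are hypothetical; the abstract does not specify the paper's actual models or features.

```python
import math

# Hypothetical per-event probabilities under a "good call" and a "bad call" model.
# In practice these would be estimated from automatically labeled training calls.
GOOD = {"ok": 0.7, "no_match": 0.2, "timeout": 0.1}
BAD = {"ok": 0.3, "no_match": 0.4, "timeout": 0.3}


def first_alarm(events, threshold=1.0):
    """Return the 0-based index of the first event at which the cumulative
    log likelihood ratio, log P(events | bad) - log P(events | good),
    exceeds the threshold, or None if it never does."""
    llr = 0.0
    for i, event in enumerate(events):
        llr += math.log(BAD[event]) - math.log(GOOD[event])
        if llr > threshold:
            return i
    return None
```

A lower threshold hands calls to a human agent earlier at the price of more false alarms, which is the cost tradeoff the abstract describes.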
Analysis of Communication Failures for Spoken Dialogue Systems
Sebastian Möller, Deutsche Telekom Labs, TU Berlin
Klaus-Peter Engelbrecht, Deutsche Telekom Labs, TU Berlin
Antti Oulasvirta, Helsinki Institute for Information Technology, Helsinki University of Technology
Communication failures are typical of interactions with spoken dialogue systems, in particular when dialogues become less structured and less predictable. In this paper, we adopt a new classification scheme of communication failures and their consequences and show its usefulness in three respects: (1) for the systematic analysis of data collected in user testing, (2) for the prediction of user-perceived quality and usability, and (3) for the automatic testing of usability in a simulation testbed. Experimental results are presented for two spoken dialogue systems which differ in their dialogue structure and complexity. They show that the failure classification may uncover the causes of interaction problems between user and system, irrespective of system complexity, and that failure consequences can serve as a predictor of user satisfaction.
How to Access Audio Files of Large Data Bases Using In-car Speech Dialogue Systems
Sandra Mann, DaimlerChrysler AG, Group Research and Advanced Engineering, Ulm, Germany
André Berton, DaimlerChrysler AG, Group Research and Advanced Engineering, Ulm, Germany
Ute Ehrlich, DaimlerChrysler AG, Group Research and Advanced Engineering, Ulm, Germany
Today, a number of in-car speech interfaces that handle large vocabularies are available. We propose an approach that allows accessing audio data on different media carriers and in various formats in a uniform way. This uniformity is achieved by providing audio data retrieval via metadata. Each audio file is enhanced with machine-readable information about several categories (e.g. title, artist, genre). When searching for particular audio data, the user may pre-select one of these categories, thus restricting the search area. The categories are the same in the metadata of all connected media carriers. The user may directly address the contents of the categories by means of speakable text entries (text enrolments), irrespective of the media carrier or format. Alternatively, the user may search globally across all categories by speaking the complete name of a title, album, artist, genre or year, without having to navigate through complex hierarchies and long result lists.
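The two search modes described in the abstract above (category pre-selection and global search) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Track` record, the `search` function, and exact case-insensitive matching are all hypothetical, and the recognized utterance is assumed to arrive already as text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    album: str
    genre: str
    year: str
    carrier: str  # hypothetical label for the media carrier, e.g. "CD" or "USB"


# The same categories are assumed to exist on every connected media carrier.
CATEGORIES = ("title", "artist", "album", "genre", "year")


def search(tracks, spoken, category=None):
    """Return the tracks whose metadata equals the spoken text (case-insensitive).
    With a category, only that field is compared; without one, all categories are."""
    fields = (category,) if category else CATEGORIES
    query = spoken.strip().lower()
    return [t for t in tracks if any(getattr(t, f).lower() == query for f in fields)]
```

Because the lookup never consults `carrier`, tracks from every carrier are searched uniformly, which mirrors the uniformity the abstract aims for.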
Analyzing Temporal Transition of Real User's Behaviors in a Spoken Dialogue System
Kazunori Komatani, Kyoto University
Tatsuya Kawahara, Kyoto University
Hiroshi G. Okuno, Kyoto University
Managing various behaviors of real users is indispensable for spoken dialogue systems to operate adequately in real environments. We have analyzed various users' behaviors using data collected over 34 months from the Kyoto City Bus Information System. We focused on "barge-in" and added barge-in rates to our analysis. Temporal transitions of users' behaviors, measured by ASR accuracy, task success rates and barge-in rates, were investigated first. We then examined the relationship between automatic speech recognition (ASR) accuracy and barge-in rates. The analysis revealed that the ASR accuracy of utterances made with barge-ins was lower because many novices, who were not accustomed to the timing of when to speak, used the system. We also observed that the ASR accuracy of utterances with barge-ins differed according to the barge-in rates of individual users. The results indicate that the barge-in rate can be used as a novel user profile for detecting ASR errors.
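The per-user barge-in rate discussed in the abstract above can be computed along the following lines. This is only a sketch: the `(user_id, is_barge_in, asr_correct)` tuple layout, the function names, and the 0.5 cutoff that splits users into two groups are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def barge_in_rates(utterances):
    """Map each user ID to the fraction of that user's utterances that were barge-ins.
    Each utterance is a (user_id, is_barge_in, asr_correct) tuple."""
    counts = defaultdict(lambda: [0, 0])  # user_id -> [barge-ins, total]
    for user_id, is_barge_in, _ in utterances:
        counts[user_id][0] += int(is_barge_in)
        counts[user_id][1] += 1
    return {user: b / n for user, (b, n) in counts.items()}


def barge_in_accuracy(utterances, rates, cutoff=0.5):
    """ASR accuracy of barge-in utterances, split by whether the speaker's
    barge-in rate is at or above the cutoff ("high") or below it ("low").
    A group with no barge-in utterances maps to None."""
    tally = {"high": [0, 0], "low": [0, 0]}  # group -> [correct, total]
    for user_id, is_barge_in, asr_correct in utterances:
        if not is_barge_in:
            continue
        group = "high" if rates[user_id] >= cutoff else "low"
        tally[group][0] += int(asr_correct)
        tally[group][1] += 1
    return {g: (c / n if n else None) for g, (c, n) in tally.items()}
```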
VoicePedia: Towards Speech-based Access to Unstructured Information
J Sherwani, Carnegie Mellon University
Dong Yu, Microsoft Research
Tim Paek, Microsoft Research
Mary Czerwinski, Microsoft Research
Y C Ju, Microsoft Research
Alex Acero, Microsoft Research
Currently there are no dialog systems that enable purely voice-based access to the unstructured information on websites such as Wikipedia. Such systems could be revolutionary for non-literate users in the developing world. To investigate interface issues in such a system, we developed VoicePedia, a telephone-based dialog system for searching and browsing Wikipedia. In this paper, we present the system, as well as a user study comparing the use of VoicePedia to SmartPedia, a smartphone GUI-based alternative. Keyword entry through the voice interface was significantly faster, while search result navigation and page browsing were significantly slower. Although users preferred the GUI-based interface, task success rates for the two systems were comparable, a promising result for regions where smartphones and data plans are not viable.
Exploiting prosodic features for dialog act tagging in a discriminative modeling framework
Vivek Kumar Rangarajan Sridhar, University of Southern California
Srinivas Bangalore, AT&T Research Labs
Shrikanth Narayanan, University of Southern California
Cue-based automatic dialog act tagging uses lexical, syntactic and prosodic knowledge in the identification of dialog acts. In this paper, we propose a discriminative framework for automatic dialog act tagging using maximum entropy modeling. We propose two schemes for integrating prosody in our modeling framework: (i) syntax-based categorical prosody prediction from an automatic prosody labeler, and (ii) a novel method to model a continuous acoustic-prosodic observation sequence as a discrete sequence by means of quantization. The proposed prosodic feature integration results in a relative improvement of 11.8% over using lexical and syntactic features alone on the Switchboard-DAMSL corpus. Using the lexical, syntactic and prosodic features together results in a dialog act tagging accuracy of 84.1%, close to the human agreement of 84%.
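To make the idea of turning a continuous acoustic-prosodic sequence into a discrete one concrete, here is a minimal quantization sketch. The abstract does not say which quantization scheme the authors use; equal-width binning, the `quantize` name, and the default of four bins are assumptions chosen for simplicity.

```python
def quantize(values, n_bins=4):
    """Map a sequence of continuous values to integer symbols 0..n_bins-1
    using equal-width bins between the sequence's minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat contour carries no variation, so every frame gets symbol 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

The resulting symbol sequence could then be fed to a classifier as categorical features, for example as n-grams of symbols, alongside lexical and syntactic features.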
Using Information State to Improve Dialogue Move Identification in a Spoken Dialogue System
Hua Ai, Intelligent Systems Program, University of Pittsburgh, USA
Antonio Roque, Institute for Creative Technologies, University of Southern California, USA
Anton Leuski, Institute for Creative Technologies, University of Southern California, USA
David Traum, Institute for Creative Technologies, University of Southern California, USA
In this paper we investigate how to improve the performance of a dialogue move and parameter tagger for a task-oriented dialogue system using the information state approach. We use a corpus of utterances from an implemented system to train and evaluate a tagger, and then evaluate the tagger in an on-line system. Use of information state context is shown to improve performance of the system.
Using Multiple Strategies to Manage Spoken Dialogue
Shiu-Wah Chu, Queen's University of Belfast
Ian O'Neill, Queen's University of Belfast
Philip Hanna, Queen's University of Belfast
This paper describes the algorithm used by a multi-strategy dialogue manager (DM) for a speech-based dialogue system and presents the results of an evaluation of the new DM. The DM selects the most appropriate of its dialogue strategies using a set of criteria and a corresponding algorithm. It can adopt a number of styles, ranging from highly naturalistic mixed-initiative dialogues to rigid system-led dialogues, taking into account factors such as user experience and recognition conditions.
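Strategy selection of the kind the abstract above describes might look like the following toy rule. This is not the authors' algorithm: the function name, the use of a single ASR confidence score to stand in for "recognition conditions", the two thresholds, and the "intermediate" style are all hypothetical. Only the two end points, mixed-initiative and system-led, come from the abstract.

```python
def choose_strategy(user_is_experienced, asr_confidence, low=0.4, high=0.8):
    """Pick a dialogue style from hypothetical criteria.

    Poor recognition forces a rigid system-led dialogue; an experienced user
    with good recognition gets mixed initiative; anything else falls in between."""
    if asr_confidence < low:
        return "system-led"
    if user_is_experienced and asr_confidence >= high:
        return "mixed-initiative"
    return "intermediate"
```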
An Information State Based Dialogue Manager for a Mobile Robot
Marcelo Quinderé, Universidade de Aveiro / DETI / IEETA
Luís Seabra Lopes, Universidade de Aveiro / DETI / IEETA
António Teixeira, Universidade de Aveiro / DETI / IEETA
The paper focuses on an Information State (IS) based dialogue manager developed for Carl, an intelligent mobile robot. It uses a Knowledge Acquisition and Management (KAM) module that integrates information obtained from various interlocutors. This mixed-initiative dialogue manager (DM) handles pronoun resolution, can ask different kinds of clarification and confirmation questions, and generates observations based on the knowledge acquired so far. An evaluation of the DM with respect to the knowledge acquisition goal is presented.