Interspeech 2007 Session FrB.P1b: New application areas
Type
poster
Date
Friday, August 31, 2007
Time
10:00 – 12:00
Room
Foyer
Chair
Sara Basson (IBM TJ Watson Research Center)
FrB.P1b‑1
How to Integrate Speech-Operated Internet Information Dialogs into a Car
André Berton, DaimlerChrysler AG
Peter Regel-Brietzmann, DaimlerChrysler AG
Hans-Ulrich Block, Siemens AG
Stefanie Schachtl, Siemens AG
Manfred Gehrke, Siemens AG
Telematics and entertainment systems in cars usually contain audio, phone, and navigation functions that rely mostly on static content. The HMI, including the speech dialog system, reflects these static applications in its very restricted dialogs. In the future, drivers will want to access current information from the internet for driving assistance and information of personal interest. The HMI should allow the user to request this information by voice in order to keep distraction minimal. This paper presents three architectural approaches to obtaining current internet information in a car, linking it to the existing applications, and making it available for speech input with as few hardware and software changes to the existing system as possible. We propose speech applications broadcast by digital radio, speech access to web services, and a complete server-based processing architecture. The prototype dialog system for all three architectures was developed in the SmartWeb project.
FrB.P1b‑2
Recent Progress in the MIT Spoken Lecture Processing Project
James Glass, MIT Computer Science and Artificial Intelligence Laboratory
T.J. Hazen, MIT Computer Science and Artificial Intelligence Laboratory
Scott Cyphers, MIT Computer Science and Artificial Intelligence Laboratory
Igor Malioutov, MIT Computer Science and Artificial Intelligence Laboratory
David Huynh, MIT Computer Science and Artificial Intelligence Laboratory
Regina Barzilay, MIT Computer Science and Artificial Intelligence Laboratory
In this paper we discuss our research activities in the area of spoken lecture processing. Our goal is to improve access to on-line audio-visual recordings of academic lectures by developing tools for the processing, transcription, indexing, segmentation, summarization, retrieval, and browsing of these media. We provide an overview of the technology components and systems that have been developed as part of this project, present some experimental results, and discuss our ongoing and future research plans.
FrB.P1b‑3
How to Personalize Speech Applications for Web-based Information in a Car
Philipp Fischer, DaimlerChrysler
Andreas Oesterle, DaimlerChrysler
André Berton, DaimlerChrysler
Peter Regel-Brietzmann, DaimlerChrysler
We present a system for exploring and personalizing internet information in the car using natural language queries. Speech dialog applications are generated automatically from well-structured internet content, such as tables, and transferred to the car. In order to cope with the large variety of speech applications, we propose a hybrid content-based personalization approach. Speech applications are clustered into topic areas by mapping them to a domain ontology. Applications are ranked according to the explicit preferences of the driver, global profile data, and an implicit user profile. This profile adapts itself while the user is interacting with the system, taking into account the selected application and the speech queries. The resulting list of ranked applications is displayed to the user. Global ratings are based on the preferred topics of 20 subjects, who were also questioned about the prototype's usability and accuracy in a first user evaluation.
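As a rough illustration of the hybrid ranking the abstract describes, the following Python sketch combines an explicit driver preference profile, global ratings, and an implicit (learned) profile into a single score per application; all topic names, field names, weights, and values are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of hybrid content-based ranking; weights and
# profile values are invented for illustration.
from dataclasses import dataclass

@dataclass
class SpeechApp:
    name: str
    topic: str   # topic area assigned by mapping the app to the domain ontology

def rank_apps(apps, explicit_prefs, global_ratings, implicit_profile,
              w_explicit=0.5, w_global=0.2, w_implicit=0.3):
    """Order applications by a weighted mix of the three profile sources."""
    def score(app):
        return (w_explicit * explicit_prefs.get(app.topic, 0.0)
                + w_global * global_ratings.get(app.topic, 0.0)
                + w_implicit * implicit_profile.get(app.topic, 0.0))
    return sorted(apps, key=score, reverse=True)

apps = [SpeechApp("Fuel prices", "traffic"), SpeechApp("Stock quotes", "finance")]
ranked = rank_apps(apps,
                   explicit_prefs={"traffic": 1.0},              # driver's stated interests
                   global_ratings={"finance": 0.8, "traffic": 0.4},
                   implicit_profile={"finance": 0.2})            # learned from past queries
print([app.name for app in ranked])   # -> ['Fuel prices', 'Stock quotes']
```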
FrB.P1b‑4
Topic Estimation with Domain Extensibility for Guiding User’s Out-of-Grammar Utterances in Multi-Domain Spoken Dialogue Systems
Satoshi Ikeda, Kyoto University
Kazunori Komatani, Kyoto University
Tetsuya Ogata, Kyoto University
Hiroshi G. Okuno, Kyoto University
In a multi-domain spoken dialogue system, a user's utterances are more prone to be out-of-grammar, because this kind of system deals with more tasks than a single-domain system. We defined a topic as a domain about which users want to find more information, and we developed a method of recovering from out-of-grammar utterances based on topic estimation, i.e., by providing a help message in the estimated domain. Moreover, domain extensibility, that is, the ability to easily add new domains, should be inherently retained in multi-domain systems. We therefore collected documents from the Web as training data for topic estimation. Because these data contained considerable noise, we used Latent Semantic Mapping (LSM), which enables robust topic estimation by removing the effect of noise from the data. Experimental results on 272 utterances collected with a Wizard-of-Oz-style method showed that our method increased topic estimation accuracy by 23.1 points over the baseline.
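A minimal sketch of the topic estimation step, approximating LSM with a standard latent-semantic pipeline (TF-IDF plus truncated SVD); the per-domain Web text is invented for illustration and the low-rank projection stands in for LSM's noise suppression.

```python
# LSA-style stand-in for LSM topic estimation; all training text is toy data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

domain_text = {   # hypothetical bags of Web-collected text per domain
    "restaurant": "menu lunch dinner reserve table cuisine",
    "weather": "forecast rain snow temperature tomorrow umbrella",
    "hotel": "room reserve night rate stay vacancy",
}
labels = list(domain_text)

vec = TfidfVectorizer()
X = vec.fit_transform(domain_text.values())   # term-document matrix
svd = TruncatedSVD(n_components=2)            # low rank suppresses Web noise
D = svd.fit_transform(X)                      # domains in the latent space

utt = svd.transform(vec.transform(["will it rain tomorrow"]))  # fold in the utterance
print(labels[int(np.argmax(cosine_similarity(utt, D)))])       # -> 'weather'
```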
FrB.P1b‑5
Prosody Change and Response Timing Analysis in Spontaneously Spoken Dialogs and Their Modeling in a Spoken Dialog System
Ryota Nishimura, Department of Information and Computer Sciences, Toyohashi University of Technology
Norihide Kitaoka, Graduate School of Information Science, Nagoya University
Seiichi Nakagawa, Department of Information and Computer Sciences, Toyohashi University of Technology
If a dialog system could respond to a user as naturally as a human, interaction would be smoother. Imitating human prosodic behavior is important for natural human-human conversation. In this paper, to develop a cooperative, friendly spoken dialog system, we analyzed the correlations between F0 synchrony tendency or overlap frequency and the subjective measures "liveliness," "familiarity," and "informality" in human-human dialogs. We also modeled the properties of these features and implemented the model in our dialog system, which generates the response timing of aizuchi (back-channels) and turn-taking in real time based on a decision tree, together with dynamic F0 changes, to realize chat-like conversations.
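A toy sketch of the decision-tree timing component: from prosodic features at the end of a user utterance, the system decides whether to wait, produce an aizuchi, or take the turn. The features, labels, and training values below are invented and do not come from the paper.

```python
# Illustrative response-timing classifier; training data is fabricated.
from sklearn.tree import DecisionTreeClassifier

# Each row: [pause length (s), final F0 slope, relative energy]
X = [[0.1, -0.2, 0.8],   # speaker still going -> wait
     [0.4, -0.6, 0.3],   # short lull, falling pitch -> aizuchi
     [0.8, -0.9, 0.1],   # long pause, utterance-final prosody -> take turn
     [0.2,  0.1, 0.7],   # rising pitch mid-utterance -> wait
     [0.9, -0.8, 0.2]]   # -> take turn
y = ["wait", "aizuchi", "take_turn", "wait", "take_turn"]

timing_model = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(timing_model.predict([[0.5, -0.7, 0.25]]))  # timing decision for a new frame
```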
FrB.P1b‑6
GEMSIS - a novel application of speech recognition to emergency and disaster medicine
Satoshi Tamura, Department of Information Science, Gifu University
Kunihiko Takamatsu, Department of Emergency & Disaster Medicine, Gifu University Graduate School of Medicine
Shinji Ogura, Department of Emergency & Disaster Medicine, Gifu University Graduate School of Medicine
Satoru Hayamizu, Department of Information Science, Gifu University
This paper reports a novel application of speech recognition to emergency and disaster medicine. We also introduce the emergency medical system that incorporates this application, GEMSIS (Gifu Emergency Medical Supporting Intelligent System). Speech recognition plays an important role in this system: when a paramedic team is sent to a disaster or accident site, a life-saving technician reports the situation on site using speech recognition, and the recognized information is shared by all hospitals and critical care centers. The system thus addresses a severe problem in emergency medical care, namely that pre-hospital care is insufficient due to a lack of information. A prototype speech recognition interface was constructed to evaluate baseline performance and to facilitate discussion with medical doctors. This work demonstrates that the applicable domain of speech processing technology can be extended.
FrB.P1b‑7
Application of Speech Technology in a Home Based Assessment Kiosk for Early Detection of Alzheimer’s Disease
Rachel Coulston, Center for Spoken Language Understanding
Esther Klabbers, Center for Spoken Language Understanding
Jacques de Villiers, Center for Spoken Language Understanding
John-Paul Hosom, Center for Spoken Language Understanding
Alzheimer's disease, a degenerative disease that affects an estimated 4.5 million people in the U.S., can be treated far more effectively when it is detected early. There are numerous challenges to early detection. One is objectivity, since caretakers are often emotionally invested in the health of the patients, who may be their family members. Consistency of administration can also be an issue, especially where longitudinal results from different examiners are compared. Finally, the frequency of testing can be adversely affected by scheduling or cost constraints for in-home psychometrician visits. The kiosk system described in this paper, currently deployed in homes around the country, uses speech technology to provide advantages that address these challenges.
FrB.P1b‑8
Ontology-Based Multimodal High Level Fusion Involving Natural Language Analysis for Aged People Home Care Application
Olga Vybornova, Communications and Remote Sensing Lab, Universite Catholique de Louvain, Belgium
Monica Gemo, Communications and Remote Sensing Lab, Universite Catholique de Louvain, Belgium
Ronald Moncarey, Communications and Remote Sensing Lab, Universite Catholique de Louvain, Belgium
Benoit Macq, Communications and Remote Sensing Lab, Universite Catholique de Louvain, Belgium
This paper presents a knowledge-based method for early-stage, high-level multimodal fusion of data obtained from speech input and the visual scene. The ultimate goal is to develop a human-computer multimodal interface that assists elderly people living alone at home in performing their daily activities and supports their active ageing and social cohesion. Crucial for high-level multimodal fusion and successful communication is the provision of extensive semantic and contextual information from spoken language understanding. To address this, we propose to extract natural language semantic representations and map them onto the restricted domain ontology. This information is then processed for multimodal reference resolution together with the visual scene input. To make our approach flexible and widely applicable, a priori situational knowledge, the modalities, and the fusion process are modelled in the ontology expressing the domain constraints.
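A toy sketch, in the spirit of the abstract, of how an ontology constraint can drive multimodal reference resolution: a spoken deictic ("take this") is resolved against visual-scene objects whose ontology type satisfies the verb's selectional restriction. The mini-ontology, scene representation, and attribute names are all hypothetical.

```python
# Hypothetical ontology-constrained reference resolution; the ontology
# and scene objects are invented for illustration.
ontology = {
    "Medication": {"takeable": True},
    "Remote":     {"takeable": True},
    "Table":      {"takeable": False},
}

scene = [  # objects reported by the visual-scene module
    {"id": "obj1", "type": "Medication", "gazed_at": True},
    {"id": "obj2", "type": "Table", "gazed_at": False},
]

def resolve_reference(verb_constraint, scene):
    """Pick the gazed-at object whose ontology type satisfies the verb's constraint."""
    for obj in scene:
        if obj["gazed_at"] and ontology[obj["type"]].get(verb_constraint):
            return obj["id"]
    return None

# "Take this" -> the verb 'take' requires a takeable referent
print(resolve_reference("takeable", scene))   # -> 'obj1'
```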