Interspeech 2007
August 27-31, 2007

Antwerp, Belgium

Interspeech 2007 Session ThC.SS: Multilingualism in speech and language processing


Type: special
Date: Thursday, August 30, 2007
Time: 13:30 – 15:30
Room: Astrid Scala 1
Chair: Jan Verhasselt (Tele Atlas)


ThC.SS‑1

SPICE: Web-based Tools for Rapid Language Adaptation in Speech Processing Systems
Tanja Schultz, Carnegie Mellon University
Alan W Black, Carnegie Mellon University
Sameer Badaskar, Carnegie Mellon University
Matthew Hornyak, Carnegie Mellon University
John Kominek, Carnegie Mellon University

In this paper we describe the design and implementation of a user interface for SPICE, a web-based toolkit for rapid prototyping of speech and language processing components. We report on the challenges encountered and the experience gained while testing these tools in an advanced hands-on graduate course, in which we created speech recognition, speech synthesis, and small-domain translation components for 10 different languages in only six weeks.
ThC.SS‑2

Introduction to Multilingual Corpus-Based Concatenative Speech Synthesis
Filip Deprez, Nuance Communications International, Belgium
Jan Odijk, Nuance Communications International, Belgium
Jan De Moortel, Nuance Communications International, Belgium

This tutorial paper addresses foreign-language support in corpus-based concatenative text-to-speech systems. We give an overview of application domains where strictly monolingual speech synthesis is not sufficient and where multilingual text-to-speech is required or highly desirable. We describe two approaches to multilingual corpus-based speech synthesis: phoneme mapping on the one hand, and the creation of multilingual speech databases on the other. We list the strengths and weaknesses of both approaches.
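The phoneme-mapping approach can be illustrated with a minimal sketch: every foreign phoneme that the native voice lacks is replaced by a hand-picked nearest native phoneme, so that the native speech database can render the foreign word. The SAMPA-style symbols, the small native inventory and the mapping pairs below are hypothetical examples chosen for illustration, not material from the paper.

```python
# Hypothetical SAMPA-style phoneme sets; symbols and pairs are illustrative only.
NATIVE_INVENTORY = {"t", "d", "s", "E", "I", "N", "w", "k", "a:"}  # subset of a native (Dutch) voice

# Foreign (English) phonemes missing from the native voice, mapped by hand
# to the closest native phoneme.
FOREIGN_TO_NATIVE = {
    "T": "t",  # /θ/ as in "thing"
    "D": "d",  # /ð/ as in "that"
    "{": "E",  # /æ/ as in "cat"
}


def map_phonemes(foreign_phonemes, native_inventory, mapping):
    """Rewrite a foreign phoneme sequence so that the native voice can render it."""
    result = []
    for ph in foreign_phonemes:
        if ph in native_inventory:
            result.append(ph)           # the native voice already has this unit
        elif ph in mapping:
            result.append(mapping[ph])  # substitute the nearest native phoneme
        else:
            raise KeyError(f"no native substitute for phoneme {ph!r}")
    return result


print(map_phonemes(["T", "I", "N"], NATIVE_INVENTORY, FOREIGN_TO_NATIVE))  # ['t', 'I', 'N']
```

Because every foreign unit is rendered with a native phoneme, the output necessarily carries a nativized pronunciation; a multilingual speech database avoids the substitution step by containing the foreign units themselves.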
ThC.SS‑3

Recognition of foreign names spoken by native speakers
Frederik Stouten, Ghent University
Jean-Pierre Martens, Ghent University

It is a challenge to develop a speech recognizer that can handle the kind of lexicons encountered in automatic attendant or car navigation applications. Such lexicons can contain several hundred thousand entries, mainly proper names. Many of these names are of foreign origin, and native speakers may pronounce them in different ways, ranging from a completely nativized to a completely foreignized pronunciation. In this paper we propose a method that deals with this pronunciation variability by introducing the concept of a foreignizable phoneme and by combining standard acoustic models with a phonologically inspired back-off acoustic model. The main advantage of the approach is that it requires neither foreign phoneme models nor foreign speech data. For the recognition of English names with Dutch acoustic models, we obtained a relative word error rate reduction of more than 10%.
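A "relative" reduction is measured against the baseline error rate, not in absolute percentage points. A minimal sketch of the computation follows; the WER values in it are hypothetical and do not come from the paper.

```python
def relative_reduction(baseline, improved):
    """Relative error reduction: (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical numbers: a drop from 30.0% to 26.7% WER is only 3.3 points
# absolute, but 11% relative.
print(f"{relative_reduction(30.0, 26.7):.1%}")  # 11.0%
```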
ThC.SS‑4

Language Identification using several sources of information with a multiple-Gaussian classifier
Ricardo Cordoba, Speech Technology Group, Dept. of Electronic Engineering, Universidad Politécnica de Madrid
Luis F. D'Haro, Speech Technology Group, Dept. of Electronic Engineering, Universidad Politécnica de Madrid
Fernando Fernandez-Martinez, Speech Technology Group, Dept. of Electronic Engineering, Universidad Politécnica de Madrid
Juan M. Montero, Speech Technology Group, Dept. of Electronic Engineering, Universidad Politécnica de Madrid
Roberto Barra, Speech Technology Group, Dept. of Electronic Engineering, Universidad Politécnica de Madrid

We present several innovative techniques that can be applied in a PPRLM (parallel phone recognition followed by language modeling) system for language identification (LID). To normalize the scores, remove their bias and improve the classifier, we compared a bias removal technique (up to 19% relative improvement, RI) with a Gaussian classifier (up to 37% RI). We then include additional sources of information as different feature vectors of the Gaussian classifier: the sentence acoustic score (11% RI), the average acoustic score for each phoneme (11% RI), and the average duration of each phoneme (7.8% RI). Using a multiple-Gaussian classifier with 4 feature vectors yields an additional 15.1% RI, and using these 4 feature vectors instead of the PPRLM feature vector alone provides a 26.1% RI. Finally, we successfully include additional acoustic HMMs for the same language (10% RI). We show that these improvements are mostly additive.
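The two score-normalization options compared above can be sketched as follows, assuming each utterance is represented by a vector of PPRLM scores with one entry per target language (higher meaning more likely). Bias removal subtracts each language's mean score, estimated on development data, before taking the highest score; the Gaussian classifier instead fits one diagonal-covariance Gaussian per language to these score vectors and picks the language with the highest log-likelihood. The data layout, the function names and the variance floor are assumptions for illustration; the sketch uses a single feature vector and does not reproduce the paper's multiple-Gaussian classifier over 4 feature vectors.

```python
import math
from collections import defaultdict

# Assumed layout: scores[i] is the PPRLM score for languages[i] (higher = more likely).


def estimate_bias(score_vectors):
    """Per-language mean score over a set of development utterances."""
    n = len(score_vectors)
    return [sum(column) / n for column in zip(*score_vectors)]


def classify_with_bias_removal(scores, bias, languages):
    """Subtract each language's mean score, then pick the highest adjusted score."""
    adjusted = [s - b for s, b in zip(scores, bias)]
    best = max(range(len(languages)), key=lambda i: adjusted[i])
    return languages[best]


class DiagonalGaussianClassifier:
    """One diagonal-covariance Gaussian per language, fitted on score vectors."""

    def __init__(self, var_floor=1e-3):
        self.var_floor = var_floor  # keeps variances away from zero
        self.models = {}

    def fit(self, vectors, labels):
        groups = defaultdict(list)
        for vector, label in zip(vectors, labels):
            groups[label].append(vector)
        for label, group in groups.items():
            n = len(group)
            columns = list(zip(*group))
            mean = [sum(col) / n for col in columns]
            var = [max(sum((x - m) ** 2 for x in col) / n, self.var_floor)
                   for col, m in zip(columns, mean)]
            self.models[label] = (mean, var)
        return self

    def log_likelihood(self, vector, label):
        mean, var = self.models[label]
        return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                          for x, m, v in zip(vector, mean, var))

    def predict(self, vector):
        return max(self.models, key=lambda label: self.log_likelihood(vector, label))
```

Usage, with hypothetical scores ordered as [English, Spanish]:

```python
train = [[-1.0, -3.0], [-1.2, -2.8], [-0.8, -3.2],
         [-3.0, -1.0], [-2.8, -1.2], [-3.2, -0.8]]
labels = ["en", "en", "en", "es", "es", "es"]
clf = DiagonalGaussianClassifier().fit(train, labels)
print(clf.predict([-2.9, -1.1]))  # es
```

Extra information such as acoustic scores or phoneme durations could be appended as further dimensions of the score vector; how the paper combines its separate feature vectors is not shown here.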
ThC.SS‑5

Dynamic Language Change in MIMUS
Carmen Del Solar, Julietta Research Group, University of Seville, Seville, Spain
Guillermo Pérez, Julietta Research Group, University of Seville, Seville, Spain
Eva Florencio, Julietta Research Group, University of Seville, Seville, Spain
David Moral, Julietta Research Group, University of Seville, Seville, Spain
Gabriel Amores, Julietta Research Group, University of Seville, Seville, Spain
Pilar Manchón, Julietta Research Group, University of Seville, Seville, Spain

One of the most widely pursued goals in dialogue system development is improved usability, which is mainly achieved by providing users with friendly and manageable interfaces. The MIMUS (MultIModal, University of Seville) dialogue system supports multimodal interaction, allowing the user to interact not only verbally but also graphically. MIMUS is also multilingual: the user can communicate in Spanish and English and switch between them dynamically. This paper describes the MIMUS architecture, the components directly related to multilinguality, and the way in which languages can be dynamically switched within a single dialogue.

ISCA, Universiteit Antwerpen, Radboud University Nijmegen, Katholieke Universiteit Leuven