INTERSPEECH 2007
August 27-31, 2007

Antwerp, Belgium

DEMO Booth

INTERSPEECH 2007 is providing a dedicated DEMO BOOTH in the exhibition area where researchers from academia and industry can demonstrate research results and/or early prototypes in the field of speech and language technologies. Each demo receives a 2-hour time slot (synchronized with the regular conference schedule). The schedule of the demo program can be found below and is also posted at the DEMO BOOTH.

Power and wireless internet access will be available in the DEMO BOOTH. All other electronic equipment must be supplied by the demonstrators.

Tuesday Aug 28
13:30-15:30
Affective Multimodal Mirror
Willem Melder, Khiet Truong, David van Leeuwen, Mark Neerincx (TNO Human Factors), Marten den Uyl (VicarVision), B. Stock Plum (V2), Lodewijk R. Loos (Waag Society)

The Affective Mirror (AM) is a demonstration that senses and elicits laughter. Currently, the mirror fuses a vocal and a facial affect-sensing system into a single user-state assessment. This intelligent interface combines affect-sensing technologies with user-interaction designs that enable a full cycle of sensing, interpreting, reacting, and causing new effects. The mirror is intended to elicit positive emotions, to make people laugh, and to intensify that laughter. The first user-experience tests showed that users behave cooperatively, resulting in user-mirror action-reaction cycles. Most users enjoyed the interaction with the mirror and became immersed in the experience.
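The fusion of the two sensing channels can be pictured as score-level (late) fusion. The sketch below is a minimal illustration assuming each channel outputs per-emotion probabilities; the labels, weights, and the fuse() helper are hypothetical, not taken from the TNO system.

```python
# A minimal sketch of score-level fusion of two affect-sensing channels;
# labels, weights, and fuse() are hypothetical, not the TNO system's design.
def fuse(vocal, facial, w_vocal=0.5):
    """Weighted late fusion of per-emotion probabilities from two channels."""
    labels = vocal.keys() & facial.keys()
    scores = {l: w_vocal * vocal[l] + (1 - w_vocal) * facial[l] for l in labels}
    total = sum(scores.values())
    return {l: s / total for l, s in scores.items()}  # renormalize

vocal_scores = {"laughter": 0.7, "neutral": 0.3}   # from the audio channel
facial_scores = {"laughter": 0.5, "neutral": 0.5}  # from the video channel
state = fuse(vocal_scores, facial_scores)
print(max(state, key=state.get))                   # -> laughter
```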
Tuesday Aug 28
16:00-18:00
TravelMan - a Multimodal Interface for Mobile Spoken Route Guidance
Markku Turunen, Jaakko Hakulinen (University of Tampere)

TravelMan is a multimodal interface for mobile route guidance developed at the University of Tampere. TravelMan provides route guidance information for public transport, such as metro, tram, and bus traffic in Finnish cities. In addition, information for long-distance traffic is included. There are two main functions: (1) planning a journey and (2) interactive guidance during the journey. TravelMan supports pedestrian guidance when the user is changing between means of transportation. The range of input and output modalities includes speech synthesis, speech recognition, a fisheye GUI, haptics, contextual text input, physical browsing, physical gestures, non-speech audio, and global positioning information.
Wednesday Aug 29
10:00-12:00
VOCALOID - Commercial singing synthesizer based on sample concatenation
Kenmochi Hideki (Yamaha)

VOCALOID is commercial singing synthesis software. During this demonstration, visitors will be able to input notes and lyrics and easily obtain a synthesized singing result, trying the software for themselves.
The paper introducing the software, "VOCALOID - Commercial singing synthesizer based on sample concatenation", has been submitted to the Special Session "Synthesis of singing challenge" (TuC.SS).
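The sample-concatenation idea can be caricatured in a few lines: look up a stored voice sample per lyric unit, resample it toward the target note's pitch, and concatenate. The sketch below is purely illustrative; VOCALOID's actual unit selection, pitch conversion, and smooth joining are far more sophisticated, and all names here are invented.

```python
# A toy sketch of singing synthesis by sample concatenation; not
# VOCALOID's actual method, just the general idea in miniature.
import numpy as np

SR = 16000

def tone(f0, dur):  # stand-in for a recorded voice sample sung at pitch f0
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * f0 * t)

SAMPLES = {"la": (tone(220.0, 0.4), 220.0)}  # lyric unit -> (waveform, pitch)

def note_to_hz(midi):  # MIDI note number to frequency
    return 440.0 * 2 ** ((midi - 69) / 12)

def sing(score):  # score: list of (lyric, MIDI note) pairs
    out = []
    for lyric, midi in score:
        wave, f0 = SAMPLES[lyric]
        ratio = note_to_hz(midi) / f0             # required pitch shift
        idx = np.arange(0, len(wave) - 1, ratio)  # naive resampling shift
        out.append(np.interp(idx, np.arange(len(wave)), wave))
    return np.concatenate(out)

melody = sing([("la", 57), ("la", 61), ("la", 64)])  # A3-C#4-E4 on "la"
```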
Wednesday Aug 29
13:30-15:30
SPICE: Speech Processing Interactive Creation and Evaluation project
Tanja Schultz (Karlsruhe University)

Speech technology potentially allows everyone to participate in today's information revolution and can help bridge the language barrier. With some 6,900 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. In spite of recent improvements in speech processing, supporting a new language remains a skilled job requiring significant effort from trained individuals. The SPICE project (Speech Processing: Interactive Creation and Evaluation) aims to overcome both limitations by providing an interactive, completely web-based language creation and evaluation toolkit that allows everyone to develop speech processing models, to collect appropriate data for model building, and to evaluate the results, enabling iterative improvement.
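The iterative create-evaluate cycle such a toolkit exposes can be sketched as below; collect_data, build_model, and evaluate are hypothetical stand-ins, not the project's actual API, and the numbers are made up to show the loop converging.

```python
# A minimal sketch of an iterative collect -> build -> evaluate loop,
# as a web toolkit like SPICE enables; all functions are stand-ins.
import random

def collect_data(n):      # e.g. prompted recordings from web users
    return [f"utterance_{i}" for i in range(n)]

def build_model(data):    # e.g. train acoustic/lexicon/LM components
    return {"trained_on": len(data)}

def evaluate(model):      # e.g. word error rate on a held-out set (simulated)
    return max(5.0, 60.0 - 0.5 * model["trained_on"] + random.uniform(-2, 2))

data, target_wer = [], 20.0
while True:
    data += collect_data(20)           # users contribute more speech/text
    wer = evaluate(build_model(data))  # rebuild the models and re-score
    print(f"{len(data)} utterances -> {wer:.1f}% WER")
    if wer <= target_wer:
        break                          # good enough; stop iterating
```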
Wednesday Aug 29
16:00-18:00
Multimodal Health and Fitness Companions
Markku Turunen, Jaakko Hakulinen (University of Tampere)

First results of the COMPANIONS FP6 project will be demonstrated. A Companion is a software agent with which the user can converse, ask questions, and get information, and which can prompt the user for actions or responses. Here, an experimental spoken dialogue system with virtual and physical software agents is demonstrated in the health and fitness domain. The demonstration uses Wi-Fi enabled Nabaztag rabbits to interact with the users.
The demo shows how several Nabaztags can be driven from spoken dialogue systems built with the open source software developed at the University of Tampere.
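The rabbits were addressed over Violet's HTTP API. The sketch below is a rough reconstruction from recollection of that historical interface (the service has since been discontinued); the endpoint, parameter names, serial number, and token should all be treated as placeholders.

```python
# A rough sketch of driving a Nabaztag through Violet's historical HTTP
# API (since discontinued); endpoint and parameters follow recollection
# of that API, and sn/token are placeholders.
from urllib.parse import urlencode
from urllib.request import urlopen

def rabbit_say(text, sn="SERIAL", token="TOKEN"):
    query = urlencode({"sn": sn, "token": token, "tts": text})
    url = "http://api.nabaztag.com/vl/FR/api.jsp?" + query
    with urlopen(url) as resp:  # server-side TTS; the rabbit speaks the text
        return resp.read()

rabbit_say("Time for your afternoon walk!")
```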
Thursday Aug 30
10:00-12:00
The Inseparable Sides of a Coin: Developing SpeechIndexer for Researching Vanishing Languages and Cultures
Ulrike Glavitsch (ETH Zurich), Jozsef Szakos (National Dong Hwa University)

This demo presents the SpeechIndexer software, implemented on the C#/.NET platform, for transcribing and indexing recorded speech. The software follows a novel approach to language documentation and retrieval: speech documents are indexed and transcribed in such a way that navigation and searching are easy. This makes it particularly suitable for investigating endangered languages with a purely oral tradition. SpeechIndexer's central component is a pause finder, whose results form the basis of the efficient, semi-automatic indexing and transcription process. The software is very compact (300 KB) and runs on custom hardware.
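A pause finder of the kind the description names can be built on frame-level energy thresholding; the sketch below is a minimal illustration, with frame sizes and thresholds that are illustrative rather than the tool's actual parameters.

```python
# A minimal energy-based pause finder, sketching the kind of component
# SpeechIndexer builds on; all parameter values are illustrative.
import numpy as np

def find_pauses(signal, sr, frame_ms=20, thresh_db=-35.0, min_pause_s=0.3):
    """Return (start_s, end_s) stretches where a 1-D numpy signal is silent."""
    hop = int(sr * frame_ms / 1000)
    n = len(signal) // hop
    frames = signal[: n * hop].reshape(n, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
    db = 20 * np.log10(rms / rms.max())       # frame level relative to peak
    silent = db < thresh_db
    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i                          # a silent run begins
        elif not s and start is not None:
            if (i - start) * hop / sr >= min_pause_s:
                pauses.append((start * hop / sr, i * hop / sr))
            start = None
    if start is not None and (n - start) * hop / sr >= min_pause_s:
        pauses.append((start * hop / sr, n * hop / sr))
    return pauses
```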
Thursday Aug 30
13:30-15:30
On the influence of vocal tract geometry on articulatory control strategies, acoustic properties and their respective variability in vowel production
Susanne Fuchs (ZAS, Berlin)

Even in laboratory speech using nonsense words, speaker-specific articulatory behaviour is often found, which raises the question of where inter-speaker variability stems from. One possibility is that it is related to socio-cultural and linguistic factors; another links this variability to inter-speaker differences in the properties of the vocal apparatus. Disentangling these factors is a challenging and methodologically difficult task. A first step in this direction is to gather experimental data from a larger set of speakers and compare them with realistic physical models of their vocal tracts. The overall aim of this future work is to investigate the relations between individual morphological differences of the vocal tract and the corresponding speaker-specific articulatory control strategies and acoustic properties.
Thursday Aug 30
16:00-18:00
Automatic post-synchronization for dialog replacement
Werner Verhelst and Pieter Soens (Vrije Universiteit Brussel)

In film or video recording, especially outdoors, dialog recordings are often of poor quality (street noise, etc.) and have to be re-recorded in a studio. Whenever the speaker's lips are visible, this replacement dialog track must be synchronized with the lip movements. A software system will be demonstrated that automatically time-aligns the studio recording with the recording originally made on set. The system comprises the measurement of the timing differences between the two recordings, decision logic that derives the time scaling to be applied, and our WSOLA time-scaling algorithm.
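The first stage, measuring the timing difference, can be illustrated with cross-correlation of the two recordings' energy envelopes. This is a simplification: the sketch estimates one global offset, whereas the demonstrated system measures time-varying differences and corrects them with WSOLA; all function names here are hypothetical.

```python
# A sketch of estimating a single global timing offset between the studio
# and on-set recordings via energy-envelope cross-correlation; the real
# system handles time-varying offsets and applies WSOLA time scaling.
import numpy as np

def envelope(x, sr, frame_ms=10):
    """Frame-level RMS envelope of a 1-D numpy signal."""
    hop = int(sr * frame_ms / 1000)
    n = len(x) // hop
    return np.sqrt((x[: n * hop].reshape(n, hop) ** 2).mean(axis=1))

def global_offset(studio, on_set, sr, frame_ms=10):
    a = envelope(studio, sr, frame_ms)
    b = envelope(on_set, sr, frame_ms)
    a, b = a - a.mean(), b - b.mean()
    xc = np.correlate(a, b, mode="full")  # envelope cross-correlation
    lag = xc.argmax() - (len(b) - 1)      # frame lag of the best match
    return lag * frame_ms / 1000.0        # offset in seconds
```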
Friday Aug 31
10:00-12:00
Spanish speech to sign language translation system
Luis Fernando D'haro (UPM)

This demonstration presents a Spanish speech-to-sign-language translation system. The system focuses on the sentences spoken by an officer assisting people who apply for, or renew, the Identity Card. It translates the officer's explanations into Spanish Sign Language (LSE: Lengua de Signos Española) for Deaf people. The translation system is composed of a speech recognizer (decoding the spoken utterance into a word sequence), a natural language translator (converting the word sequence into a sequence of signs of the sign language), and a 3D avatar animation module (playing back the signs).
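The three-stage pipeline can be pictured as a simple composition of functions; the sketch below uses hypothetical stand-ins for the system's ASR, word-to-sign translator, and avatar module, including an invented toy sign lexicon.

```python
# A minimal sketch of the speech -> words -> signs -> avatar pipeline;
# recognize/translate/render and the lexicon are invented stand-ins.
def recognize(audio):
    return ["renueve", "su", "carnet"]            # pretend ASR output

SIGN_LEXICON = {"renueve": "RENEW", "su": "YOUR", "carnet": "ID-CARD"}

def translate(words):
    return [SIGN_LEXICON[w] for w in words if w in SIGN_LEXICON]

def render(signs):                                # avatar stand-in
    print("avatar plays:", " ".join(signs))

render(translate(recognize(b"...")))              # one officer utterance
```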
Friday Aug 31
13:30-15:30
IraqComm™ spontaneous two-way interactive translation system
Murat Akbacak, Horacio Franco, Sachin Kajarekar, Dimitra Vergyri, Wen Wang, Jing Zheng (SRI)

The IraqComm™ spontaneous two-way interactive translation system will be demonstrated. IraqComm mediates and translates spontaneous conversations between an English speaker and a speaker of colloquial Iraqi Arabic. It is trained to handle topics of tactical importance, including force protection, checkpoint operations, civil affairs, basic medical interviews, and training. Major components of the system include SRI's DynaSpeak® speech recognizer, the Gemini, SRInterp, and Language Weaver translation engines, Cepstral's Swift speech synthesis engine, and the user interface. The system runs on standard Windows computers and can be used in a variety of modes, including an eyes-free, nearly hands-free mode.
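The two-way mediation amounts to a turn loop: each utterance is recognized in the speaker's language, machine-translated, and synthesized in the listener's language. The sketch below shows that loop with stub tables standing in for the recognizers, translation engines, and synthesizer named above; every string is a placeholder.

```python
# A sketch of a two-way speech translation turn loop; the stub ASR/MT/TTS
# tables stand in for DynaSpeak, the translation engines, and Swift.
ASR = {"en": lambda a: "where does it hurt",
       "iq": lambda a: "<iraqi-arabic words>"}            # toy outputs
MT = {("en", "iq"): lambda t: "<iraqi-arabic translation>",
      ("iq", "en"): lambda t: "my head hurts"}            # toy outputs
TTS = {"en": print, "iq": print}                          # "speak" = print here

def mediate_turn(audio, src, dst):
    text = ASR[src](audio)       # recognize the speaker's utterance
    out = MT[(src, dst)](text)   # translate into the listener's language
    TTS[dst](out)                # synthesize the translation aloud
    return out

mediate_turn(b"...", "en", "iq")  # the English speaker's turn
mediate_turn(b"...", "iq", "en")  # the Iraqi Arabic speaker's reply
```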

