Interspeech 2007 Session FrC.P1b: Systems for spoken language translation II
Friday, August 31, 2007
13:30 – 15:30
Chair: Gokhan Tur (SRI)
The BBN 2007 Displayless English/Iraqi Speech-to-Speech Translation System
David Stallard, BBN Technologies
Fred Choi, BBN Technologies
Prem Natarajan, BBN Technologies
Rohit Prasad, BBN Technologies
Shirin Saleem, BBN Technologies
Krishna Subramanian, BBN Technologies
Spoken communication across a language barrier is of increasing importance in both civilian and military applications. In this paper, we present an English/Iraqi Arabic speech-to-speech translation system for the military force protection domain (checkpoints, municipal services surveys, basic descriptions of people, houses, vehicles, etc.). The system combines statistical N-gram speech recognition, statistical machine translation, hand-crafted rules, and speech synthesis in order to translate between the two languages. The system is designed for “eyes-free”, or “displayless”, use. That is, it does not make use of a screen, mouse, or keyboard, but is instead operated by a handheld microphone with just two push buttons: one for English, and the other for Arabic.
Context Dependent Word Modeling for Statistical Machine Translation using Part-of-Speech Tags
Ruhi Sarikaya, IBM T.J. Watson Research Center
Yonggang Deng, IBM T.J. Watson Research Center
Yuqing Gao, IBM T.J. Watson Research Center
Word-based translation models in particular, and phrase-based translation models in general, assume that a word in any context is equivalent to the same word in any other context. Yet this is not always true: the words in a sentence are not generated independently, and the usage of each word is strongly affected by its immediate neighboring words. State-of-the-art machine translation (MT) methods use words and phrases as basic modeling units. This paper introduces Context-Dependent Words (CDWs) as new basic translation units, with context classes defined using Part-of-Speech (POS) tags. Experimental results using CDW-based language models demonstrate encouraging improvements in translation quality for the translation of dialectal Arabic to English. Analysis of the results reveals that the improvements are mainly in fluency.
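The core idea can be illustrated with a minimal sketch: each word is rewritten as a unit that encodes the POS tags of its neighbors. The exact context definition used in the paper is not reproduced here; the single-neighbor scheme, the boundary markers, and the toy tags below are illustrative assumptions.

```python
# Sketch: rewrite each word as a context-dependent unit "word_LEFT_RIGHT",
# where LEFT/RIGHT are the POS tags of the neighboring words.
# BOS/EOS are assumed sentence-boundary markers; tag names are toy values.
def to_cdw(words, tags):
    """Turn parallel (word, tag) sequences into context-dependent word units."""
    units = []
    for i, w in enumerate(words):
        left = tags[i - 1] if i > 0 else "BOS"
        right = tags[i + 1] if i < len(words) - 1 else "EOS"
        units.append(f"{w}_{left}_{right}")
    return units

words = ["the", "plan", "failed"]
tags = ["DET", "NOUN", "VERB"]
print(to_cdw(words, tags))
# ['the_BOS_NOUN', 'plan_DET_VERB', 'failed_NOUN_EOS']
```

Two surface-identical words in different syntactic contexts thus become distinct translation units, at the cost of a larger vocabulary.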
Mapping from Literal Transcriptions of Conversational Speech to a more Standard Form of Linguistic Representation
Darren Scott Appling, Georgia Tech
Nick Campbell, NiCT/ATR
This paper describes the so-called ill-formed nature of spontaneous conversational speech as observed from the study of a 1500-hour corpus of recorded dialogue speech. We note that the structure is quite different from that of more formal speech or writing and propose a Statistical Machine Translation approach for mapping between the spoken and written forms of the language as if they were two entirely separate languages. We further posit that the particular nature of the spoken language is especially well suited for the display of affective states, inter-speaker relationships and discourse management information. In summary, both modes of communication appear to be particularly suited to their pragmatic function, neither is ill-formed, and it appears possible to map automatically between the two. This mapping has applications in speech technology for the processing of conversational speech.
Using inter-lingual triggers for machine translation
Caroline Lavecchia, LORIA UMR 7503
Kamel Smaïli, LORIA UMR 7503 University Nancy2
David Langlois, LORIA UMR 7503 IUFM of Lorraine
Jean-Paul Haton, LORIA UMR 7503 University Henri Poincaré Nancy1
In this paper, we present the idea of cross-lingual triggers and exploit this formalism to build a bilingual dictionary for machine translation. We describe cross-lingual triggers and how to make good use of them in order to produce such a dictionary, which we then compare to the ELRA dictionary and to a freely downloadable one. Finally, our dictionary is evaluated against the one produced by GIZA++ (an extension of the program GIZA) within a complete translation decoding process using Pharaoh [Koehn04]. The experiments, conducted on a parallel corpus of 19 million French words and 17 million English words, show that the resulting dictionary is well constructed and suitable for machine translation. These encouraging results allow us to put forward the concept of cross-lingual triggers, which could have many applications in machine translation.
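Trigger-based approaches typically score how strongly a source word predicts a target word across aligned sentence pairs. A minimal sketch of that idea, assuming a pointwise-mutual-information-style score over sentence-level co-occurrence (the scoring function and toy corpus below are illustrative, not the paper's exact formulation):

```python
import math

# Toy sentence-aligned French-English corpus; real corpora have millions of words.
corpus = [
    (["le", "chat", "dort"], ["the", "cat", "sleeps"]),
    (["le", "chien", "dort"], ["the", "dog", "sleeps"]),
    (["un", "chat", "noir"], ["a", "black", "cat"]),
]

def trigger_score(src, tgt, corpus):
    """PMI-style score: how strongly a source word 'triggers' a target word,
    based on co-occurrence within aligned sentence pairs."""
    n = len(corpus)
    n_src = sum(src in s for s, _ in corpus)
    n_tgt = sum(tgt in t for _, t in corpus)
    n_both = sum(src in s and tgt in t for s, t in corpus)
    if n_both == 0:
        return float("-inf")  # never co-occur: not a trigger pair
    return math.log((n_both * n) / (n_src * n_tgt))

# "chat" should trigger "cat" more strongly than "dog".
print(trigger_score("chat", "cat", corpus) > trigger_score("chat", "dog", corpus))
# True
```

Keeping, for each source word, the highest-scoring target words yields dictionary candidates of the kind the abstract describes.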
The IRST English-Spanish Translation System for European Parliament Speeches
Daniele Falavigna, FBK-irst
Nicola Bertoldi, FBK-irst
Fabio Brugnara, FBK-irst
Roldano Cattoni, FBK-irst
Mauro Cettolo, FBK-irst
Boxing Chen, FBK-irst
Marcello Federico, FBK-irst
Diego Giuliani, FBK-irst
Roberto Gretter, FBK-irst
Deepa Gupta, FBK-irst
Dino Seppi, FBK-irst
This paper presents the spoken language translation system developed at FBK-irst during the TC-STAR project. The system integrates automatic speech recognition with machine translation through confusion networks, which make it possible to represent a huge number of transcription hypotheses generated by the speech recognizer. Confusion networks are efficiently decoded by a statistical machine translation system that computes the most probable translation in the target language. The paper presents the whole architecture developed for the translation of political speeches held at the European Parliament, from English to Spanish and vice versa, and at the Spanish Parliament, from Spanish to English.
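Why confusion networks can compactly encode so many hypotheses is easy to see in a toy sketch: the network is a sequence of slots, each holding alternative words with posterior probabilities, and the number of paths grows as the product of slot sizes. The probabilities and the `*eps*` deletion symbol below are illustrative conventions, not the system's actual data.

```python
# A toy confusion network: a list of "slots", each mapping alternative words
# (including "*eps*" for a deletion) to posterior probabilities.
cn = [
    {"we": 0.7, "the": 0.3},
    {"want": 0.6, "went": 0.4},
    {"*eps*": 0.8, "to": 0.2},
    {"peace": 0.9, "piece": 0.1},
]

# Number of transcription hypotheses the network encodes:
# the product of slot sizes (2 * 2 * 2 * 2 = 16 here).
n_paths = 1
for slot in cn:
    n_paths *= len(slot)
print(n_paths)  # 16

# The 1-best transcription: top word per slot, with epsilons dropped.
best = [max(slot, key=slot.get) for slot in cn]
print([w for w in best if w != "*eps*"])  # ['we', 'want', 'peace']
```

An MT decoder over such a network would combine these word posteriors with translation and language model scores rather than just taking the 1-best path, which is what lets the translation system recover from recognition errors.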
The Influence of Utterance Chunking on Machine Translation Performance
Christian Fuegen, Universitaet Karlsruhe (TH)
Muntsin Kolss, Universitaet Karlsruhe (TH)
Speech translation systems commonly couple automatic speech recognition (ASR) and machine translation (MT) components. In such systems, the automatic segmentation of the ASR output for subsequent MT is critical to overall performance. In simultaneous translation systems, which require continuous output at low latency, chunking the ASR output into translatable segments is even more critical. This paper addresses the question of how utterance chunking influences machine translation performance in an empirical study. In addition, machine translation performance is set in relation to the segment length produced by the chunking strategy, which is important for simultaneous translation. To this end, we compare different chunking/segmentation strategies on speech recognition hypotheses as well as on reference transcripts.
IraqComm: A Next Generation Translation System
Kristin Precoda, SRI International
Jing Zheng, SRI International
Dimitra Vergyri, SRI International
Horacio Franco, SRI International
Colleen Richey, SRI International
Andreas Kathol, SRI International
Sachin Kajarekar, SRI International
This paper describes the IraqComm(TM) translation system developed by SRI International, with components from Language Weaver, Inc., and Cepstral, LLC. IraqComm, supported by the DARPA Translation Systems for Tactical Use (TRANSTAC) program, mediates and translates spontaneous conversations between an English speaker and a speaker of colloquial Iraqi Arabic. It is trained to handle topics of tactical importance, including force protection, checkpoint operations, civil affairs, basic medical interviews, and training. Major components of the system include SRI's DynaSpeak(R) speech recognizer, the Gemini, SRInterp, and Language Weaver translation engines, Cepstral's Swift speech synthesis engine, and the user interface. The system runs on standard Windows computers and can be used in a variety of modes, including an eyes-free, nearly hands-free mode.
Optimizing Sentence Segmentation for Spoken Language Translation
Sharath Rao Karikurve, interACT, Carnegie Mellon University
Ian Lane, interACT, Carnegie Mellon University
Tanja Schultz, interACT, Carnegie Mellon University
The conventional approach in text-based machine translation (MT) is to translate complete sentences, which are conveniently indicated by sentence boundary markers. However, since such boundary markers are not available for speech, new methods are required to define an optimal unit for translation. Our experimental results show that with a segment length optimized for a particular MT system, intra-sentence segmentation can improve translation performance (measured in BLEU) by up to 11% for Arabic Broadcast Conversation (BC) and 6% for Arabic Broadcast News (BN). We show that acoustic segmentation that minimizes Word Error Rate (WER) may not give the best translation performance. We improve upon it by automatically resegmenting the ASR output in a way that is optimized for translation, and argue that it might be necessary for different stages of a Spoken Language Translation (SLT) system to define their own optimal units.
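A resegmentation pass of the kind the abstract describes can be sketched as follows, assuming a simple heuristic that grows segments toward a target length and then breaks at the next sufficiently long inter-word pause. The function name, target length, and pause threshold are illustrative assumptions, not the paper's tuned method.

```python
# Sketch: resegment an ASR word stream toward a target segment length,
# preferring to break at long inter-word pauses. Thresholds are illustrative.
def resegment(words, pauses, target_len=8, pause_thresh=0.3):
    """words: list of tokens; pauses: pause duration (s) after each token."""
    segments, current = [], []
    for word, pause in zip(words, pauses):
        current.append(word)
        # Break once the segment is long enough and we hit a plausible boundary.
        if len(current) >= target_len and pause >= pause_thresh:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

words = [f"w{i}" for i in range(20)]
pauses = [0.5 if i in (7, 15) else 0.1 for i in range(20)]
print([len(s) for s in resegment(words, pauses)])  # [8, 8, 4]
```

In practice the target length would be tuned per MT system against BLEU, which is exactly the optimization the abstract reports; WER-optimal acoustic segments can come out far longer or shorter than that target.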