August 27-31, 2007

Antwerp, Belgium

Interspeech 2007 Session ThD.P3a: Language modelling II

Type: poster
Date: Thursday, August 30, 2007
Time: 16:00 – 18:00
Room: Keurvels
Chair: Murat Saraclar (Bogaziçi University)


Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition
Hiroki Yamazaki, Department of Computer Science, Tokyo Institute of Technology, Japan
Koji Iwano, Department of Computer Science, Tokyo Institute of Technology, Japan
Koichi Shinoda, Department of Computer Science, Tokyo Institute of Technology, Japan
Sadaoki Furui, Department of Computer Science, Tokyo Institute of Technology, Japan
Haruo Yokota, Department of Computer Science, Tokyo Institute of Technology, Japan

We propose a dynamic language model adaptation method for lecture speech recognition that uses temporal information from lecture slides. The proposed method consists of two steps. First, the language model is adapted with the text extracted from all the slides of a given lecture. Next, the text of a given slide is extracted based on temporal information and used for local adaptation. Hence, the language model used to recognize the speech associated with a given slide changes dynamically from one slide to the next. We evaluated the proposed method on speech data from four Japanese lecture courses. Our experiments show the effectiveness of the proposed method, especially for keyword detection. The F-measure error rate for lecture keywords was reduced by 2.4%.
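
The two-step scheme described in this abstract can be illustrated with a minimal sketch. The abstract does not state how the models are combined, so linear interpolation of unigram models is an assumption here, as are the function names, the interpolation weights, and the toy data; the authors' actual method may differ.

```python
from collections import Counter

def unigram(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_a, p_b, lam):
    """Linear interpolation lam * p_a + (1 - lam) * p_b over the joint vocabulary."""
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1 - lam) * p_b.get(w, 0.0) for w in vocab}

def adapt_for_slide(p_background, all_slides, slide_index,
                    lam_global=0.3, lam_local=0.3):
    """Hypothetical two-step adaptation (weights are illustrative assumptions)."""
    # Step 1: global adaptation with the text of all slides in the lecture.
    all_text = [w for slide in all_slides for w in slide]
    p_global = interpolate(unigram(all_text), p_background, lam_global)
    # Step 2: local adaptation with the text of the slide currently shown.
    return interpolate(unigram(all_slides[slide_index]), p_global, lam_local)

# Toy data, invented for illustration only.
background = unigram("the model is trained on the general text corpus".split())
slides = [["language", "model", "adaptation"],
          ["keyword", "detection", "results"]]
lm_slide0 = adapt_for_slide(background, slides, 0)
lm_slide1 = adapt_for_slide(background, slides, 1)
```

In this sketch a word such as "adaptation" receives more probability mass while its own slide is shown than while another slide is shown, which is the dynamic behaviour the abstract describes. In practice the slide index would come from the slide timing information and the models would be n-gram models rather than unigrams.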

Web-Based Language Modelling for Automatic Lecture Transcription
Cosmin Munteanu, Department of Computer Science, University of Toronto
Gerald Penn, Department of Computer Science / Knowledge Media Design Institute, University of Toronto
Ron Baecker, Department of Computer Science / Knowledge Media Design Institute, University of Toronto

Universities have long relied on written text to share knowledge. As more lectures are made available on-line, these must be accompanied by textual transcripts in order to provide the same access to information as textbooks. While Automatic Speech Recognition (ASR) is a cost-effective method to deliver transcriptions, its accuracy for lectures is not yet satisfactory. One approach for improving lecture ASR is to build smaller, topic-dependent Language Models (LMs) and combine them (through LM interpolation or hypothesis space combination) with general-purpose, large-vocabulary LMs. In this paper, we propose a simple solution for lecture ASR that achieves Word Error Rate reductions (as well as topic-specific keyword identification accuracies) similar to or better than those of combination-based approaches. Our method eliminates the need for two types of LMs by exploiting the lecture slides to collect a web corpus appropriate for modelling both the conversational and the topic-specific styles of lectures.

LSA-based Language Model Adaptation for Highly Inflected Languages
Tanel Alumäe, Institute of Cybernetics at Tallinn University of Technology
Toomas Kirt, Institute of Cybernetics at Tallinn University of Technology

This paper presents a language model topic adaptation framework for highly inflected languages. In such languages, subword units are used as basic units for language modeling. Since such units carry little semantic information, they are not very suitable for topic adaptation. We propose to lemmatize the corpus of training documents before constructing a latent topic model. To adapt the language model, we use a few lemmatized training sentences to find a set of documents that are semantically close to the current document. Fast marginal adaptation of a subword trigram language model is used for adapting the background model. Experiments on a set of Estonian test texts show that the proposed approach gives a 19% decrease in language model perplexity. A statistically significant decrease in perplexity is already observed when using just two sentences for adaptation. We also show that the model employing lemmatization gives consistently better results than the unlemmatized model.
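
Fast marginal adaptation, as it is commonly formulated, rescales each background probability P_bg(w|h) by the factor (P_topic(w) / P_bg(w))^beta and then renormalizes over the vocabulary. The sketch below applies that general formula to a single history; the function name, the value of beta, and the toy probabilities are assumptions for illustration, and the abstract does not give the authors' exact configuration.

```python
def fast_marginal_adaptation(p_bg_cond, p_bg_uni, p_topic_uni, beta=0.5):
    """Adapt P_bg(w | h) for one fixed history h.

    p_bg_cond   -- background conditional distribution P_bg(w | h)
    p_bg_uni    -- background unigram distribution P_bg(w)
    p_topic_uni -- unigram distribution estimated from topic-related documents
    beta        -- scaling exponent (illustrative value, an assumption)
    """
    # Scale each word by how much more (or less) likely it is in the topic.
    scaled = {w: (p_topic_uni[w] / p_bg_uni[w]) ** beta * p
              for w, p in p_bg_cond.items()}
    # Renormalize so the adapted distribution sums to one for this history.
    z = sum(scaled.values())
    return {w: s / z for w, s in scaled.items()}

# Toy distributions, invented for illustration only.
p_bg_cond = {"bank": 0.5, "river": 0.3, "loan": 0.2}
p_bg_uni = {"bank": 0.4, "river": 0.3, "loan": 0.3}
p_topic_uni = {"bank": 0.4, "river": 0.1, "loan": 0.5}
adapted = fast_marginal_adaptation(p_bg_cond, p_bg_uni, p_topic_uni)
```

Here "loan", which is more frequent in the topic documents than in the background, gains probability after the history, while "river" loses it. In the framework described above, the topic unigram would be estimated from the documents retrieved through the latent topic model.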

Language Model Adaptation Using Latent Dirichlet Allocation and an Efficient Topic Inference Algorithm
Aaron Heidel, National Taiwan University, Taiwan
Hung-an Chang, Massachusetts Institute of Technology
Lin-shan Lee, National Taiwan University, Taiwan

We present an effort to perform topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDA model using the resultant topic-document assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level for interpolation with a background language model during language model adaptation. We also present a novel iterative algorithm for LDA topic inference. Very encouraging results were obtained in preliminary experiments with broadcast news in Mandarin Chinese.

Structural Bayesian Language Modeling and Adaptation
Sibel Yaman, Georgia Institute of Technology, USA
Jen-Tzung Chien, National Cheng Kung University, Taiwan
Chin-Hui Lee, Georgia Institute of Technology, USA

We propose a language modeling and adaptation framework using the Bayesian structural maximum a posteriori (SMAP) principle, in which each n-gram event is embedded in a branch of a tree structure. The nodes in the first layer of this tree structure represent the unigrams, those in the second layer represent the bigrams, and so on. Each node in the tree structure has an associated hyper-parameter representing the information about the prior distribution, and a count representing the number of times the word sequence occurs in the domain-specific data. In general, the hyper-parameters depend on the observation frequency of not only the node event but also its parent node, the lower-order n-gram event. Our automatic speech recognition experiments using the Wall Street Journal corpus verify that the proposed SMAP language model adaptation achieves a 5.6% relative improvement over maximum likelihood language models obtained with the same training and adaptation data sets.

Vocabulary Selection for a Broadcast News Transcription System using a Morpho-syntactic Approach
Ciro Martins, Department of Electronics and Telecommunications – Aveiro University, Portugal; L2F – Spoken Language Systems Lab – INESC-ID/IST, Lisbon, Portugal
António Teixeira, Department of Electronics and Telecommunications – Aveiro University, Aveiro, Portugal
João Neto, L2F – Spoken Language Systems Lab – INESC-ID/IST, Lisbon, Portugal

Although the vocabularies of ASR systems are designed to achieve high coverage for the expected domain, out-of-vocabulary (OOV) words cannot be avoided. In particular, for daily and real-time transcription of Broadcast News (BN) data in highly inflected languages, the rapid vocabulary growth leads to high OOV word rates. To overcome this problem, we present a new morpho-syntactic approach to dynamically select the target vocabulary for this particular domain by trading off between the OOV word rate and vocabulary size. We evaluate this approach against the common selection strategy based on word frequency. Experiments have been carried out for a European Portuguese BN transcription system. Results computed on seven news shows yield a relative reduction of 37.8% in OOV word rate against the baseline system and 5.5% when compared with the common word-frequency approach.
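
The quantities discussed in this abstract can be made concrete with a small sketch of the word-frequency selection strategy used as the comparison point, together with the OOV word rate and its relative reduction. The function names and the toy token lists are hypothetical, and the morpho-syntactic selection itself is not reproduced here because the abstract does not specify it.

```python
from collections import Counter

def select_by_frequency(training_tokens, size):
    """Keep the `size` most frequent training words as the vocabulary."""
    return {w for w, _ in Counter(training_tokens).most_common(size)}

def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are not covered by the vocabulary."""
    oov = sum(1 for w in test_tokens if w not in vocabulary)
    return oov / len(test_tokens)

def relative_reduction(baseline, new):
    """Relative reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline

# Toy token lists, invented for illustration only.
train = "a a a b b c d".split()
test = "a b c e".split()
small_vocab = select_by_frequency(train, 2)   # covers "a" and "b"
large_vocab = select_by_frequency(train, 3)   # additionally covers "c"
rate_small = oov_rate(small_vocab, test)
rate_large = oov_rate(large_vocab, test)
gain = relative_reduction(rate_small, rate_large)
```

The sketch shows the trade-off named in the abstract: enlarging the vocabulary lowers the OOV word rate, so a selection method is judged by how low a rate it reaches at a given vocabulary size.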

Handling OOV Words In Arabic ASR Via Flexible Morphological Constraints
Nguyen Bach, InterACT, Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Mohamed Noamany, InterACT, Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Ian Lane, InterACT, Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Tanja Schultz, InterACT, Language Technologies Institute, School of Computer Science, Carnegie Mellon University

We propose a novel framework to detect and recognize out-of-vocabulary (OOV) words in automatic speech recognition (ASR). In the proposed framework, a hybrid language model combining words and subword units is incorporated during ASR decoding; three different OOV word recognition methods are then applied to generate OOV word hypotheses: dictionary lookup, morphological composition, and direct phoneme-to-grapheme conversion. The proposed approach reduced WER by 1.9% and 1.6% for ASR systems with recognition vocabularies of 30K and 219K words, respectively. Moreover, the proposed approach correctly recognized 5% of OOV words.

Phrases in Category-based Language Models for Spanish and Basque ASR
Raquel Justo, Universidad del País Vasco
M. Inés Torres, Universidad del País Vasco

In this work, we integrate phrases or segments of words into class n-gram language models in order to take advantage of two information sources: words and categories. Two different approaches to this kind of model are proposed and formulated. The models were integrated into an Automatic Speech Recognition system and subsequently evaluated in terms of word error rate. The experiments, carried out over two different databases and languages, demonstrate that a language model based on categories composed of phrases can outperform classical class n-gram language models.

Language Modeling for Automatic Turkish Broadcast News Transcription
Ebru Arisoy, Bogazici University
Hasim Sak, Bogazici University
Murat Saraclar, Bogazici University

The aim of this study is to develop a speech recognition system for Turkish broadcast news. State-of-the-art speech recognition systems utilize statistical models. A large amount of data is required to reliably estimate these models. For this study, a large Turkish Broadcast News database, consisting of the speech signal and corresponding transcriptions, is being collected. In this paper, information about this database and experiments performed using the system developed on the collected data are presented. In addition to the baseline system, various sub-word language models are investigated. Lexical stem-endings are proposed as a novel unit for language modeling and are shown to perform better than surface stem-endings and morphs. Currently, our best systems have lower than 20% error on clean speech.
