Interspeech 2007 Session WeB.O1: Language modelling I
Wednesday, August 29, 2007
10:00 – 12:00
Chair: Hermann Ney (RWTH Aachen)
Large-Scale Random Forest Language Models for Speech Recognition
Yi Su, Center for Language and Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, USA
Frederick Jelinek, Center for Language and Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, USA
Sanjeev Khudanpur, Center for Language and Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, USA
The random forest language model (RFLM) has shown encouraging results in several automatic speech recognition (ASR) tasks, but its use has been hindered by practical limitations, notably the space complexity of estimating an RFLM from large amounts of data. This paper addresses large-scale training and testing of the RFLM via an efficient disk-swapping strategy that exploits the recursive structure of the binary decision trees and the local access property of the tree-growing algorithm. The strategy redeems the full potential of the RFLM and opens avenues for further research, including useful comparisons with $n$-gram models. Its benefits are demonstrated by perplexity reductions and lattice rescoring experiments with a state-of-the-art ASR system.
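The local-access idea behind the disk-swapping strategy can be illustrated with a toy sketch: if each decision-tree node lives in its own file and is loaded only when a lookup (or the tree-growing algorithm) visits it, resident memory stays proportional to tree depth rather than tree size. The file layout, node schema, and question test below are hypothetical, not the paper's actual format.

```python
import os
import pickle
import tempfile


class DiskTree:
    """Toy sketch of per-node disk swapping for a binary decision tree.

    Each node is pickled to its own file; lookup loads one node at a
    time, so only the path from root to leaf is ever in memory.
    (Hypothetical on-disk layout, for illustration only.)
    """

    def __init__(self, root_dir):
        self.dir = root_dir

    def save(self, node_id, node):
        with open(os.path.join(self.dir, f"{node_id}.pkl"), "wb") as f:
            pickle.dump(node, f)

    def load(self, node_id):
        with open(os.path.join(self.dir, f"{node_id}.pkl"), "rb") as f:
            return pickle.load(f)

    def lookup(self, node_id, history):
        node = self.load(node_id)          # swap this node in from disk
        if "leaf" in node:
            return node["leaf"]            # leaf holds the LM estimate
        # Internal node: answer a set-membership question about the history.
        branch = "yes" if history in node["question"] else "no"
        return self.lookup(node[branch], history)
```

A real RFLM would ask equivalence-class questions about the $n$-gram history and store smoothed probabilities at the leaves; the sketch only shows the one-node-resident access pattern.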
PLSA-based Topic Detection in Meetings for Adaptation of Lexicon and Language Model
Yuya Akita, Kyoto University
Yusuke Nemoto, Kyoto University
Tatsuya Kawahara, Kyoto University
A topic detection approach based on a probabilistic framework is proposed to enable topic adaptation of speech recognition systems for long speech archives such as meetings. Unlike news stories, topics in such speech are not clearly delimited, so we adopt a probabilistic representation of topics based on probabilistic latent semantic analysis (PLSA). A topical sub-space is constructed by PLSA, speech segments are projected onto the sub-space, and each segment is represented by a vector of the topic probabilities obtained from the projection. Topic detection is performed by clustering these vectors, and topic adaptation by collecting relevant texts according to similarity in this probabilistic representation. In experimental evaluations, the proposed approach achieved significant reductions in perplexity and out-of-vocabulary rate, as well as robustness against ASR errors.
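The projection step can be sketched as follows: given PLSA parameters $P(w|z)$ and $P(z)$, a segment's bag-of-words counts are folded into a topic-probability vector. This is a simplified one-step approximation (a single E-step rather than full folding-in EM), under assumed array shapes; the paper's exact estimation procedure is not shown.

```python
import numpy as np


def topic_vector(counts, p_w_given_z, p_z):
    """Project a speech segment (bag-of-words counts) onto the PLSA
    topic sub-space: a one-step approximation of P(z|segment).

    counts:      (V,) word counts for the segment
    p_w_given_z: (V, K) topic-conditional word probabilities P(w|z)
    p_z:         (K,) topic priors P(z)
    """
    # Responsibility of topic z for each word w: P(z|w) ∝ P(w|z) P(z)
    resp = p_w_given_z * p_z
    resp /= resp.sum(axis=1, keepdims=True)
    # Accumulate responsibilities over the observed words and normalize.
    vec = counts @ resp
    return vec / vec.sum()
```

Segments would then be clustered on these vectors (e.g. with k-means) to detect topics, as the abstract describes.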
Language Modeling using PLSA-Based Topic HMM
Atsushi Sako, Kobe University
Tetsuya Takiguchi, Kobe University
Yasuo Ariki, Kobe University
In this paper, we propose a PLSA-based language model for live sports speech. The model is implemented with the unigram rescaling technique, which combines a topic model with an N-gram. In the conventional method, unigram rescaling uses a topic distribution estimated from the history of recognized transcriptions; this improves performance but cannot express topic transitions. Incorporating the concept of topic transition is expected to improve recognition performance further. The proposed method therefore employs a "Topic HMM" instead of the recognition history to estimate the topic distribution. The Topic HMM is a discrete ergodic HMM that expresses typical topic distributions and topic transition probabilities. Word accuracy results show an improvement over both a tri-gram and the conventional PLSA-based method that uses the recognized history.
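Unigram rescaling itself can be sketched compactly: each n-gram probability is scaled by the ratio of the topic-conditional unigram to the corpus unigram and the result is renormalized over the vocabulary, i.e. $P(w|h) \propto P_{ngram}(w|h)\,\big(P_{topic}(w)/P_{uni}(w)\big)^{\beta}$. The exponent `beta` and the array-based interface are illustrative assumptions.

```python
import numpy as np


def unigram_rescale(p_ngram, p_topic, p_uni, beta=1.0):
    """Unigram rescaling: boost words favored by the current topic.

    All inputs are (V,) arrays over the vocabulary for a fixed history:
    p_ngram - N-gram probabilities P(w|h)
    p_topic - topic-conditional unigram P(w|topic), e.g. from PLSA
    p_uni   - corpus unigram P(w)
    """
    scaled = p_ngram * (p_topic / p_uni) ** beta
    return scaled / scaled.sum()   # renormalize over the vocabulary
```

In the proposed method, `p_topic` would come from the Topic HMM's state-dependent topic distribution rather than from the recognition history.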
Lexicon Adaptation with Reduced Character Error (LARCE) - A New Direction in Chinese Language Modeling
Yi-cheng Pan, National Taiwan University
Lin-shan Lee, National Taiwan University
Good language modeling relies on a good predefined lexicon. For Chinese, constructing good lexicons is difficult because text has no word boundaries and the concept of a "word" is not well defined. In this paper, we propose lexicon adaptation with reduced character error (LARCE), which learns new word tokens under the criterion of reducing the error rate on an adaptation corpus. In this approach, a multi-character string is taken as a new "word" as long as it helps reduce the error rate, so a minimal number of new, high-quality words is obtained. The algorithm is based on character-based consensus networks. Initial experiments on Chinese broadcast news show that LARCE not only significantly outperforms PAT-tree-based word extraction algorithms but even outperforms manually augmented lexicons. We believe the concept is equally useful for other character-based languages.
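The selection criterion can be sketched as a greedy loop: a candidate multi-character string enters the lexicon only if it lowers the character error rate of the adaptation corpus. The `char_error_rate` callback stands in for the paper's actual evaluation over character-based consensus networks, which is not reproduced here.

```python
def larce_select(candidates, lexicon, char_error_rate):
    """Greedy sketch of the LARCE criterion.

    candidates:      multi-character strings proposed as new "words"
    lexicon:         initial word list
    char_error_rate: caller-supplied function mapping a lexicon to the
                     character error rate on the adaptation corpus
                     (in the paper, computed via character-based
                     consensus networks)
    """
    lexicon = set(lexicon)
    baseline = char_error_rate(lexicon)
    for w in candidates:
        trial = char_error_rate(lexicon | {w})
        if trial < baseline:        # keep only error-reducing words
            lexicon.add(w)
            baseline = trial
    return lexicon
```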
Minimum Rank Error Training for Language Modeling
Meng-Sung Wu, Department of Computer Science and Information Engineering, National Cheng Kung University
Jen-Tzung Chien, Department of Computer Science and Information Engineering, National Cheng Kung University
Discriminative training techniques have been successfully developed for many pattern recognition applications. In speech recognition, discriminative training aims to minimize the word error rate. In an information retrieval system, however, the best performance is achieved by maximizing the average precision. In this paper, we construct a discriminative n-gram language model for information retrieval under the metric of minimum rank error (MRE) rather than the conventional minimum classification error. In the optimization procedure, we maximize the average precision and estimate the language model so as to attain the smallest ranking loss. In ad-hoc retrieval experiments on TREC collections, the proposed MRE language model outperforms both the maximum likelihood and the minimum classification error language models.
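The objective being maximized is the standard average precision of a ranked list: the mean of precision@k taken at each rank k where a relevant document appears. A minimal reference implementation (not the paper's smoothed training objective, which must be differentiable):

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: binary relevance labels in ranked order,
    e.g. [1, 0, 1] means ranks 1 and 3 are relevant.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)   # precision@k at each relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0
```

MRE training replaces this step function with a smooth surrogate so the language model parameters can be optimized by gradient methods.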
Integrating MAP, Marginals, and Unsupervised Language Model Adaptation
Wen Wang, SRI International
Andreas Stolcke, SRI International
We investigate the integration of several language model adaptation approaches for a cross-genre adaptation task, to improve Mandarin ASR performance on a recently introduced genre, broadcast conversation (BC). Various adaptation strategies are evaluated by their effect on ASR performance, including unsupervised language model adaptation from ASR transcripts and ways to integrate supervised maximum a posteriori (MAP) and marginal adaptation within the unsupervised adaptation framework. We find that by effectively combining these adaptation approaches, we can achieve as much as 1.3% absolute (6% relative) reduction in the final recognition error rate on the BC genre.
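The MAP component can be sketched with the usual count-merging form: in-domain n-gram counts are blended with a background model through a Dirichlet-style prior of strength $\tau$, $P_{MAP}(w|h) = \big(c(h,w) + \tau\,P_{bg}(w|h)\big)/\big(c(h) + \tau\big)$. The scalar interface and default `tau` are illustrative; the paper's exact formulation and its combination with marginal and unsupervised adaptation are not reproduced.

```python
def map_adapt(count_hw, count_h, p_background, tau=100.0):
    """MAP adaptation of one n-gram probability.

    count_hw:     in-domain count of history+word c(h, w)
    count_h:      in-domain count of the history c(h)
    p_background: background-model probability P_bg(w|h)
    tau:          prior strength; larger tau trusts the background more
    """
    return (count_hw + tau * p_background) / (count_h + tau)
```

With no in-domain data the estimate falls back to the background model, and as in-domain counts grow it converges to the in-domain relative frequency.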