Interspeech 2007 Session TuD.O2: Spoken data retrieval I
Tuesday, August 28, 2007
16:00 – 18:00
Tim Anderson (Air Force Research Laboratory)
Rapid and Accurate Spoken Term Detection
David Miller, BBN Technologies
Michael Kleber, BBN Technologies
Chia-lin Kao, BBN Technologies
Owen Kimball, BBN Technologies
Thomas Colthurst, BBN Technologies
Stephen Lowe, BBN Technologies
Richard Schwartz, BBN Technologies
Herbert Gish, BBN Technologies
We present a state-of-the-art system for performing spoken term detection on continuous telephone speech in multiple languages. The system compiles a search index from deep word lattices generated by a large-vocabulary HMM speech recognizer. It estimates word posteriors from the lattices and uses them to compute a detection threshold that minimizes the expected value of a user-specified cost function. The system accommodates search terms outside the vocabulary of the speech-to-text engine by using approximate string matching on induced phonetic transcripts. Its search index occupies less than 1 MB per hour of processed speech, and it supports sub-second search times over a corpus of hundreds of hours of audio. This system had the highest reported accuracy on the telephone speech portion of the 2006 NIST Spoken Term Detection evaluation, achieving 83% of the maximum possible accuracy score in English.
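The cost-minimizing threshold the abstract describes follows from a standard Bayes decision rule. A minimal sketch, assuming a simple two-cost setup (the function and parameter names here are illustrative, not the system's actual interface):

```python
def detection_threshold(cost_miss, cost_fa):
    """Posterior threshold that minimizes expected cost.

    Accepting a putative hit with word posterior p incurs expected cost
    cost_fa * (1 - p); rejecting it incurs cost_miss * p.  Accepting is
    therefore cheaper exactly when p exceeds cost_fa / (cost_fa + cost_miss).
    """
    return cost_fa / (cost_fa + cost_miss)


def decide(posterior, cost_miss, cost_fa):
    """Accept the detection when its posterior clears the threshold."""
    return posterior >= detection_threshold(cost_miss, cost_fa)
```

With equal costs the threshold is 0.5; raising the miss cost lowers the threshold, so weaker hypotheses are accepted.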
Subword-based Position Specific Posterior Lattices (S-PSPL) for Indexing Speech Information
Yi-cheng Pan, National Taiwan University
Hung-lin Chang, National Taiwan University
Berlin Chen, National Taiwan Normal University
Lin-shan Lee, National Taiwan University
Position Specific Posterior Lattices (PSPL) have recently been proposed as very powerful, compact structures for indexing speech. In this paper, we take PSPL one step further to Subword-based Position Specific Posterior Lattices (S-PSPL). As with PSPL, we include posterior probabilities and proximity information, but we base this information on subword units rather than words. The advantages of S-PSPL over PSPL come mainly from rare and/or OOV words, which may be included in S-PSPL but generally are not in PSPL. Experiments on Mandarin Chinese broadcast news showed significant improvements from S-PSPL as compared to PSPL. Such advantages are believed to be language-independent.
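The core idea of a position-specific posterior index can be sketched in a few lines. This toy version, assuming an n-best list stands in for the lattice (data shapes and names are illustrative only), accumulates the posterior mass of each subword unit at each position in a segment:

```python
from collections import defaultdict


def build_s_pspl(paths):
    """Toy subword position-specific posterior index for one segment.

    `paths` is a list of (subword_sequence, posterior) pairs, e.g. from
    an n-best list standing in for a recognition lattice.  The index
    maps (subword, position) -> accumulated posterior probability.
    """
    index = defaultdict(float)
    for subwords, posterior in paths:
        for pos, unit in enumerate(subwords):
            index[(unit, pos)] += posterior
    return dict(index)
```

At search time, a query's subword sequence is matched against such entries, combining posteriors and using the positions to enforce proximity; OOV words remain searchable because only their subwords need to appear in the index.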
Improved Methods for Language Model Based Question Classification
Andreas Merkel, Spoken Language Systems, Saarland University, Germany
Dietrich Klakow, Spoken Language Systems, Saarland University, Germany
In this paper, we propose a language model based approach to classifying user questions in the context of question answering systems. As the categorization paradigm, a Bayes classifier is used to determine the corresponding semantic class. We present experiments with state-of-the-art smoothing methods as well as with some improved language models. Our results indicate that the techniques proposed here outperform the standard methods, including support vector machines.
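The basic scheme the abstract describes can be sketched as a Bayes classifier backed by one smoothed unigram language model per semantic class. This is a minimal illustration with add-one smoothing, far simpler than the state-of-the-art smoothing methods the paper evaluates; all names are hypothetical:

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (token_list, label).  Returns per-class unigram
    counts, class priors, and the shared vocabulary (for smoothing)."""
    counts, priors, vocab = {}, Counter(), set()
    for tokens, label in examples:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return counts, priors, vocab


def classify(tokens, counts, priors, vocab):
    """Pick argmax over classes of log P(class) + sum log P(w | class)."""
    total = sum(priors.values())

    def score(label):
        c = counts[label]
        n = sum(c.values())
        s = math.log(priors[label] / total)
        for w in tokens:
            s += math.log((c[w] + 1) / (n + len(vocab)))  # add-one smoothing
        return s

    return max(priors, key=score)
```

In practice the per-class language models would use higher-order n-grams and better smoothing, but the decision rule is the same.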
Error-Tolerant Question Answering for Spoken Documents
Tomoyosi Akiba, Toyohashi University of Technology
Hirofumi Tsujimura, Toyohashi University of Technology
This paper proposes an error-tolerant question answering method for spoken documents. Although a question answering system built for written documents can be applied directly to spoken documents transcribed by an LVCSR system, the recognition errors significantly degrade QA performance. In particular, it is often the case that the answer itself is misrecognized, and in that case it becomes quite difficult to find. To cope with this problem, instead of conventional NE extraction, the proposed method utilizes named entity detection, which decides only whether a section of speech, i.e. an utterance, contains named entities of a specific type. Because NE detection is a much easier task and utilizes wider context than NE extraction, it is expected to work robustly on erroneously transcribed speech data. The experimental results showed that the proposed method outperformed the baseline methods on spoken documents with recognition errors.
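The contrast between extraction and detection can be made concrete with a toy sketch. Unlike extraction, which must locate the exact name span (and fails when the name itself is misrecognized), detection only asks whether the surrounding context suggests an entity is present. The cue list here is purely illustrative, not the paper's actual model:

```python
# Illustrative contextual cues suggesting a nearby person name; a real
# detector would be a trained classifier over much richer features.
PERSON_CUES = {"mr", "mrs", "president", "professor", "said"}


def detect_person_ne(utterance_tokens):
    """NE *detection*: decide only whether the utterance likely contains
    a person name, even if the name itself was misrecognized by ASR."""
    return any(t.lower() in PERSON_CUES for t in utterance_tokens)
```

Even if "Mandela" were transcribed as "man della", the cue "president" in the same utterance still lets the detector flag it as answer-bearing.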
Exploiting Information Extraction Annotations for Document Retrieval in Distillation Tasks
Dilek Hakkani-Tür, ICSI
Gokhan Tur, SRI
Michael Levit, ICSI
Information distillation aims to extract relevant pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. In this paper, we present our approach for using information extraction annotations to augment document retrieval for distillation. We take advantage of the fact that some of the distillation queries can be associated with annotation elements introduced for the NIST Automatic Content Extraction (ACE) task. We experimentally show that using the ACE events to constrain the document set returned by an information retrieval engine significantly improves the precision at various recall rates for two different query templates.
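The constraint step the abstract describes amounts to filtering the retrieval engine's output by annotation type. A minimal sketch, assuming documents have already been tagged with ACE-style event types (all identifiers here are hypothetical):

```python
def constrain_by_event(retrieved_docs, doc_events, required_event):
    """Keep only retrieved documents whose ACE-style annotations include
    the event type associated with the query template.

    retrieved_docs: ranked list of document ids from the IR engine.
    doc_events:     mapping from document id to its set of event types.
    """
    return [d for d in retrieved_docs
            if required_event in doc_events.get(d, set())]
```

Because documents lacking the required event are dropped while the ranking of the rest is preserved, precision improves at each recall level whenever the event annotations correlate with relevance.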
Learning Spoken Document Similarity and Recommendation using Supervised Probabilistic Latent Semantic Analysis
Kishan Thambiratnam, Microsoft Research Asia
Frank Seide, Microsoft Research Asia
This paper presents a model-based approach to spoken document similarity called Supervised Probabilistic Latent Semantic Analysis (Supervised PLSA). The method differs from traditional spoken document similarity techniques in that it allows similarity to be learned rather than approximated. The ability to learn similarity is desirable in applications such as Internet video recommendation, in which complex relationships like user preference or speaking style need to be predicted. The proposed method exploits prior knowledge of document relationships to learn similarity. Experiments on broadcast news and Internet video corpora yielded 16.2% and 9.7% absolute mAP gains over traditional PLSA. Additionally, a cascaded Supervised+Discriminative PLSA system achieved a 3.0% absolute mAP gain over a Discriminative PLSA system, demonstrating the complementary nature of Supervised and Discriminative PLSA training.
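In PLSA-style approaches, document similarity is typically scored in the latent topic space. A minimal sketch of that scoring step, assuming topic posteriors P(z|d) have already been inferred (the supervised part of the paper lies in how that topic space is trained from known document relationships, which this sketch does not reproduce):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def plsa_similarity(p_topics_d1, p_topics_d2):
    """Score two documents by the cosine of their topic posteriors
    P(z|d); in Supervised PLSA the topic space itself is fit to known
    document relationships rather than learned unsupervised."""
    return cosine(p_topics_d1, p_topics_d2)
```

Documents concentrated on the same topics score near 1, while documents with disjoint topic support score near 0, regardless of surface word overlap.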