Interspeech 2007 Session ThB.O3: Spoken language understanding
Thursday, August 30, 2007
10:00 – 12:00
Roberto Pieraccini (SpeechCycle)
Generative and Discriminative Algorithms for Spoken Language Understanding
Christian Raymond, University of Trento
Giuseppe Riccardi, University of Trento
Spoken language understanding (SLU) in spoken dialog systems (SDS) aims at extracting concepts and their relations from spontaneous speech. Previous approaches to SLU have modeled concept relations as stochastic semantic networks, ranging from generative to discriminative approaches. As the complexity of spoken dialog systems increases, SLU needs to perform understanding based on a richer set of features: a-priori knowledge, long-distance dependencies, dialog history, system beliefs, etc. This paper studies generative and discriminative approaches to modeling sentence segmentation and concept labeling. We evaluate generative algorithms based on Finite State Transducers (FST) as well as discriminative algorithms based on Support Vector Machines (SVM) used as sequence classifiers and on Conditional Random Fields (CRF). We compare them in terms of concept accuracy, generalization and robustness to annotation ambiguities. We also show how non-local, non-lexical features (e.g. a-priori knowledge) can be modeled with CRF, which is the best-performing algorithm across tasks. The evaluation is carried out on two SLU tasks of different complexity, namely the ATIS and Media corpora.
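One way non-lexical a-priori knowledge enters a CRF is as extra per-token features alongside lexical context. A minimal sketch of such a feature extractor; the gazetteer, feature names, and window are illustrative assumptions, not the paper's actual feature set, and in practice these dictionaries would be passed to a CRF toolkit for training:

```python
# Toy a-priori knowledge: a gazetteer of known city names (assumption).
CITY_GAZETTEER = {"boston", "denver"}

def token_features(tokens, i):
    """Features for token i: lexical context plus a non-lexical
    gazetteer-membership feature (a-priori knowledge)."""
    w = tokens[i]
    return {
        "word": w.lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        "in_city_gazetteer": w.lower() in CITY_GAZETTEER,  # non-lexical feature
    }
```

The point of the sketch is that the gazetteer feature fires regardless of the surface word form, which is what lets the model generalize beyond the lexical items seen in training.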
A Soft-Clustering Algorithm for Automatic Induction of Semantic Classes
Elias Iosif, Dept. of Electronics and Computer Engineering, Technical University of Crete, Chania, Greece
Alexandros Potamianos, Dept. of Electronics and Computer Engineering, Technical University of Crete, Chania, Greece
In this paper, we propose a soft-decision, unsupervised clustering algorithm that generates semantic classes automatically, using the probability of class membership for each word rather than deterministically assigning a word to a semantic class. Semantic classes are induced by an unsupervised, automatic procedure that uses a context-based similarity distance to measure semantic similarity between words. The proposed soft-decision algorithm is compared with various "hard" clustering algorithms and is shown to improve semantic class induction performance in terms of both precision and recall on a travel reservation corpus. It is also shown that a further performance improvement is achieved by combining the (auto-induced) semantic information with lexical information to derive the semantic similarity distance.
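A minimal sketch of the two ingredients described above: a context-based similarity between words and a soft (probabilistic) class membership. The window width, cosine similarity, and seed-word normalization are all assumptions for illustration, not the paper's exact formulation:

```python
from collections import Counter, defaultdict
import math

def context_profiles(sentences, width=1):
    """Collect left/right context counts for each word in the corpus."""
    profiles = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for c in sent[max(0, i - width):i]:
                profiles[w][("L", c)] += 1
            for c in sent[i + 1:i + 1 + width]:
                profiles[w][("R", c)] += 1
    return profiles

def cosine(p, q):
    """Context-based similarity between two context-count profiles."""
    dot = sum(p[k] * q[k] for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def soft_membership(word, class_seeds, profiles):
    """P(class | word): similarity to each class's seed words, normalized,
    instead of a hard assignment to a single class."""
    scores = {c: sum(cosine(profiles[word], profiles[s]) for s in seeds)
              for c, seeds in class_seeds.items()}
    total = sum(scores.values()) or 1.0
    return {c: s / total for c, s in scores.items()}
```

On a toy travel corpus, a word seen in city-like contexts gets most of its membership mass on the city class while the distribution still sums to one — the "soft" part of the decision.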
Classification of Discourse Functions of Affirmative Words in Spoken Dialogue
Agustin Gravano, Columbia University
Stefan Benus, Brown University
Julia Hirschberg, Columbia University
Shira Mitchell, Harvard University
Ilia Vovsha, Columbia University
We present results of a series of machine learning experiments that address the classification of the discourse function of single affirmative cue words such as alright, okay and mm-hm in a spoken dialogue corpus. We suggest that a simple discourse/sentential distinction is not sufficient for such words and propose two additional classification sub-tasks: identifying (a) whether such words convey acknowledgment or agreement, and (b) whether they cue the beginning or end of a discourse segment. We also study the classification of each individual word into its most common discourse functions. We show that models based on contextual features extracted from the time-aligned transcripts approach the error rate of trained human annotators.
Conditional Use of Word Lattices, Confusion Networks and 1-Best String Hypotheses in a Sequential Interpretation Strategy
Bogdan Minescu, France Telecom R&D, France
Géraldine Damnati, France Telecom R&D, France
Frédéric Béchet, University of Avignon, France
Renato De Mori, University of Avignon, France
Within the context of a deployed spoken dialog service, this study presents a new interpretation strategy based on the sequential use of different ASR output representations: 1-best strings, word lattices and confusion networks. The goal is to reject, as early as possible in the decoding process, non-relevant messages containing non-speech or out-of-domain content. This is done during the first pass of the ASR decoding process, thanks to specific acoustic and language models. A confusion network (CN) is then computed for the remaining messages and another rejection process is applied using the confidence measures obtained from the CN. The messages kept at this stage are considered relevant; the search for the best interpretation is therefore applied to a richer search space than just the 1-best word string: either the whole CN or the whole word lattice. An improved, SLU-oriented CN generation algorithm is also proposed that significantly reduces the size of the resulting CNs while improving recognition performance. This strategy is evaluated on a large corpus of real users' messages collected from a deployed service.
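The sequential strategy above is essentially a rejection cascade: cheap checks first, richer search only for messages that survive. A control-flow sketch; the threshold and stage inputs are illustrative assumptions, not the deployed system's actual components:

```python
def interpret(first_pass_score, cn_confidences, search_richer_space,
              threshold=0.5):
    """Sequential interpretation sketch: reject early on the first-pass
    score, then on CN confidence measures; only accepted messages pay
    for the search over the richer space (whole CN or word lattice)."""
    if first_pass_score < threshold:       # pass 1: non-speech / out-of-domain
        return None
    if min(cn_confidences) < threshold:    # pass 2: CN confidence rejection
        return None
    return search_richer_space()           # message judged relevant
```

The design benefit is that the expensive search callable is never invoked for messages rejected at either of the first two stages.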
Speaker Adaptation of Language Models for Automatic Dialog Act Segmentation of Meetings
Jachym Kolar, University of West Bohemia in Pilsen
Yang Liu, University of Texas at Dallas
Elizabeth Shriberg, SRI International & International Computer Science Institute
Dialog act (DA) segmentation in meeting speech is important for meeting understanding. In this paper, we explore speaker adaptation of hidden event language models (LMs) for DA segmentation using the ICSI Meeting Corpus. Speaker adaptation is performed using a linear combination of the generic speaker-independent LM and an LM trained on only the data from individual speakers. We test the method on 20 frequent speakers, on both reference word transcripts and the output of automatic speech recognition. Results indicate improvements for 17 speakers on reference transcripts, and for 15 speakers on automatic transcripts. Overall, the speaker-adapted LM yields statistically significant improvement over the baseline LM for both test conditions.
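The linear combination described above is P_adapted(w|h) = λ·P_SI(w|h) + (1 − λ)·P_spk(w|h). A toy sketch over unigram distributions; the paper uses hidden-event n-gram LMs, and λ would be tuned on held-out data rather than fixed:

```python
def interpolate(p_si, p_spk, lam):
    """Linearly interpolate a speaker-independent distribution p_si with a
    speaker-dependent one p_spk: lam * P_SI(w) + (1 - lam) * P_spk(w)."""
    words = set(p_si) | set(p_spk)
    return {w: lam * p_si.get(w, 0.0) + (1 - lam) * p_spk.get(w, 0.0)
            for w in words}
```

Because both inputs are proper distributions, the interpolated model remains one for any λ in [0, 1], which is what makes this a safe way to back off to the generic LM for speakers with little data.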
Unsupervised Categorisation Approaches for Technical Support Automated Agents
Amparo Albalate, Institute of Information Technology, University of Ulm, Germany
Roberto Pieraccini, SpeechCycle, New York, NY 10001, USA
Dimitar Dimitrov, Institute of Information Technology, University of Ulm, Germany
In this paper we describe an unsupervised approach to the automated categorisation of utterances into predefined categories of symptoms (or problems) within the framework of a technical support automated agent. Utterance classification is performed with an iterative K-means clustering method. In order to improve on the lower accuracy typical of unsupervised algorithms, we have analysed two different enhancements of the classification algorithm. The first method exploits the affinity among words by automatically extracting classes of semantically equivalent terms. The second is a disambiguation technique based on a new criterion for estimating the relevance of terms to the classification. An analysis of the results of an experimental evaluation performed on a corpus of 34,848 utterances concludes the paper.
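A minimal sketch of the underlying K-means step over bag-of-words utterance vectors; the paper's two enhancements (term-affinity classes and relevance-based disambiguation) are not reproduced here, and the binary encoding, Euclidean distance, and random initialization are assumptions:

```python
import random

def bow(utterance, vocab):
    """Binary bag-of-words vector for one utterance."""
    toks = set(utterance.lower().split())
    return [1.0 if w in toks else 0.0 for w in vocab]

def dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise centroid of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iters=20, seed=0):
    """Iterative K-means: assign each vector to its nearest center,
    then recompute centers; returns the final cluster label per vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:                     # keep old center if cluster empties
                centers[j] = mean(members)
    return labels
```

In the symptom-categorisation setting, each resulting cluster would still have to be mapped to one of the predefined symptom categories, which is where the paper's enhancements come in.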