Interspeech 2007 Session FrB.SS: Machine learning for spoken dialogue systems
Friday, August 31, 2007
10:00 – 12:00
Astrid Scala 1
Oliver Lemon (Edinburgh University), Olivier Pietquin (Ecole Superieure d'Electricite, Metz Campus - IMS Research Group)
Machine Learning for Spoken Dialogue Systems
Oliver Lemon, Edinburgh University
Olivier Pietquin, Ecole Superieure d'Electricite, Metz Campus - IMS Research Group
This is an introductory tutorial paper for the Special Session on Machine Learning in Spoken Dialogue Systems. During the last decade, research on Spoken Dialogue Systems (SDS) has grown steadily. Yet the design and optimization of an SDS involves more than combining speech and language processing systems. It also requires dialogue strategies that take into account, at a minimum, the performance of these subsystems (and others), the nature of the task (e.g. form filling, tutoring, robot control, or database search/browsing), and the user's behaviour (e.g. cooperativeness, expertise). Because these factors vary so widely, previous hand-crafted designs are very difficult to reuse. For these reasons, statistical machine learning (ML) methods for automatic SDS optimization have recently become a leading research area. In this paper, we provide a short review of the field and of recent advances.
Learning dialogue strategies for interactive database search
Verena Rieser, International Graduate College, Saarland/Edinburgh University
Oliver Lemon, School of Informatics, University of Edinburgh
We show how to learn optimal dialogue policies for a wide range of database search applications, concerning how many database search results to present to the user, and when to present them. We use Reinforcement Learning methods for a wide spectrum of different database simulations, turn penalty conditions, and noise conditions. Our objective is to show that our policy learning framework covers this spectrum. We show that, even in challenging cases, learning significantly outperforms hand-coded policies tailored to the different operating situations. The policies are adaptive/context-sensitive with respect to both the overall operating situation (e.g. noise) and the local context of the interaction (e.g. user's last move). The learned policies produce an average relative increase in reward of 25.7% over the corresponding threshold-based hand-coded baseline policies.
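As a rough illustration of what a threshold-based hand-coded baseline of the kind mentioned above might look like, the following Python sketch decides whether to present database results or to ask for another constraint. The function name, the action labels, and the threshold value are hypothetical and are not taken from the paper.

```python
def threshold_policy(num_db_hits: int, max_items: int = 4) -> str:
    """Hypothetical hand-coded baseline: present results once the
    number of matching database entries is small enough."""
    if num_db_hits == 0:
        # Nothing matches: the user has to relax a constraint.
        return "ask_relax_constraint"
    if num_db_hits <= max_items:
        # Few enough matches to present them to the user.
        return "present_results"
    # Too many matches: ask for one more constraint first.
    return "ask_more_constraints"


if __name__ == "__main__":
    for hits in (0, 3, 120):
        print(hits, threshold_policy(hits))
```

A learned policy would replace the fixed threshold with a decision that also depends on context such as the noise level and the user's last move, as the abstract describes.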
Hierarchical Dialogue Optimization Using Semi-Markov Decision Processes
Heriberto Cuayáhuitl, University of Edinburgh
Steve Renals, University of Edinburgh
Oliver Lemon, University of Edinburgh
Hiroshi Shimodaira, University of Edinburgh
This paper addresses the problem of dialogue optimization in large search spaces. To this end, we propose learning dialogue strategies using multiple Semi-Markov Decision Processes and hierarchical reinforcement learning. This approach factorizes state variables and actions in order to learn a hierarchy of policies. Our experiments are based on a six-slot flight booking dialogue system and compare flat versus hierarchical reinforcement learning. Experimental results show that the proposed approach produced a dramatic search space reduction (99.36%), and converged four orders of magnitude faster than flat reinforcement learning with a very small loss in optimality (on average 0.3 system turns). The results also show that the learnt policies outperformed a hand-crafted one under three different conditions of ASR confidence levels. This approach is appealing for dialogue optimization due to faster learning, reusable subsolutions, and scalability to larger problems.
Knowledge Consistent User Simulations for Dialog Systems
Hua Ai, University of Pittsburgh
Diane Litman, University of Pittsburgh
We propose a novel model to simulate user knowledge consistency in tutoring dialogs, where no clear user goal can be defined. We also propose a new evaluation measure of knowledge consistency based on learning curves. We compare our new simulation model to real users as well as to a previously used simulation model. We show that the new model performs similarly to the real students and to the previous model when evaluated on high-level dialog features. The new model outperforms the previous model when measured on knowledge consistency.
Reducing Recognition Error Rate based on Context Relationships among Dialogue Turns
Hsu-Chih Wu, Industrial Technology Research Institute, Hsinchu, Taiwan
Stephanie Seneff, MIT CSAIL Laboratory, Cambridge, MA, USA
We have recently been conducting research on developing spoken dialogue systems to provide conversational practice for a learner of a foreign language. Speech recognition errors are among the most critical problems for such a system, since they often take the dialogue thread down a wrong turn that is very confusing to the student and may be irrecoverable. In this paper we report on a machine learning technique that assists selection from a list of N-best candidates based on a high-level description of the semantics of the preceding dialogue. In a user simulation experiment, we show that a significant reduction in sentence error rate can be achieved, from 29.2% to 23.6%. We have not yet verified that our techniques hold for real user data.
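For reference, the reported drop in sentence error rate can be expressed in both absolute and relative terms. The short Python sketch below only restates the two figures quoted above; the helper name is an illustrative assumption.

```python
from typing import Tuple


def error_reduction(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (absolute reduction in percentage points,
    relative reduction in percent of the original rate)."""
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


if __name__ == "__main__":
    absolute, relative = error_reduction(29.2, 23.6)
    print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

With the figures from the abstract, this corresponds to 5.6 percentage points absolute, or roughly 19% relative.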
Bayes Risk-based Optimization of Dialogue Management for Document Retrieval System with Speech Interface
Teruhisa Misu, Kyoto University
Tatsuya Kawahara, Kyoto University
We propose an efficient dialogue management method for an information navigation system based on a document knowledge base. Incorporating appropriate N-best ASR candidates and contextual information is expected to improve system performance. The system also has several choices in generating responses or confirmations. In this paper, this selection is optimized by minimizing the Bayes risk, which is based on a reward for correct information presentation and a penalty for redundant turns. We have evaluated this strategy with our spoken dialogue system "Dialogue Navigator for Kyoto City", which also has question-answering capability. The effectiveness of the proposed framework was confirmed by the retrieval success rate and the average number of turns needed for information access.
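The idea of choosing a response by minimizing Bayes risk can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' system: the three actions, the two-state simplification (top hypothesis correct or incorrect), and all loss values are hypothetical, chosen only to show how a reward for correct presentation trades off against a penalty for redundant turns.

```python
from typing import Dict

# Hypothetical losses (negative values are rewards). The abstract gives no
# numbers; these are chosen only to make the trade-off visible.
LOSS: Dict[str, Dict[str, float]] = {
    # Present the top hypothesis directly; a wrong presentation is costly.
    "present": {"correct": -10.0, "incorrect": 6.0},
    # Confirm first: one extra turn if correct, a confirmation plus a retry if not.
    "confirm": {"correct": -9.0, "incorrect": 2.0},
    # Reject and ask the user to rephrase: one redundant turn in either case.
    "reject": {"correct": 1.0, "incorrect": 1.0},
}


def bayes_risk(action: str, p_correct: float) -> float:
    """Expected loss of an action given P(top hypothesis is correct)."""
    loss = LOSS[action]
    return p_correct * loss["correct"] + (1.0 - p_correct) * loss["incorrect"]


def select_action(p_correct: float) -> str:
    """Pick the action with the minimum Bayes risk."""
    return min(LOSS, key=lambda action: bayes_risk(action, p_correct))


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.05):
        print(p, select_action(p))
```

Under these assumed losses, high confidence leads to direct presentation, intermediate confidence to a confirmation, and very low confidence to a rejection.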