Interspeech 2007 Session ThD.SS: Novel techniques for the NATO non-native air-traffic control and HIWIRE cockpit databases
Thursday, August 30, 2007
16:00 – 18:00
Astrid Scala 1
David van Leeuwen (TNO Human Factors)
Design and characterization of the Non-native Military Air Traffic Communications database (nnMATC)
Stephane Pigeon, Royal Military School
Wade Shen, MIT Lincoln Laboratory
Aaron Lawson, Air Force Research Laboratory
David van Leeuwen, TNO Human Factors
This paper describes the speech database at the center of the Interspeech 2007 special session "Novel techniques for the NATO non-native Air Traffic Communications database." It gives the rationale for recording and distributing this common research resource, details the acquisition and annotation, and provides some statistics. It also summarizes potential uses of the database in terms of evaluation measures and protocols.
A Comparison of Speaker Clustering and Speech Recognition Techniques for Air Situational Awareness
Wade Shen, MIT/Lincoln Laboratory
Douglas Reynolds, MIT/Lincoln Laboratory
In this paper we compare speaker clustering and speech recognition techniques applied to the problem of understanding patterns of air traffic control communications. For a given radio transmission, our goal is to identify the talker and to whom he or she is speaking. This information, combined with knowledge of the roles (e.g. takeoff, approach, hand-off, taxi) of different radio frequencies within an air traffic control region, could allow pilots to be tracked through the various stages of flight, making it possible to monitor the airspace in great detail. Both techniques must contend with degraded audio channels and strong non-native accents. We report results from experiments on the nnMATC database [Pigeon07] showing clustering errors of 9.3% and 32.6% for the speaker clustering and ASR methods, respectively.
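To make the idea of speaker clustering concrete, here is a minimal sketch of generic bottom-up (agglomerative) clustering of radio transmissions. It assumes each transmission has already been reduced to a fixed-length vector (for example, averaged cepstral features); the function name `cluster_transmissions`, the Euclidean centroid distance, and the stopping `threshold` are all hypothetical choices for illustration, not the method used in the paper.

```python
import math
from typing import List

Vector = List[float]


def _centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cluster_transmissions(embeddings: List[Vector], threshold: float) -> List[int]:
    """Hypothetical agglomerative clustering: repeatedly merge the two clusters
    whose centroids are closest (Euclidean distance) until no pair is closer
    than `threshold`. Returns one cluster label (speaker guess) per transmission."""
    clusters = [[i] for i in range(len(embeddings))]  # one cluster per transmission
    while len(clusters) > 1:
        cents = [_centroid([embeddings[i] for i in c]) for c in clusters]
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] >= threshold:
            break  # the closest pair is too far apart: stop merging
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For example, `cluster_transmissions([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]], threshold=1.0)` groups the first two and the last two transmissions together. In practice the threshold trades off merging different talkers against splitting one talker's transmissions, which is exactly what a clustering-error measure penalizes.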
Advanced Front-end for Robust Speech Recognition in Extremely Adverse Environments
Dimitrios Dimitriadis, National Technical University of Athens, School of ECE
José C. Segura, Dpto. Teoría de la Señal, Telemática y Comunicaciones (TSTC), Univ. Granada
Luz García, Dpto. Teoría de la Señal, Telemática y Comunicaciones (TSTC), Univ. Granada
Vassilis Pitsikalis, National Technical University of Athens, School of ECE
Petros Maragos, National Technical University of Athens, School of ECE
Alexandros Potamianos, Dept. of ECE, Technical University of Crete
In this paper, a unified approach to speech enhancement, feature extraction and feature normalization for speech recognition in adverse recording conditions is presented. The proposed front-end consists of several independent processing modules. Each algorithm in these modules has previously been applied on its own to speech recognition in noise, significantly improving recognition rates. In this work, these algorithms are merged into a single front-end and their combined performance is demonstrated. The advanced front-end is applied to extremely adverse environments where most feature extraction schemes fail. We show that combining speech enhancement, robust feature extraction and feature normalization can reduce the error rate by up to a factor of five for certain tasks.
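The architecture described above, independent modules chained into one front-end, can be sketched as a list of functions that each map a sequence of frames to a new sequence of frames. The three modules below are deliberately simple, generic stand-ins chosen for illustration (spectral subtraction for enhancement, log compression for feature extraction, per-utterance mean and variance normalization); they are assumptions, not the specific algorithms merged in the paper, and the parameters `noise_frames` and `floor` are hypothetical.

```python
import math
from typing import Callable, List

Frame = List[float]                       # one frame of per-band power values
Module = Callable[[List[Frame]], List[Frame]]


def spectral_subtraction(frames: List[Frame], noise_frames: int = 5,
                         floor: float = 1e-3) -> List[Frame]:
    """Enhancement stand-in: estimate per-band noise power from the first
    `noise_frames` frames (assumed speech-free), subtract it, and floor the
    result so every power stays positive."""
    k = min(noise_frames, len(frames))
    noise = [sum(f[b] for f in frames[:k]) / k for b in range(len(frames[0]))]
    return [[max(p - n, floor) for p, n in zip(f, noise)] for f in frames]


def log_compression(frames: List[Frame]) -> List[Frame]:
    """Feature-extraction stand-in: log of each (positive) band power."""
    return [[math.log(p) for p in f] for f in frames]


def mean_variance_normalize(frames: List[Frame]) -> List[Frame]:
    """Normalization stand-in: zero mean, unit variance per dimension over
    the utterance (a constant dimension is only mean-centered)."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
            for d in range(dims)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]


def front_end(frames: List[Frame], modules: List[Module]) -> List[Frame]:
    """Apply independent modules in sequence; each maps frames -> frames."""
    for module in modules:
        frames = module(frames)
    return frames
```

Because every module shares the same frames-in, frames-out interface, a stage can be swapped or dropped (for example, `front_end(frames, [log_compression, mean_variance_normalize])` skips enhancement), which is one simple way to measure each module's contribution to a combined front-end.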
Experiments on the Hiwire Database Using Denoising and Adaptation with a Hybrid HMM-ANN Model
Roberto Gemello, Loquendo
Franco Mana, Loquendo
Stefano Scanzio, Politecnico di Torino
This paper presents the results of a large number of experiments performed on the Hiwire cockpit database with a hybrid HMM-ANN speech recognition model. The Hiwire database is a noisy, non-native English speech corpus for cockpit communication. The noisy component of the database is used to test two recently introduced noise reduction methods, while the adaptation component is used for supervised and unsupervised adaptation of the HMM-ANN model with a novel technique, in both multi-speaker and speaker-dependent modes. Baseline results are presented, and the improvements obtained with noise reduction and adaptation are reported, showing an error reduction of about 60%.
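In a hybrid HMM-ANN recognizer, the neural network estimates state posteriors P(q|x), while the HMM decoder needs emission likelihoods p(x|q). The standard bridge follows from Bayes' rule: p(x|q)/p(x) = P(q|x)/P(q), so dividing each posterior by the state prior gives a likelihood scaled by a factor p(x) that is the same for every state and therefore does not affect decoding. The sketch below shows this conversion in the log domain; it illustrates the general hybrid approach, not necessarily the exact scoring used in the Loquendo system, and the `eps` guard is an assumption.

```python
import math
from typing import List


def scaled_log_likelihoods(posteriors: List[float], priors: List[float],
                           eps: float = 1e-10) -> List[float]:
    """Convert ANN state posteriors P(q|x) to scaled log-likelihoods
    log p(x|q) - log p(x) = log P(q|x) - log P(q), which the HMM decoder
    can use in place of Gaussian-mixture emission scores."""
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    # eps keeps log() defined if the network outputs an exact zero.
    return [math.log(max(post, eps)) - math.log(max(prior, eps))
            for post, prior in zip(posteriors, priors)]
```

For instance, posteriors `[0.7, 0.2, 0.1]` with priors `[0.5, 0.25, 0.25]` give scores equal to log 1.4, log 0.8 and log 0.4: a state that the network favors more than its prior frequency receives a positive score.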
Detection and Removal of Switching Noise in Push-to-Talk (PTT) and Voice Operated eXchange (VOX) Communications Systems
Brett Smolenski, Research Associates for Defense Conversion
This paper addresses the detection and removal of key clicks in the NATO non-native Air Traffic Control database. Key clicks are impulse-like noises generated in Push-to-Talk (PTT) and Voice Operated eXchange (VOX) communications systems. Removing key clicks can improve signal quality for both listening and machine processing. Detecting key clicks could also assist other applications, such as speech segmentation on these types of channels. The approach first applies a Recursive Least Squares (RLS) whitening filter to accentuate the key clicks. Outlier detection based on order statistics then identifies candidate key-click samples. Finally, each key click is removed by setting to zero all samples from the second zero crossing before the detected click to the third zero crossing after it. This approach achieved a 98.3% key-click detection rate with only 0.19% false alarms.
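The three steps above (RLS whitening, order-statistic outlier detection, zeroing between zero crossings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the predictor `order`, forgetting factor `lam`, initialization `delta`, and the outlier rule (absolute residual more than `k` median absolute deviations above the median, both of which are order statistics) are hypothetical choices.

```python
import math
import statistics
from typing import List


def rls_whiten(x: List[float], order: int = 4, lam: float = 0.99,
               delta: float = 100.0) -> List[float]:
    """Residual of an adaptive RLS linear predictor: e[n] = x[n] - w . u[n],
    with u[n] = [x[n-1], ..., x[n-order]]. Impulses are poorly predicted,
    so they stand out in the residual."""
    w = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    e: List[float] = []
    for n in range(len(x)):
        u = [x[n - k - 1] if n - k - 1 >= 0 else 0.0 for k in range(order)]
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(a * b for a, b in zip(u, Pu))
        g = [v / denom for v in Pu]                          # RLS gain vector
        err = x[n] - sum(wi * ui for wi, ui in zip(w, u))    # a priori error
        e.append(err)
        w = [wi + gi * err for wi, gi in zip(w, g)]
        # P <- (P - g u^T P) / lam; P is symmetric, so u^T P equals (P u)^T.
        P = [[(P[i][j] - g[i] * Pu[j]) / lam for j in range(order)]
             for i in range(order)]
    return e


def detect_clicks(residual: List[float], k: float = 8.0) -> List[int]:
    """Order-statistic outlier test (assumed rule): flag samples whose
    |residual| exceeds the median by more than k median absolute deviations."""
    mags = [abs(v) for v in residual]
    med = statistics.median(mags)
    mad = statistics.median([abs(m - med) for m in mags]) or 1e-12
    return [n for n, m in enumerate(mags) if m > med + k * mad]


def _is_crossing(x: List[float], i: int) -> bool:
    """True if the sign changes between x[i-1] and x[i]."""
    return (x[i - 1] < 0) != (x[i] < 0)


def remove_clicks(x: List[float], clicks: List[int]) -> List[float]:
    """Zero every sample from the 2nd zero crossing before each click up to
    the 3rd zero crossing after it (crossings are located in the original x)."""
    y = list(x)
    for n in clicks:
        before = [i for i in range(n, 0, -1) if _is_crossing(x, i)]
        after = [i for i in range(n + 1, len(x)) if _is_crossing(x, i)]
        start = before[1] if len(before) > 1 else 0
        end = after[2] if len(after) > 2 else len(x)
        for i in range(start, end):
            y[i] = 0.0
    return y


if __name__ == "__main__":
    # Synthetic signal: a low-level sinusoid with one injected click.
    signal = [0.1 * math.sin(0.3 * n) for n in range(400)]
    signal[200] += 5.0
    found = detect_clicks(rls_whiten(signal))
    cleaned = remove_clicks(signal, found)
    print(200 in found, cleaned[200])
```

Whitening matters because a click is hard to separate from loud speech by amplitude alone, whereas in the prediction residual the predictable speech component is largely cancelled and the impulse remains. Note that samples immediately after a click also have large residuals (the click enters the predictor's input), so in this sketch the zeroed region around a click may be wider than the click itself.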
Evaluation of the Combined Use of MEMLIN and MLLR on the Non-native Adaptation Task of Hiwire Project Database
Luis Buera, University of Zaragoza, Spain
Antonio Miguel, University of Zaragoza, Spain
Oscar Saz, University of Zaragoza, Spain
Eduardo Lleida, University of Zaragoza, Spain
Alfonso Ortega, University of Zaragoza, Spain
This paper describes the performance of combining Multi-Environment Model-based LInear Normalization (MEMLIN), which provides an estimate of the uncorrupted feature vector, with Maximum Likelihood Linear Regression (MLLR) on the database collected under the auspices of the IST-EU STREP project HIWIRE. Results are presented for the non-native adaptation (NNA) task. The HIWIRE project database consists of command-and-control utterances for an aeronautics application, spoken by non-native speakers and digitally corrupted with airplane cockpit noise; three noise conditions are defined: low, medium and high noise. In the proposed system, each MEMLIN-normalized feature vector is decoded using the MLLR-adapted acoustic models. The experiments show that combining MEMLIN and MLLR yields a significant improvement for all kinds of non-native speakers and noise conditions.
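The combination above can be sketched in two parts: a feature-side correction that estimates the clean vector from the noisy one, and a model-side MLLR transform of the Gaussian means, with the corrected vector then scored against the adapted Gaussians. The normalization below is a simplified, single-environment, bias-subtraction form written in the spirit of MEMLIN, x_hat = y - sum over noisy states s_y and clean states s_x of p(s_y|y) p(s_x|s_y) r[s_x][s_y]. The noisy-space GMM, the transition table `p_sx_given_sy`, and the bias vectors `biases` are assumed to be already trained, and all function names are hypothetical. The MLLR part only applies a given transform mu_hat = A mu + b; estimating A and b from adaptation data is not shown.

```python
import math
from typing import List, Sequence

Vec = List[float]


def diag_gauss_loglik(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def memlin_like_normalize(y: Vec, noisy_weights: List[float],
                          noisy_means: List[Vec], noisy_vars: List[Vec],
                          p_sx_given_sy: List[List[float]],
                          biases: List[List[Vec]]) -> Vec:
    """Simplified MEMLIN-style clean-feature estimate (single environment):
    x_hat = y - sum_sy sum_sx p(sy|y) p(sx|sy) biases[sx][sy]."""
    # Posterior p(sy|y) of each noisy-space Gaussian, via log-sum-exp.
    logs = [math.log(w) + diag_gauss_loglik(y, m, v)
            for w, m, v in zip(noisy_weights, noisy_means, noisy_vars)]
    top = max(logs)
    post = [math.exp(lg - top) for lg in logs]
    total = sum(post)
    post = [p / total for p in post]
    correction = [0.0] * len(y)
    for sy, p_sy in enumerate(post):
        for sx, p_sx in enumerate(p_sx_given_sy[sy]):
            for d, r in enumerate(biases[sx][sy]):
                correction[d] += p_sy * p_sx * r
    return [yd - cd for yd, cd in zip(y, correction)]


def mllr_adapt_mean(mean: Vec, A: List[Vec], b: Vec) -> Vec:
    """MLLR mean transform mu_hat = A mu + b (A and b assumed already estimated)."""
    return [sum(A[i][j] * mean[j] for j in range(len(mean))) + b[i]
            for i in range(len(b))]
```

Decoding a frame then amounts to scoring `memlin_like_normalize(y, ...)` with `diag_gauss_loglik` against Gaussians whose means have passed through `mllr_adapt_mean`. The two steps are complementary: the feature-side correction targets the cockpit noise, while the model-side transform adapts the acoustic models toward the non-native speakers.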