Yegnanarayana Bayya (International Institute of Information Technology Hyderabad)
TuD.P3a‑1
A Conservative Aggressive Subspace Tracker
Koby Crammer, University of Pennsylvania
The need to track a subspace that describes a stream of points well arises in many signal processing applications. In this work, we present a very efficient algorithm, based on a machine learning approach, whose goal is to de-noise the stream of input points. The algorithm guarantees the orthonormality of the representation it uses. We demonstrate the merits of our approach using simulations.
TuD.P3a‑2
Mutual information and the speech signal
Mattias Nilsson, Sound and Image Processing Laboratory, KTH School of Electrical Engineering, Stockholm
W. Bastiaan Kleijn, Sound and Image Processing Laboratory, KTH School of Electrical Engineering, Stockholm
Mutual information is commonly used in speech processing in the context of statistical mapping. Examples are the optimization of speech or speaker recognition algorithms, the computation of performance bounds on such algorithms, and bandwidth extension of narrow-band speech signals. It is generally ignored that speech-signal derived data usually have an intrinsic dimensionality that is lower than the dimensionality of the observation vectors (the dimensionality of the embedding space). In this paper, we show that such reduced dimensionality can affect the accuracy of the mutual information estimate significantly. We introduce a new method that removes the effects of singular probability density functions. The method does not require prior knowledge of the intrinsic dimensionality of the data. It is shown that the method is appropriate for speech-derived data.
TuD.P3a‑3
Spectro-Temporal Analysis of Speech Using 2-D Gabor Filters
Tony Ezzat, MIT
Jake Bouvrie, MIT
Tomaso Poggio, MIT
We present a 2-D spectro-temporal Gabor filterbank based on the 2-D Fast Fourier Transform, and show how it may be used to analyze localized patches of a spectrogram. We argue that the 2-D Gabor filterbank has the capacity to decompose a patch into its underlying dominant spectro-temporal components, and we illustrate the response of our filterbank to different speech phenomena such as harmonicity, formants, vertical onsets/offsets, noise, and overlapping simultaneous speakers.
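As a rough illustration of how a 2-D Gabor filter can pick out a dominant spectro-temporal component in a patch, the following is a minimal sketch and not the authors' filterbank. It assumes a complex Gabor kernel (Gaussian envelope times a complex sinusoid) applied by circular convolution through the 2-D FFT. The function names `gabor_kernel_2d` and `gabor_response`, the patch size, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np

def gabor_kernel_2d(shape, omega_t, omega_f, sigma_t, sigma_f):
    """Complex 2-D Gabor kernel centred in an array of the given shape.

    omega_t, omega_f: modulation frequencies (radians per sample) along
    the time and frequency axes; sigma_t, sigma_f: Gaussian widths.
    """
    n_f, n_t = shape
    f = np.arange(n_f) - n_f // 2
    t = np.arange(n_t) - n_t // 2
    F, T = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-0.5 * ((T / sigma_t) ** 2 + (F / sigma_f) ** 2))
    carrier = np.exp(1j * (omega_t * T + omega_f * F))
    return envelope * carrier

def gabor_response(patch, kernel):
    """Magnitude of the circular convolution, computed with the 2-D FFT."""
    # ifftshift moves the kernel centre to index (0, 0) before transforming.
    K = np.fft.fft2(np.fft.ifftshift(kernel))
    P = np.fft.fft2(patch)
    return np.abs(np.fft.ifft2(P * K))

# Synthetic "harmonic" patch: stripes that vary only along the frequency axis.
n = 32
omega = 2 * np.pi * 4 / n
F, T = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
patch = np.cos(omega * F)

# One filter tuned to variation along frequency, one to variation along time.
harmonic_filter = gabor_kernel_2d((n, n), 0.0, omega, 4.0, 4.0)
onset_filter = gabor_kernel_2d((n, n), omega, 0.0, 4.0, 4.0)

resp_harmonic = gabor_response(patch, harmonic_filter).mean()
resp_onset = gabor_response(patch, onset_filter).mean()
print(resp_harmonic > resp_onset)
```

In this toy setting the filter whose carrier matches the stripe orientation responds much more strongly than the orthogonal one, which is the sense in which a bank of such filters can separate harmonic structure from onset-like structure.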
TuD.P3a‑4
A comparative study of speech rate estimation techniques
Tomas Dekens, Vrije Universiteit Brussel, dept. ETRO-DSSP, Pleinlaan 2, B-1050 Brussels, Belgium
Mike Demol, Vrije Universiteit Brussel, dept. ETRO-DSSP, Pleinlaan 2, B-1050 Brussels, Belgium
Werner Verhelst, Vrije Universiteit Brussel, dept. ETRO-DSSP, Pleinlaan 2, B-1050 Brussels, Belgium
Piet Verhoeve, Corporate R&D dept., TELEVIC nv, Leo Bekaertlaan 1, B-8870 Izegem, Belgium
In this paper we evaluate the performance of 8 different speech rate estimators previously described in the literature by applying them to a multilingual test database. All the estimators underestimate at high speech rates, and some also overestimate at low speech rates. Overall, the tested methods obtain high correlation coefficients with the reference speech rate. The Temporal Correlation and Selected Sub-band Correlation method (tcssbc), which uses sub-band and time domain correlation for detecting the number of vowels or diphthongs present in the speech signal, shows small errors and appears to be the most appropriate overall technique for speech rate estimation.
TuD.P3a‑5
Spectro-Temporal Processing for Blind Estimation of Reverberation Time and Single-Ended Quality Measurement of Reverberant Speech
Tiago H. Falk, Queen's University
Hua Yuan, Queen's University
Wai-Yip Chan, Queen's University
Auditory spectro-temporal representations of reverberant speech are investigated for blind estimation of reverberation time (RT) and for single-ended measurement of speech quality. The auditory representations are obtained from an eight-filter filterbank which is used to extract the modulation spectra from temporal envelopes of the speech signal. Gaussian mixture models (GMM), one for each modulation channel and trained on clean speech signals, serve as reference models of normative speech behavior. Consistency measures, computed between reverberant test signals and each GMM, are mapped to an estimated RT and to an estimated quality score. Experiments show that the proposed measures achieve superior performance relative to current "state-of-the-art" algorithms.
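To make the notion of a modulation spectrum of a temporal envelope concrete, here is a minimal sketch under stated assumptions. It is not the eight-filter filterbank or the GMM stage described in the abstract. It assumes the envelope is taken as the magnitude of the analytic signal and that the modulation spectrum is the magnitude FFT of the mean-removed envelope. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def temporal_envelope(x):
    """Envelope as the magnitude of the analytic signal (FFT-based)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC, double positive frequencies, zero negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[1:n // 2] = 2.0
        weights[n // 2] = 1.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def modulation_spectrum(x, fs):
    """Magnitude spectrum of the mean-removed temporal envelope."""
    env = temporal_envelope(x)
    env = env - env.mean()
    mag = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, mag

# Synthetic example: a 500 Hz carrier amplitude-modulated at 4 Hz.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)

freqs, mag = modulation_spectrum(x, fs)
peak_hz = freqs[np.argmax(mag)]
print(peak_hz)
```

For this synthetic signal the modulation spectrum peaks at the 4 Hz modulation rate. Reverberation tends to smear temporal envelopes, which is why envelope modulation content is a plausible cue for the kind of estimation the abstract describes.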