Single Channel Speech Separation Using Maximum a Posteriori Estimation
Mohammad H. Radfar, Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
Richard M. Dansereau, Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
We present a new approach for separating two speech signals when only a single recording of their additive mixture is available. In this approach, the log spectra of the sources are estimated by maximum a posteriori (MAP) estimation, given the mixture's log spectrum and the probability density functions of the sources. We show that the estimation leads to a two-state, non-linear filter whose states are controlled by the means of the sources. The first state is expressed as a combination of two Wiener filters whose parameters are controlled by the means and variances of the sources and by the noise variance; the second state is expressed by the means of the sources. In experiments conducted on a wide variety of mixtures, we show that the MAP-based estimator outperforms methods that use binary mask filtering or Wiener filtering for the separation task.
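The Wiener-style structure the abstract describes can be illustrated with a generic Gaussian sketch: if two sources and noise are Gaussian per frequency bin, the posterior-mean (= MAP, for Gaussians) estimate of each source is its prior mean plus a Wiener gain applied to the mixture's innovation. This is a minimal illustration of that idea, not the paper's two-state log-spectral filter; all function and variable names are mine.

```python
import numpy as np

def gaussian_map_separate(y, mu1, var1, mu2, var2, noise_var):
    """MAP (= posterior-mean) estimates of two Gaussian sources from
    their additive mixture y = s1 + s2 + n. Arguments are per-bin
    means/variances; a generic Gaussian sketch for illustration."""
    y = np.asarray(y, dtype=float)
    total = var1 + var2 + noise_var
    innov = y - (mu1 + mu2)              # mixture minus its prior mean
    s1 = mu1 + (var1 / total) * innov    # Wiener-style gain for source 1
    s2 = mu2 + (var2 / total) * innov    # Wiener-style gain for source 2
    return s1, s2
```

Note that as the noise variance goes to zero, the two estimates sum exactly to the mixture, mirroring the additive model.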
Speech Enhancement with Improved A Posteriori SNR Computation
Suhadi Suhadi, Institute for Communications Technology, Braunschweig Technical University
Tim Fingscheidt, Institute for Communications Technology, Braunschweig Technical University
In speech enhancement, the decision-directed (DD) approach is often used to compute the a priori SNR because it reduces musical tones. However, a constant DD weighting factor very close to one causes more speech distortion during transitional speech segments. Conversely, a time-varying weighting factor gives less speech distortion but more residual noise during speech pauses. In this contribution we present a new a posteriori SNR computation that relaxes the dependence on the DD weighting factor. By computing the a posteriori SNR with a time-varying weighting factor, we effectively derive a correction factor to the time-varying DD weighting factor, resulting in less speech distortion during transitions as well as less residual noise during speech pauses.
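For context, the baseline the abstract builds on is the classical decision-directed a priori SNR estimate (Ephraim and Malah): a weighted sum of the previous frame's clean-speech power over the noise PSD and the rectified instantaneous SNR. The sketch below shows that baseline only, not the authors' new correction factor; argument names are my own.

```python
import numpy as np

def decision_directed_snr(prev_clean_pow, noise_psd, noisy_pow, alpha=0.98):
    """Classical decision-directed a priori SNR estimate (per bin).
    prev_clean_pow: |S_hat(n-1)|^2 from the previous frame
    noise_psd:      estimated noise power spectral density
    noisy_pow:      |Y(n)|^2 of the current noisy frame
    alpha:          the DD weighting factor the abstract discusses."""
    gamma = noisy_pow / noise_psd                      # a posteriori SNR
    xi = alpha * prev_clean_pow / noise_psd \
        + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)  # a priori SNR
    return xi, gamma
```

A constant alpha near one smooths xi heavily (suppressing musical noise but smearing transients), which is exactly the trade-off the paper targets.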
Method of LP-based blind restoration for improving intelligibility of bone-conducted speech
Thang Vu Tat, JAIST
Germine Seide, JAIST
Masashi Unoki, JAIST
Masato Akagi, JAIST
Bone-conducted (BC) speech is stable against surrounding noise in extremely noisy environments, so it may be usable in place of air-conducted (AC) speech for communication. However, BC speech has very poor sound quality, and its intelligibility is degraded by transmission through bone conduction. In a previous study, we proposed an LP-based model to restore BC speech and improve its voice quality. In this paper, we improve the proposed model by (1) extending long-term processing to frame-based processing, (2) using LSF coefficients in the LP representation, and (3) using a recurrent neural network to predict parameters. We evaluated the improved model against other models to determine whether it adequately improves the voice quality and intelligibility of BC speech, using objective measures (LSD, MCD, and LCD) and Modified Rhyme Tests (MRTs).
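The LSF representation mentioned in item (2) is a standard reparameterization of LP coefficients: the LP polynomial A(z) is split into symmetric and antisymmetric polynomials P(z) and Q(z), whose unit-circle root angles are the line spectral frequencies. A minimal sketch of that standard conversion (not the paper's restoration model) follows; the function name is mine.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies in radians (standard P/Q root-angle construction)."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    # P(z) = A(z) + z^{-(p+1)} A(1/z),  Q(z) = A(z) - z^{-(p+1)} A(1/z)
    P = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a[::-1]])
    Q = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a[::-1]])
    lsf = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # keep one angle per conjugate pair; drop the trivial roots at z = +/-1
        lsf.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(lsf))
```

LSFs are attractive for the prediction step in (3) because, unlike raw LP coefficients, they are bounded, ordered, and small perturbations keep the filter stable.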
Noise Suppression Based on Extending a Speech-Dominated Modulation Band
Tiago H. Falk, Queen's University
Svante Stadler, Royal Institute of Technology (KTH)
W. Bastiaan Kleijn, Royal Institute of Technology (KTH)
Wai-Yip Chan, Queen's University
Previous work on bandpass modulation filtering for noise suppression has resulted in unwanted perceptual artifacts and decreased speech clarity. The artifacts are introduced mainly by half-wave rectification, which is employed to correct the negative power spectral values resulting from the filtering process. In this paper, modulation frequency estimation (i.e., bandwidth extension) is used to improve perceptual quality. Experiments demonstrate that the lowpass modulation content of the speech component can be reliably estimated from the bandpass modulation content of the speech-plus-noise components. Subjective listening tests corroborate that improved quality is attained when the removed speech lowpass modulation content is compensated for by this estimate.
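The artifact-producing step the abstract points to can be made concrete: filtering a band's temporal power envelope can drive some samples negative, which is physically invalid for power, so they are clamped to zero (half-wave rectification). The sketch below shows only where that rectification enters a generic modulation-filtering pipeline, not the paper's bandwidth-extension remedy; names are mine.

```python
import numpy as np

def bandpass_modulation_filter(power_env, kernel):
    """Filter the temporal power envelope of one acoustic band with a
    modulation-domain FIR kernel, then half-wave rectify.
    power_env: non-negative power envelope samples of one band
    kernel:    modulation-filter impulse response."""
    filtered = np.convolve(power_env, kernel, mode="same")
    # half-wave rectification: power cannot be negative, so clamp.
    # This clamping is the main source of the perceptual artifacts.
    return np.maximum(filtered, 0.0)
```

The rectification discards the negative excursions outright; the paper's approach instead estimates the lost lowpass modulation content and adds it back.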
Speech Enhancement Using PCA and Variance of the Reconstruction Error Model Identification
Amin Haji Abolhassani, INRS-Energie-Materiaux-Telecommunications, Montreal, Canada
Sid-Ahmed Selouani, Universite de Moncton, Campus de Shippagan, Canada
Douglas O'Shaughnessy, INRS-Energie-Materiaux-Telecommunications, Montreal, Canada
Mohamed-Faouzi Harkat, Universite Badji Mokhtar, Faculte des Sciences de l'Ingenieur, Annaba, Algerie
We present in this paper a subspace approach for enhancing a noisy speech signal. The original model-identification algorithm from which we derived our method has been used in the field of fault detection and diagnosis. The algorithm is based on principal component analysis, in which the optimal subspace is selected by a variance of the reconstruction error (VRE) criterion. This choice overcomes many limitations of other selection criteria, such as overestimation of the signal subspace or the need for empirical parameters. We have also extended our subspace algorithm to handle colored and babble noise. A performance evaluation on the Aurora database shows that our method provides higher noise reduction and lower signal distortion than existing enhancement methods. Our algorithm enhances the noisy speech in all noise conditions without introducing artifacts such as "musical noise".
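The core projection step of any PCA subspace enhancer can be sketched in a few lines: center the noisy frame matrix, take its SVD, and project onto the leading k principal directions. In the paper the dimension k is chosen by minimizing the VRE criterion; that selection rule is not reproduced here, so this sketch takes k as a given input, and the function name is mine.

```python
import numpy as np

def pca_subspace_enhance(frames, k):
    """Project noisy speech frames onto a k-dimensional signal subspace.
    frames: (n_frames, dim) matrix of noisy feature/sample vectors.
    k:      signal-subspace dimension (selected via VRE in the paper)."""
    mean = frames.mean(axis=0)
    X = frames - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt = PCs
    # keep the top-k principal directions, discard the noise subspace
    return (X @ Vt[:k].T) @ Vt[:k] + mean
```

With k equal to the full dimension the projection is the identity; shrinking k discards the low-variance directions that are assumed to be noise-dominated.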
Speech Reinforcement based on Partial Specific Loudness
Jong Won Shin, School of Electrical Engineering and INMC, Seoul National University, Seoul, Korea
Woohyung Lim, School of Electrical Engineering and INMC, Seoul National University, Seoul, Korea
Junesig Sung, School of Electrical Engineering and INMC, Seoul National University, Seoul, Korea
Nam Soo Kim, School of Electrical Engineering and INMC, Seoul National University, Seoul, Korea
In the presence of background noise, the perceptual loudness of a speech signal decreases significantly, resulting in deteriorated intelligibility and clarity. In this paper, we propose a novel approach to enhance the quality of the speech signal when the additive noise cannot be directly controlled. Specifically, we propose an approach that reinforces the speech signal so that the partial loudness in each band is maintained at almost the same level as that measured without the background noise. To find a suitable reinforcement rule, the loudness perception model proposed by Moore et al. is adopted. Experimental results show that the loudness of the original noise-free speech can be restored by the proposed reinforcement algorithm and that it enhances the perceived quality of the speech signal under various noise environments.
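The reinforcement idea can be illustrated with a deliberately crude stand-in for the Moore partial-loudness model: treat the noise as subtracting (a multiple of) its power from the audible speech power in each band, and solve for the per-band gain that restores the noise-free value. This toy proxy, and every name in it, is my assumption; the paper uses the full Moore et al. model, not this linear masking rule.

```python
import numpy as np

def reinforcement_gain(speech_pow, noise_pow, K=1.0):
    """Per-band gain under a toy partial-loudness proxy in which the
    audible speech power in noise is max(S - K*N, 0). Choosing
    g = 1 + K*N/S makes the reinforced band satisfy g*S - K*N = S,
    i.e. the proxy returns to its noise-free value S."""
    return 1.0 + K * noise_pow / np.maximum(speech_pow, 1e-12)
```

The gain grows with the band's noise-to-speech ratio, so noise-dominated bands are boosted most, which matches the qualitative behavior one expects from a loudness-restoring rule.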