Objective Analysis of the Effect of Memory Inclusion on Bandwidth Extension of Narrowband Speech
Amr Nour-Eldin, Dept. of Electrical & Computer Eng., McGill University, Montreal, Canada
Peter Kabal, Dept. of Electrical & Computer Eng., McGill University, Montreal, Canada
To improve the Bandwidth Extension (BWE) of narrowband speech, we continue our recent work on the positive effect that exploiting the temporal correlation of speech has on the dependence between speech frequency bands. We have shown that such memory inclusion in the MFCC speech parametrization translates into higher highband certainty. In the work presented herein, we employ vector quantization (VQ) to estimate discrete highband entropies, thereby refining our analysis of how memory inclusion increases highband certainty. Moreover, we extend our previous analysis to LSF parameters. We further construct a BWE system that exploits our memory-inclusion technique, translating the gains in highband certainty into practical BWE performance improvements as measured by the objective quality of the reconstructed speech. Results show that memory inclusion reduces the log-spectral distortion of the extended highband speech by as much as 1 dB, a relative improvement of more than 14%.
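The core measurement in this abstract can be illustrated with a toy experiment: quantize (narrowband, highband) feature pairs and compare the empirical conditional entropy of the highband symbol given the current narrowband symbol alone versus given the current and previous frames together. The sketch below uses synthetic correlated data and a simple quantile-based scalar quantizer in place of the paper's MFCC/LSF features and trained VQ; all names and parameters are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000

# Synthetic "narrowband feature" with temporal correlation (AR(1) process)
nb = np.zeros(N)
for t in range(1, N):
    nb[t] = 0.9 * nb[t - 1] + rng.normal()

# Synthetic "highband feature" depending on current AND previous narrowband frames
hb = 0.6 * nb + 0.6 * np.roll(nb, 1) + 0.3 * rng.normal(size=N)

def quantize(x, levels=8):
    # Quantile-based scalar quantizer standing in for a trained VQ
    edges = np.quantile(x, np.linspace(0, 1, levels + 1)[1:-1])
    return np.digitize(x, edges)

def cond_entropy(y, x):
    # Empirical H(Y|X) in bits, from joint symbol counts
    joint, px, n = {}, {}, len(x)
    for xi, yi in zip(x, y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
        px[xi] = px.get(xi, 0) + 1
    return -sum(c / n * np.log2(c / px[xi]) for (xi, yi), c in joint.items())

q_hb = quantize(hb)
q_nb = quantize(nb)
# "Memory" symbol: joint quantization of current and previous narrowband frames
q_nb_mem = q_nb * 8 + quantize(np.roll(nb, 1))

h_no_mem = cond_entropy(q_hb, q_nb)       # H(HB | NB)
h_mem = cond_entropy(q_hb, q_nb_mem)      # H(HB | NB, previous NB)
print(round(h_no_mem, 2), round(h_mem, 2))
```

With this construction, conditioning on the previous frame as well as the current one lowers the conditional entropy of the highband symbol, i.e., memory inclusion raises highband certainty.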
Artificial Bandwidth Extension without Side Information for ITU-T G.729.1
Bernd Geiser, RWTH Aachen University, Germany
Hervé Taddei, Siemens Networks GmbH, Munich, Germany
Peter Vary, RWTH Aachen University, Germany
This paper discusses a potential extension of the ITU-T G.729.1 speech and audio codec. The G.729.1 coder is hierarchically organized, i.e., the obtained quality increases with the number of bits received for each frame. In particular, the bit rates of 8 and 12 kbit/s offer narrowband (50 Hz – 4 kHz) speech transmission. With a received bit rate of at least 14 kbit/s, the output bandwidth increases to the wideband frequency range (50 Hz – 7 kHz). Here, we investigate efficient methods to provide the full wideband frequency range already at the lower bit rates of 8 and 12 kbit/s while maintaining interoperability with the standard implementation of G.729.1. These techniques are not necessarily limited to G.729.1 and may therefore serve in other applications as well.
The Effect of Highband Harmonic Structure in the Artificial Bandwidth Expansion of Telephone Speech
Hannu Pulakka, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland
Paavo Alku, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland
Laura Laaksonen, Nokia Technology Platforms, Finland
Päivi Valve, Nokia Technology Platforms, Finland
The quality of narrowband telephone speech can be improved by artificial bandwidth expansion (ABE), which generates the missing frequency components above the telephone bandwidth using only information from the narrowband speech signal. Straightforward bandwidth expansion methods do not reproduce the harmonic structure of voiced sounds properly, but a pitch-adaptive technique can be used to approximate the correct alignment of harmonic frequencies. In this study, pitch-adaptive highband alignment was implemented in an existing ABE method, and the quality of the modified method was studied with formal listening tests in Finnish and Mandarin Chinese. The effect of the highband harmonic structure was found to be unimportant for perceived speech quality. Consequently, the computationally expensive pitch adaptation was found to be unnecessary for the bandwidth expansion of telephone speech.
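The harmonic-misalignment problem this abstract refers to can be shown numerically. Plain spectral folding mirrors a narrowband component at frequency f to fs − f, which generally does not land on a multiple of the fundamental f0, whereas pitch-adaptive placement continues the true harmonic series. The sketch below is an illustrative frequency-domain calculation only, not either ABE implementation; the sampling rate and f0 values are assumptions.

```python
import numpy as np

fs = 8000.0   # wideband mirroring frequency (Hz), illustrative
f0 = 130.0    # example fundamental frequency (Hz)

# Harmonics present in the narrowband signal (below 4 kHz)
nb_harmonics = f0 * np.arange(1, int(4000 // f0) + 1)

# Spectral folding mirrors the spectrum: a component at f reappears at fs - f
folded = np.sort(fs - nb_harmonics)
folded = folded[(folded > 4000) & (folded < 7000)]

# Pitch-adaptive placement continues the true harmonic series above 4 kHz
aligned = f0 * np.arange(int(4000 // f0) + 1, int(7000 // f0) + 1)

# Distance from each folded component to the nearest true harmonic
offset = np.min(np.abs(folded[:, None] - aligned[None, :]), axis=1)
print(offset.max())  # folding misses the harmonic grid unless f0 divides fs
```

The listening-test result of the paper is that, despite this measurable misalignment, correcting it made little perceptual difference.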
Artificial Bandwidth Extension for Speech Signals using Speech Recognition
Shingo Kuroiwa, The University of Tokushima
Masashi Takashina, The University of Tokushima
Satoru Tsuge, The University of Tokushima
Fuji Ren, The University of Tokushima
In this paper, we propose a non-real-time speech bandwidth extension method using HMM-based speech recognition and HMM-based speech synthesis. In the proposed method, the phoneme-state sequence is first estimated from the band-limited speech signal using speech recognition. Next, to estimate the spectral envelopes of the lost high-frequency components, an HMM-based speech synthesis technique generates a synthetic speech signal (spectrum sequence) according to the predicted phoneme-state sequence. Since both the speech recognition and the speech synthesis take dynamic feature vectors into account, a smoothly varying spectrum sequence is obtained. To evaluate the proposed method, we conducted subjective and objective experiments. The experimental results show the effectiveness of the proposed method for bandwidth extension, although its speech quality still requires improvement.
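The two-stage pipeline described above (recognize a phoneme-state sequence, then synthesize highband envelopes from it) can be sketched in miniature. The code below substitutes nearest-mean classification for HMM Viterbi decoding and a fixed smoothing kernel for ML parameter generation with dynamic features; every mean, envelope, and dimension is a made-up stand-in, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(3)
T, D = 50, 3  # frames, narrowband feature dimension (illustrative)

# Hypothetical per-state feature means for "recognition" (3 phoneme states)
state_means = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [-2.0, 1.0, 0.0]])
# Hypothetical highband spectral envelope associated with each state
state_envelopes = np.array([[1.0, 0.5, 0.2], [0.3, 0.8, 0.6], [0.7, 0.2, 0.9]])

# Synthetic narrowband features following a known state path
true_path = np.repeat([0, 1, 2, 1, 0], T // 5)
feats = state_means[true_path] + 0.3 * rng.normal(size=(T, D))

# Step 1: "recognition" -- most likely state per frame
# (a stand-in for HMM-based decoding of the phoneme-state sequence)
dists = np.linalg.norm(feats[:, None] - state_means[None], axis=2)
path = dists.argmin(axis=1)

# Step 2: "synthesis" -- map states to highband envelopes, then smooth over time
# (a stand-in for parameter generation with dynamic features)
env = state_envelopes[path]
kernel = np.array([0.25, 0.5, 0.25])
smoothed = np.apply_along_axis(lambda c: np.convolve(c, kernel, "same"), 0, env)
print(smoothed.shape)  # one highband envelope per frame
```

The smoothing step mirrors the paper's point that dynamic features yield a smoothly varying spectrum sequence rather than abrupt per-frame jumps.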
Voicing-Based Codebook in Low-Rate Wideband CELP Coding
Driss Guerchi, College of Information Technology, UAE University
Tamer Rabie, College of Information Technology, UAE University
Louzi Abdelrhani, Ericsson Canada
In this paper we propose a new technique for quantizing the spectral information in an algebraic code-excited linear prediction (ACELP) wideband codec. The Voicing-Based Vector Quantization (VBVQ) presented in this paper optimizes the search for the optimal vector while reducing the number of codebook entries searched by almost one third. In the VBVQ training phase, three codebooks are designed individually for voiced, unvoiced, and transition speech. The proposed technique reduces the processing delay, since it restricts the quantization of an input vector to only one of the three codebooks. For each speech frame, one codebook is selected based on the interframe correlation of the spectral information. VBVQ was successfully implemented in an ACELP wideband coder. Its objective and subjective performance is superior to that of a combination of split vector quantization and multistage vector quantization when the same database is used for training and testing.
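The selection mechanism described above, classify the frame first, then search only the matching codebook, can be sketched as follows. The codebooks here are random placeholders rather than trained ones, and the correlation thresholds are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 10  # spectral (e.g., LSF) vector dimension, illustrative

# Hypothetical pre-trained codebooks, one per speech class
codebooks = {
    "voiced": rng.normal(size=(64, DIM)),
    "unvoiced": rng.normal(size=(64, DIM)),
    "transition": rng.normal(size=(64, DIM)),
}

def classify(curr, prev, hi=0.85, lo=0.4):
    # Pick a class from the interframe correlation of spectral vectors
    # (thresholds are illustrative, not from the paper)
    r = np.corrcoef(curr, prev)[0, 1]
    if r > hi:
        return "voiced"
    if r < lo:
        return "unvoiced"
    return "transition"

def vbvq_quantize(curr, prev):
    cls = classify(curr, prev)
    cb = codebooks[cls]  # search only one of the three codebooks
    idx = np.argmin(np.sum((cb - curr) ** 2, axis=1))
    return cls, idx, cb[idx]

prev = rng.normal(size=DIM)
curr = 0.95 * prev + 0.05 * rng.normal(size=DIM)  # highly correlated frame
cls, idx, code = vbvq_quantize(curr, prev)
print(cls, idx)
```

Because each frame is matched against one codebook instead of three, the nearest-neighbor search cost drops by roughly two thirds, which is the source of the delay reduction the abstract claims.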
Performance of Speaker-Dependent Wideband Speech Coding
Ethan Duni, UC San Diego
Bhaskar Rao, UC San Diego
This paper examines the performance gains available in wideband speech coding from speaker-dependent systems. It is shown that a gain of 4 bits per frame, in the rate-distortion sense, is achievable in LSF coding. While variations are evident in the pitch-lag statistics during voiced frames, there is no gain in unvoiced frames or in the adaptive gains; thus, there is little benefit to speaker-dependent coding of the adaptive codebook parameters. Finally, gains of 40-50 bits per frame are shown to be available in the fixed excitation. These performance gains can be exploited in a number of ways, most simply by reducing the operating rate. Alternatively, the complexity of the coding system can be reduced while maintaining the performance of speaker-independent coding. It is shown that a fourfold reduction in complexity is achievable using speaker-dependent LSF quantization.
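The intuition behind these gains can be demonstrated with a toy vector quantizer: a codebook trained on one speaker's feature distribution beats a speaker-independent codebook of the same size on that speaker's data, which can then be traded for rate or complexity. The Gaussian "speakers" and the tiny k-means trainer below are stand-ins for real LSF data and the paper's quantizers.

```python
import numpy as np

rng = np.random.default_rng(2)

def kmeans(data, k, iters=20):
    # Tiny k-means VQ trainer (Lloyd's algorithm), for illustration only
    cb = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(data[:, None] - cb[None], axis=2)
        lab = d.argmin(axis=1)
        for j in range(k):
            if np.any(lab == j):
                cb[j] = data[lab == j].mean(axis=0)
    return cb

def vq_mse(data, cb):
    d = np.linalg.norm(data[:, None] - cb[None], axis=2)
    return np.mean(d.min(axis=1) ** 2)

# Two synthetic "speakers" with different feature distributions (LSF stand-ins)
spk_a = rng.normal(loc=0.0, scale=1.0, size=(2000, 4))
spk_b = rng.normal(loc=3.0, scale=0.5, size=(2000, 4))

pooled = np.vstack([spk_a, spk_b])
cb_si = kmeans(pooled, 16)  # speaker-independent codebook
cb_sd = kmeans(spk_a, 16)   # speaker-dependent codebook for speaker A

mse_si = vq_mse(spk_a, cb_si)
mse_sd = vq_mse(spk_a, cb_sd)
print(mse_si, mse_sd)
```

At the same rate (16 entries, i.e., 4 bits), the speaker-dependent codebook achieves lower distortion on speaker A; equivalently, it could match the speaker-independent distortion with fewer bits or a smaller search, which is the trade-off the abstract describes.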