Interspeech 2007 Session ThB.P1a: Pitch extraction I
Type: poster
Date: Thursday, August 30, 2007
Time: 10:00 – 12:00
Room: Foyer
Chair: Isabel Trancoso (INESC, Lisboa)
ThB.P1a‑1
Joint Position-Pitch Extraction from Multichannel Audio
Michael Wohlmayr, SPSC Laboratory, Graz University of Technology, Graz, Austria
Marian Kepesi, SPSC Laboratory, Graz University of Technology, Graz, Austria
Recently, a method for the joint extraction of pitch and location information from two-channel recordings has been introduced. This framework offers a new, natural representation of all acoustic sources in the auditory scene and has the potential to serve as a front-end in applications such as advanced tracking of multiple speakers in conference rooms. In this paper, we explore basic properties of this method and propose performance improvements using circular arrangements of multiple microphones.
ThB.P1a‑2
Morphological pre-processing technique and its applications on speech signal
HyunSoo Kim, Samsung Electronics
We investigate the properties and applications of morphological filters for speech analysis. We introduce a novel nonlinear spectral envelope estimation method based on morphological operations, which is found to be very robust against noise, and compare it with the spectral envelope estimation vocoder (SEEVOC) method. A simple method for selecting the optimum structuring set size without using pitch information is proposed. We also introduce a new concept of higher-order peaks, which is found to be beneficial. The morphological approach is then applied to develop a new pitch estimation method. Finally, the harmonic-plus-noise decomposition is used to develop a novel and flexible noise reduction method.
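As a rough illustration of morphological spectral envelope estimation, the sketch below applies a grayscale closing (dilation followed by erosion with a flat structuring element) to a magnitude spectrum. The choice of a closing, the flat structuring element, and the parameter `half_width` are assumptions for illustration only; the abstract does not specify the paper's exact operators or its rule for choosing the structuring set size.

```python
def dilate(x, half_width):
    """Grayscale dilation with a flat structuring element: running maximum."""
    n = len(x)
    return [max(x[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def erode(x, half_width):
    """Grayscale erosion with a flat structuring element: running minimum."""
    n = len(x)
    return [min(x[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def closing_envelope(spectrum, half_width):
    """Closing (dilation then erosion) of a magnitude spectrum.

    The result never lies below the input; valleys narrower than the
    structuring element (e.g. gaps between harmonic peaks) are filled in.
    `half_width` is a hypothetical parameter in frequency bins.
    """
    return erode(dilate(spectrum, half_width), half_width)


# Two harmonic peaks four bins apart.
spec = [0, 1, 5, 1, 0, 1, 5, 1, 0]
print(closing_envelope(spec, 2))  # element wide enough to bridge the valley
print(closing_envelope(spec, 1))  # element too narrow: valley between peaks remains
```

The example shows why the structuring element size matters: only when it spans the spacing between harmonics does the closing trace an envelope over the peaks rather than following the harmonic fine structure.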
ThB.P1a‑3
A Pitch Extraction System Based in Phase Locked Loops and Consensus Decision
Patricia Pelle, School of Engineering, University of Buenos Aires, Argentina
Claudio Estienne, School of Engineering, University of Buenos Aires, Argentina
In this work, we present a pitch estimation system with a very low error rate that is also very robust against noise. Two key aspects of the system are mainly responsible for this behavior. On the one hand, we use a multiple estimation scheme based on PLLs. These devices provide robust information about the periods of the speech signal's harmonics, and combining this information with an additional independent estimate yields a robust estimate of f0. On the other hand, the multiple estimates are combined in a stage that assesses each of them and retains the most reliable ones. The final output of the system is an agreement value among these retained estimates. This consensus decision significantly improves on the accuracy of the initial estimates. Overall performance is assessed by comparing our system with the get_f0 algorithm under clean and noisy conditions. We show that our system outperforms get_f0 in all conditions tested.
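One possible way to realize a consensus decision of the kind described above is sketched below: discard estimates whose reliability falls below a threshold, take the median of the survivors as a reference, and average the estimates that agree with it. The reliability scores, the thresholds `min_reliability` and `tolerance`, and the median-based agreement rule are all hypothetical; the abstract does not say how the paper scores or combines its PLL-based estimates.

```python
from statistics import median


def consensus_f0(estimates, min_reliability=0.5, tolerance=0.1):
    """Combine several f0 estimates into one value (hypothetical rule).

    estimates: list of (f0_hz, reliability) pairs, reliability in [0, 1].
    Returns the mean of the retained estimates lying within `tolerance`
    (relative) of their median, or None when no consensus is reached.
    """
    # Keep only positive estimates that pass the reliability threshold.
    kept = [f0 for f0, rel in estimates if rel >= min_reliability and f0 > 0]
    if not kept:
        return None
    center = median(kept)
    # Retain the estimates that agree with the median within the tolerance.
    agreeing = [f0 for f0 in kept if abs(f0 - center) <= tolerance * center]
    if not agreeing:
        return None
    return sum(agreeing) / len(agreeing)


# An octave error (240 Hz) and an unreliable estimate (60 Hz) are rejected.
print(consensus_f0([(120, 0.9), (122, 0.8), (240, 0.7), (118, 0.95), (60, 0.2)]))
```

Using the median as the reference keeps a single octave error from dragging the agreement value, which is one motivation for a consensus stage over a plain average.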
ThB.P1a‑4
A Robust Multi-Phase Pitch-Mark Detection Algorithm
Milan Legat, University of West Bohemia, Pilsen, Czech Rep.
Jindrich Matousek, University of West Bohemia, Pilsen, Czech Rep.
Daniel Tihelka, University of West Bohemia, Pilsen, Czech Rep.
This paper describes a robust multi-phase algorithm for marking pitch pulses in speech using both glottal and speech signals. In the first phase, the glottal signal is used to estimate the fundamental frequency contour of the given sentence. Next, pitch-mark candidates are generated from both the glottal and speech signals. In the third phase, the best sequence of pitch marks is selected from the set of candidates. Finally, this pitch-mark sequence is post-processed. One feature of the new method is that each detected pitch mark is assigned a confidence value, so that problematic pitch-mark subsequences can be located. The algorithm was tested and compared with other pitch-mark detection methods.
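The third phase, selecting the best sequence from a set of candidates, is commonly framed as dynamic programming. The sketch below is one such formulation, assuming each candidate carries a score and that consecutive marks are penalized for deviating from an expected pitch period. The scoring, the constant `period` (in place of a full F0 contour), and the `weight` parameter are assumptions for illustration, not the paper's published cost function.

```python
def best_pitch_marks(times, scores, period, weight=1.0):
    """Pick a subsequence of candidate pitch marks by dynamic programming.

    times:  sorted candidate positions (e.g. sample indices)
    scores: per-candidate scores (higher means more plausible)
    period: expected pitch period in the same units (assumed constant here)
    Path score = sum of candidate scores
                 - weight * |interval - period| / period for each transition.
    """
    n = len(times)
    if n == 0:
        return []
    best = list(scores)   # best[j]: best path score ending at candidate j
    prev = [None] * n     # back-pointers for path recovery
    for j in range(n):
        for i in range(j):
            cost = weight * abs((times[j] - times[i]) - period) / period
            candidate = best[i] + scores[j] - cost
            if candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    # Backtrack from the highest-scoring end point.
    j = max(range(n), key=lambda k: best[k])
    path = []
    while j is not None:
        path.append(times[j])
        j = prev[j]
    return path[::-1]


# Spurious candidates at 50 and 135 are skipped; the regular marks survive.
print(best_pitch_marks([0, 50, 100, 135, 200], [1.0, 1.0, 1.0, 0.3, 1.0], period=100, weight=2.0))
```

A per-mark confidence, as mentioned in the abstract, could be derived from such path scores, although how the paper computes its confidence values is not stated here.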
ThB.P1a‑5
Pitch Estimation of Noisy Speech Signals using Empirical Mode Decomposition
Md. Khademul Islam Molla, The University of Tokyo
Keikichi Hirose, The University of Tokyo
Nobuaki Minematsu, The University of Tokyo
Md. Kamrul Hasan, Bangladesh University of Engineering and Technology
This paper presents a pitch estimation method for noisy speech signals using empirical mode decomposition (EMD). The normalized autocorrelation function (NACF) of the noisy speech signal is decomposed by EMD into a finite set of band-limited signals termed intrinsic mode functions (IMFs). One of these IMFs has a periodicity equal to the true pitch period. A conventional autocorrelation-based pitch period detection method is used to select this IMF, and the pitch period is then obtained from the selected IMF. The pitch estimation performance of the proposed algorithm, in terms of gross pitch error (GPE), is compared with that of recently proposed methods. The experimental results show that the EMD-based algorithm performs better in pitch estimation of noisy speech.
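The sketch below illustrates two standard ingredients named in the abstract: a normalized autocorrelation with a conventional lag-picking rule, and the gross pitch error. The EMD step itself is not shown. The NACF form (a fixed window correlated with shifted windows), the search range `f0_min`/`f0_max`, the rule "smallest local peak within `threshold` of the global maximum", and the 20% GPE tolerance are assumptions drawn from common practice, not from the paper.

```python
import math


def nacf(x, window, max_lag):
    """Normalized correlation of x[0:window] with x[k:k+window], k = 0..max_lag.

    Requires len(x) >= window + max_lag. Values lie in [-1, 1].
    """
    ref = x[:window]
    e0 = sum(v * v for v in ref)
    r = []
    for k in range(max_lag + 1):
        seg = x[k:k + window]
        ek = sum(v * v for v in seg)
        num = sum(a * b for a, b in zip(ref, seg))
        r.append(num / math.sqrt(e0 * ek) if e0 > 0 and ek > 0 else 0.0)
    return r


def pitch_period(x, fs, f0_min=60.0, f0_max=400.0, threshold=0.9):
    """Pitch period in samples: smallest NACF local peak near the global maximum."""
    min_lag, max_lag = int(fs / f0_max), int(fs / f0_min)
    window = len(x) - max_lag
    if window <= 0:
        raise ValueError("frame too short for the requested f0_min")
    r = nacf(x, window, max_lag)
    peak = max(r[min_lag:max_lag + 1])
    if peak <= 0:
        return None  # no periodicity found (e.g. unvoiced frame)
    for k in range(max(min_lag, 1), max_lag):
        # Preferring the smallest qualifying lag avoids picking 2T or 3T.
        if r[k] >= threshold * peak and r[k] >= r[k - 1] and r[k] >= r[k + 1]:
            return k
    return None


def gross_pitch_error(estimates, references, tol=0.2):
    """Percentage of voiced frames (reference > 0) off by more than tol * reference."""
    voiced = [(e, r) for e, r in zip(estimates, references) if r > 0]
    errors = sum(1 for e, r in voiced if abs(e - r) > tol * r)
    return 100.0 * errors / len(voiced)


fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(373)]
lag = pitch_period(frame, fs)
print(lag, fs / lag)  # period in samples and the corresponding f0 in Hz
```

For a clean 200 Hz sine sampled at 8 kHz the period is 40 samples. In the method described above, such a lag search would be applied to the IMFs of the NACF rather than directly to the noisy frame.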
ThB.P1a‑6
Evaluating two versions of the Momel pitch modelling algorithm on a corpus of read speech in Korean
Daniel Hirst, LPL CNRS Aix-Marseille Université, Aix-en-Provence
Hyongsil Cho, LPL CNRS Université de Provence
Sunhee Kim, Center for Humanities and Information, Seoul National University, Seoul
Hyunji Yu, Department of Linguistics, Seoul National University, Seoul
The Momel algorithm provides an automatic factoring of a raw fundamental frequency curve into two components: a microprosodic component, corresponding to local variations of pitch caused by the phonetic nature of the speech segments, and a macroprosodic component, corresponding to the overall pitch pattern of the utterance, which is then represented as a sequence of pitch targets. An earlier evaluation estimated the overall efficiency of the algorithm (F-measure) at around 95% on a corpus of read speech in 5 European languages and at around 93% on a corpus of spontaneous speech. In this paper, we evaluate the output of two versions of the Momel algorithm against manually corrected pitch targets for a corpus of just over 2 hours of read speech in Korean (40 continuous 5-sentence passages, each read by 5 male and 5 female speakers). The results show that the new version of the Momel algorithm consistently performs better than the earlier version.
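An F-measure over pitch targets requires deciding when a detected target matches a manually corrected one. The sketch below assumes a greedy one-to-one match within a time tolerance and a relative frequency tolerance; the tolerance values `time_tol` and `freq_tol` and the greedy matching are hypothetical and are not the criteria used in the Momel evaluations.

```python
def target_f_measure(detected, reference, time_tol=0.1, freq_tol=0.05):
    """F-measure of detected pitch targets against reference targets.

    detected, reference: lists of (time_s, f0_hz) targets.
    A detected target is a hit if an unmatched reference target lies within
    time_tol seconds and within freq_tol (relative) in frequency.
    Matching is greedy and one-to-one, so it is not guaranteed optimal.
    """
    unmatched = list(reference)
    hits = 0
    for t, f in detected:
        for j, (rt, rf) in enumerate(unmatched):
            if abs(t - rt) <= time_tol and abs(f - rf) <= freq_tol * rf:
                hits += 1
                del unmatched[j]  # each reference target can be matched once
                break
    if hits == 0:
        return 0.0
    precision = hits / len(detected)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


detected = [(0.10, 120), (0.50, 180), (0.90, 100)]
reference = [(0.12, 118), (0.55, 200), (0.88, 101), (1.20, 90)]
print(target_f_measure(detected, reference))  # 2 hits: precision 2/3, recall 1/2
```

Here two of three detected targets match, giving precision 2/3 and recall 1/2, so the F-measure is 4/7.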
ThB.P1a‑7
Hybrid Electroglottograph and Speech Signal based Algorithm for Pitch Marking
Hussein Hussein, Dresden University of Technology
Oliver Jokisch, Dresden University of Technology
Pitch marking is an important task in speech signal processing. In a text-to-speech (TTS) system based on the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) method, robust estimation of pitch marks (PM) is especially important for modifying the time and pitch scale of a speech signal to match that of the target speaker. The aim of this paper is to improve the accuracy of automatic Pitch Mark Algorithms (PMA). To this end, we propose a hybrid pitch-marking method that combines the advantages of the Electroglottograph (EGG) signal and the speech signal. We evaluate this hybrid algorithm against the pitch-marking algorithm used by the Praat program. The evaluation results indicate that the proposed method performs better than PMAs based on the EGG signal or the speech signal only.
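A common starting point for EGG-based pitch marking is to place marks at the sharp peaks of the differentiated EGG (dEGG), which are associated with glottal closure. The hybrid sketch below assumes a polarity in which increasing vocal-fold contact raises the EGG value (flip the sign for the opposite convention), then moves each mark to the most negative speech sample within a small window. The threshold ratio, the `search` window, and the refinement rule are hypothetical illustrations of combining the two signals, not the method proposed in the paper.

```python
def degg_peaks(egg, threshold_ratio=0.5):
    """Candidate marks at strong positive peaks of the first-differenced EGG.

    d[i] = egg[i + 1] - egg[i]; a mark is placed at sample i + 1 when d[i]
    is a local maximum and at least threshold_ratio * max(d).
    """
    d = [egg[n] - egg[n - 1] for n in range(1, len(egg))]
    ceiling = threshold_ratio * max(d)
    marks = []
    for i in range(1, len(d) - 1):
        if d[i] >= ceiling and d[i] > d[i - 1] and d[i] >= d[i + 1]:
            marks.append(i + 1)
    return marks


def refine_on_speech(marks, speech, search=3):
    """Move each mark to the minimum speech sample within +/- search samples
    (hypothetical refinement rule)."""
    refined = []
    for m in marks:
        lo, hi = max(0, m - search), min(len(speech), m + search + 1)
        refined.append(min(range(lo, hi), key=lambda n: speech[n]))
    return refined


# Synthetic EGG: contact rises sharply every 50 samples.
egg = [1.0 if n % 50 < 20 else 0.0 for n in range(200)]
speech = [0.0] * 200
speech[52], speech[98], speech[151] = -1.0, -0.8, -0.5
marks = degg_peaks(egg)
print(marks, refine_on_speech(marks, speech))
```

On this synthetic input the dEGG peaks fall at samples 50, 100, and 150, and the speech-based refinement shifts them to the nearby speech minima.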