Interspeech 2007 Session WeC.SS: Objective assessment of voice and speech quality
Type
poster
Date
Wednesday, August 29, 2007
Time
13:30 – 15:30
Room
Astrid Scala 1
Chair
Yannis Stylianou (University of Crete), Hugo Van hamme (K.U.Leuven, Belgium)
WeC.SS‑1
Women's Vocal Aging: a Longitudinal Approach
Markus Brückl, Technische Universität Berlin
A quasi-experimental longitudinal paired-samples study was carried out to explore whether aging for 5 years can (1) audibly and (2) measurably change women's vocalisations, and if so, (3) which acoustic information the listeners’ performance could rely on and (4) which parameters can help detect the chronological difference. Results indicate that (1) listeners can judge this difference correctly at a significant level based on sustained /i/ and /u/ vowels, but much better based on (spontaneous) speech samples. (2) Parameters depicting pitch, vowel resonance, voice perturbations, tremor and spectral energy distributions differ (significantly) between chronologically and perceptually younger and older samples. (3) Listeners tend to judge vowel samples as older if increased (amplitude) perturbations can be measured, but in speech samples there seem to be overriding and objectively more reliable features. (4) The most reliable age-indicating measures in this study are duration/tempo measures in speech samples, and F0 and tremor in vowels.
WeC.SS‑2
Effect of Intensive Voice Therapy on Vocal Tremor for Parkinson Speakers
Laurence Cnockaert, Laboratoire d'Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Belgique
Jean Schoentgen, Laboratoire d'Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Belgique
Canan Ozsancak, Service de Neurologie, CH de Boulogne sur Mer, France
Pascal Auzou, Service d'Explorations Fonctionnelles Neurologiques, Groupe Hopale, Berck-sur-Mer, France
Francis Grenez, Laboratoire d'Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Belgique
The effect of intensive voice therapy (Lee Silverman Voice Treatment, LSVT) on vocal tremor features of Parkinson speakers is presented. Vocal tremor is the low-frequency variation of the vocal frequency. Its features differ for Parkinson and normophonic speakers. Here, vocal tremor features have been estimated for a corpus of speakers with Parkinson's disease, recorded before and after intensive voice therapy. Results show that the treatment has significant effects on vocal tremor amplitude: it decreases right after treatment. After six months, it has increased again, but is still lower than before treatment.
WeC.SS‑3
Assessment of vocal dysperiodicities in connected disordered speech
Ali Alpan, Laboratoire d’Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Brussels, Belgium
Abdellah Kacha, Laboratoire d’Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Brussels, Belgium
Francis Grenez, Laboratoire d’Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Brussels, Belgium
Jean Schoentgen, Laboratoire d’Images, Signaux et Dispositifs de Télécommunications, Université Libre de Bruxelles, Brussels, Belgium
The aim of the presentation is to investigate acoustic analysis of connected speech by means of an average-equalized and energy-equalized variogram to extract vocal dysperiodicities. The variogram enables positioning a current and a lagged analysis frame in adjacent speech cycles to track inter-cycle dysperiodicities. Average and energy equalization of the analysis frames are options that make it possible to compensate for slow deterministic changes of the speech signal amplitude in connected speech. The instantaneous dysperiodicity trace has been summarized by means of segmental and global signal-to-dysperiodicity ratios. Results show that signal-to-dysperiodicity ratios obtained by variogram analysis correlate strongly with the perceived degree of hoarseness when the analysis frames are energy-equalized. Equalizing the frame averages removes small artifacts in the instantaneous dysperiodicity trace that are caused by sound-to-sound transients or intrusive low-frequency noise.
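The idea of comparing a current frame with an energy-equalized lagged frame can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping framing, the lag search range, and the synthetic test signal are all assumptions made for illustration, and only energy equalization (not average equalization) and a global signal-to-dysperiodicity ratio are shown.

```python
import math
import random


def frame_energy(frame):
    """Sum of squared samples of a frame."""
    return sum(v * v for v in frame)


def dysperiodicity_trace(x, frame_len, min_lag, max_lag):
    """Return (signal, error) sample lists over non-overlapping frames.

    For each current frame, every lag in [min_lag, max_lag] is tried; the
    lagged frame is scaled so that its energy equals that of the current
    frame (energy equalization), and the lag with the smallest squared
    difference is kept. The difference at that lag is the dysperiodicity.
    """
    signal, error = [], []
    start = max_lag  # leave room for the longest lag
    while start + frame_len <= len(x):
        cur = x[start:start + frame_len]
        e_cur = frame_energy(cur)
        best_energy, best_diff = None, None
        for lag in range(min_lag, max_lag + 1):
            lagged = x[start - lag:start - lag + frame_len]
            e_lag = frame_energy(lagged)
            gain = math.sqrt(e_cur / e_lag) if e_lag > 0 else 0.0
            diff = [c - gain * l for c, l in zip(cur, lagged)]
            d = frame_energy(diff)
            if best_energy is None or d < best_energy:
                best_energy, best_diff = d, diff
        signal.extend(cur)
        error.extend(best_diff)
        start += frame_len
    return signal, error


def sdr_db(signal, error, eps=1e-12):
    """Global signal-to-dysperiodicity ratio in dB (eps avoids log of 0)."""
    return 10.0 * math.log10((frame_energy(signal) + eps) /
                             (frame_energy(error) + eps))


def synth_tone(noise_sd, n=1600, fs=8000, f0=100.0, seed=0):
    """Hypothetical test signal: a sine with optional additive noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * f0 * i / fs) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


clean = synth_tone(noise_sd=0.0)
noisy = synth_tone(noise_sd=0.1)
sdr_clean = sdr_db(*dysperiodicity_trace(clean, 40, 40, 160))
sdr_noisy = sdr_db(*dysperiodicity_trace(noisy, 40, 40, 160))
print(sdr_clean > sdr_noisy)
```

In this toy setting the perfectly periodic tone yields a much higher ratio than the noisy one, which is the qualitative behaviour a dysperiodicity measure should have.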
WeC.SS‑4
Effects of FE Modelled Consequences of Tonsillectomy on Perceptual Evaluation of Voice
Anne-Maria Laukkanen, Department of Speech Communication and Voice Research, University of Tampere, Finland
Jaromír Horáček, Institute of Thermomechanics, Academy of Sciences of the Czech Republic, Prague, Czech Republic
Pavel Švancara, Institute of Thermomechanics, Academy of Sciences of the Czech Republic, Prague, Czech Republic
Elina Lehtinen, Department of Speech Communication and Voice Research, University of Tampere, Finland
This study aimed to investigate the effects of a tonsillectomy on perceived overall voice quality and timbre. Computer simulations of five Czech vowels were made, both including the calculated resonance effects of large tonsils (size 1.6 cm³) and without tonsils. The simulations were made using a finite element model of the vocal tract, based on magnetic resonance images. Size and shape of the tonsils were obtained from clinical data. The generated pressure outputs were transformed into sound recordings presented to 10 trained listeners. Formant frequencies of the simulated samples were measured. The samples with and without tonsils did not differ significantly from each other for voice quality. F3 was significantly lower and the timbre was darker without tonsils. Thus, the effects of tonsillectomy on voice may be perceivable, at least when large tonsils are concerned. The effect, however, may disappear in time due to changes in the tissue and due to compensatory changes in articulation.
WeC.SS‑5
Speech quality after major surgery of the oral cavity and oropharynx with microvascular soft tissue reconstruction
Irma Verdonck-de Leeuw, VU University Medical Center
Louis ten Bosch, Radboud University, Nijmegen
Li Ying Chao, Radboud University, Nijmegen
Rico Rinkel, VU University Medical Center
Pepijn Borggreven, VU University Medical Center
Lou Boves, Radboud University, Nijmegen
Rene Leemans, VU University Medical Center
Speech quality of patients with oral or oropharyngeal carcinoma was assessed by perceptual and acoustic-phonetic analyses. Speech recordings of running speech of patients before and 6 and 12 months after treatment for oral or oropharyngeal cancer and of 18 control speakers were evaluated regarding intelligibility, nasality and articulation, which revealed deteriorated speech in 20% of the patients before treatment, and in 75% 6-12 months after treatment. Acoustic analyses comprised formant, duration, perturbation and noise measures of the vowels /i/, /a/, and /u/ and were performed on the speech samples 6 months after treatment and the controls. Patients appeared to have a smaller vowel space compared to controls, which was clearly related to speech intelligibility. Furthermore, voice perturbation appeared to be higher in patients. The presented speech analyses may serve as part of an outcome measurement protocol for assessing efficacy of speech rehabilitation.
WeC.SS‑6
Voice fatigue and use of speech recognition: A study of voice quality ratings
Christel de Bruijn, University of Central England, Birmingham, UK
Sandra Whiteside, University of Sheffield, UK
Previous studies have suggested the use of speech recognition software may be related to the development of voice problems. The aim of this study is to investigate the effects of using such software on perceptual voice quality. In particular, the variables type of speech recognition (discrete and continuous) and vocal load of a speaker are considered. One of the most consistent results was a rise in pitch, a common finding in voice fatigue studies. It is interpreted as part of a hyperfunctional mechanism countering early signs of voice fatigue.
WeC.SS‑7
Complementary approaches for voice disorder assessment
Jean-François Bonastre, LIA - University of Avignon
Corinne Fredouille, LIA - University of Avignon
Alain Ghio, LPL, CNRS, Aix-Marseille University
Antoine Giovanni, LAPEC, Aix-Marseille University
Gilles Pouchoulin, LIA - University of Avignon
Joana Révis, LAPEC, Aix-Marseille University
Bernard Teston, LPL, CNRS, Aix-Marseille University
Ping Yu, LAPEC, Aix-Marseille University
This paper describes 2 comparative studies of voice quality assessment based on complementary approaches. The first study was undertaken on 449 speakers whose voice quality was evaluated in parallel by a perceptual judgment and objective measurements. Results showed that a non-linear combination of 7 parameters allowed the classification of 82% of voice samples in the same grade as the jury. The second study relates to the adaptation of ASR techniques to pathological voice assessment. The system designed for this particular task relies on a GMM-based approach. Experiments conducted on 80 female voices provide promising results. We benefit from the multiplicity of these techniques to evaluate the methodological situation, which points to fundamental differences between these complementary approaches (bottom-up/top-down, global/analytic). We also discuss theoretical aspects of the relationship between acoustic measurement and perceptual mechanisms, which are often forgotten in the performance race.
WeC.SS‑8
Frequency Study for the Characterization of the Dysphonic Voices
Gilles Pouchoulin, LIA - University of Avignon
Corinne Fredouille, LIA - University of Avignon
Jean-François Bonastre, LIA - University of Avignon
Alain Ghio, LPL, CNRS, Aix-Marseille University
Antoine Giovanni, LAPEC, Aix-Marseille University
Concerned with pathological voice assessment, this paper aims at characterizing dysphonia in the frequency domain for a better understanding of the related phenomena, whereas most studies have focused only on improving classification systems for diagnostic support. In this context, a GMM-based automatic classification system is applied to different frequency ranges in order to investigate which ones are relevant for dysphonia characterization. Experimental results demonstrate that the low frequencies [0-3000] Hz are more relevant for dysphonia discrimination than higher frequencies.
WeC.SS‑9
Acoustic correlates of laryngeal-muscle fatigue: Findings for a phonometric prevention of acquired voice pathologies
Victor J. Boucher, Université de Montréal
This presentation focuses on defining valid acoustic correlates of vocal fatigue as a condition that can lead to voice pathologies. Several findings are reported based on a corpus of recordings involving electromyography (EMG) of laryngeal muscles and voice acoustics. The recordings were obtained in sessions of vocal effort extending across 12-14 hours. A known technique for estimating muscle fatigue is applied involving “spectral compression” of EMG potentials. The results show critical changes at given times of day. In examining the effects of these changes on voice acoustics, there is no linear correlation with respect to conventional acoustic parameters, but peaks in voice tremor occur at points of critical change in muscle fatigue. Further results are presented showing the need to take into account compensatory muscle actions in defining valid phonometric signs of vocal fatigue.
WeC.SS‑10
Automatic Scoring of the Intelligibility in Patients with Cancer of the Oral Cavity
Andreas Maier, Universität Erlangen-Nürnberg, Abteilung für Phoniatrie und Pädaudiologie
Maria Schuster, Universität Erlangen-Nürnberg, Abteilung für Phoniatrie und Pädaudiologie
Anton Batliner, Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung
Elmar Nöth, Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung
Emeka Nkenke, Universität Erlangen-Nürnberg, Mund-, Kiefer- und Gesichtschirurgische Klinik
After surgical treatment of cancer of the oral cavity, patients often suffer from problems with speaking. In this paper we present a novel approach to assess the outcome of the treatment with respect to the intelligibility of the patient using the result of an automatic speech recognition system. The word recognition rate was taken as the intelligibility score. In a comparison with four speech experts, this method agrees with the experts as well as the best speech expert agrees with the other experts. The correlation between our system and the mean opinion of the experts is .92. Furthermore, we show that our system has better performance than the average expert and is more reliable.
WeC.SS‑11
Automatic Assessment of Children's Reading Level
Jacques Duchateau, Katholieke Universiteit Leuven, Belgium
Leen Cleuren, Katholieke Universiteit Leuven, Belgium
Hugo Van hamme, Katholieke Universiteit Leuven, Belgium
Pol Ghesquière, Katholieke Universiteit Leuven, Belgium
In this paper, an automatic system for the assessment of reading in children is described and evaluated. The assessment is based on a reading test with 40 words, presented one by one to the child by means of a computerized reading tutor. The score that expresses the child's reading performance is calculated as the total time needed to read the 40 words divided by the number of correctly read words. In each grade, children are classified in 5 groups based on their score as provided by human annotators. We show that when the score for a child is assessed automatically using a speech recognizer, a classification can be obtained with a substantial agreement (Cohen's Kappa over 0.6) with the human classification. As all children in the experiments were classified either correctly or in an adjoining group, we can conclude that the proposed system can provide large time gains in current manual classification procedures.
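The score and the agreement measure described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the group labels in the example are hypothetical, and the code uses the standard unweighted definition of Cohen's kappa, which may differ in detail from the evaluation used by the authors.

```python
from collections import Counter


def reading_score(total_time_s, n_correct):
    """Total reading time divided by the number of correctly read words."""
    if n_correct <= 0:
        raise ValueError("at least one correctly read word is required")
    return total_time_s / n_correct


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the two marginal label distributions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical group assignments (1..5) for ten children.
human_groups = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
auto_groups = [1, 1, 2, 3, 3, 3, 4, 4, 5, 4]
kappa = cohens_kappa(human_groups, auto_groups)
print(reading_score(60.0, 30), kappa)
```

In this made-up example every disagreement falls in an adjoining group, mirroring the situation reported in the abstract.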
WeC.SS‑12
Using Waveform Matching Techniques in the Measurement of Shimmer in Voiced Signals
Carlos Ferrer, Central University of Las Villas
María E. Hernández-Díaz, Central University of Las Villas
Eduardo González, Central University of Las Villas
In this work, several approaches to amplitude contour estimation for shimmer measurement are analyzed and compared. The approaches covered incorporate a waveform matching procedure proposed in this work, based on existing least squares measures. The experimental comparisons evaluate each method’s sensitivity to periodicity perturbations such as jitter, shimmer, and noise, as well as their combination. The waveform matching technique shows better overall performance than the other methods.
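To make the notion of an amplitude contour concrete, the sketch below contrasts a peak-to-peak contour with a simple least-squares gain between adjacent cycles. It is not the procedure proposed in the paper: the function names, the assumption of pre-segmented cycles of equal length, and the synthetic cycles are all illustrative choices.

```python
import math


def peak_to_peak_contour(cycles):
    """Amplitude contour from the peak-to-peak value of each cycle."""
    return [max(c) - min(c) for c in cycles]


def least_squares_contour(cycles):
    """Amplitude contour from least-squares gains between adjacent cycles.

    The gain g that minimizes sum((cur - g * prev) ** 2) is
    <prev, cur> / <prev, prev>; gains are chained, starting from 1.0.
    Assumes all cycles have the same number of samples.
    """
    contour = [1.0]
    for prev, cur in zip(cycles, cycles[1:]):
        gain = (sum(p * c for p, c in zip(prev, cur)) /
                sum(p * p for p in prev))
        contour.append(contour[-1] * gain)
    return contour


def shimmer_percent(contour):
    """Mean absolute difference of consecutive amplitudes over the mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(contour, contour[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(contour) / len(contour))


# Hypothetical cycles: one sine period scaled by known amplitudes.
template = [math.sin(2 * math.pi * i / 40) for i in range(40)]
amplitudes = [1.0, 1.1, 0.9, 1.0]
cycles = [[a * v for v in template] for a in amplitudes]

print(shimmer_percent(peak_to_peak_contour(cycles)))
print(shimmer_percent(least_squares_contour(cycles)))
```

On these noise-free cycles both contours give the same shimmer value, since shimmer is invariant to a constant scaling of the contour; differences between such estimators would only appear once jitter or noise is added.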
WeC.SS‑13
Analysis of the Impact of Analogue Telephone Channel on MFCC Parameters for Voice Pathology Detection
Rubén Fraile, Universidad Politécnica de Madrid
Juan Ignacio Godino-Llorente, Universidad Politécnica de Madrid
Nicolás Sáenz-Lechón, Universidad Politécnica de Madrid
Víctor Osma-Ruíz, Universidad Politécnica de Madrid
Pedro Gómez-Vilda, Universidad Politécnica de Madrid
In this paper, the feasibility of remote diagnosis of voice pathology is analysed. More specifically, the performance of MFCC-based pathology detectors over speech transmitted through an analogue telephone channel is studied. Results indicate that MFCC are voice features fairly robust to amplitude distortion and almost insensitive to phase distortion, but the efficiency of a voice pathology detector based on these features is clearly decreased when the speech samples are transmitted through a telephone channel.
WeC.SS‑14
Objective Parameters from Videokymographic Images: a User-Friendly Interface
Claudia Manfredi, Department of Electronics and Telecommunications, Univ. of Firenze, Firenze, Italy
Leonardo Bocchi, Department of Electronics and Telecommunications, Univ. of Firenze, Firenze, Italy
Giovanna Cantarella, Otolaryngology Department, Univ. of Milano, Ospedale Maggiore Policlinico Mangiagalli e Regina Elena, Fondazione IRCCS, Milano, Italy
Giorgio Peretti, Otolaryngology Clinic, Univ. of Brescia, Spedali Civili di Brescia, Brescia, Italy
Gabriele Guidi, Department of Electronics and Telecommunications, Univ. of Firenze, Firenze, Italy
Vincenzo Mezzatesta, Department of Electronics and Telecommunications, Univ. of Firenze, Firenze, Italy
Videolaryngostroboscopy (VLS) is a first-choice examination for the diagnosis of laryngeal pathologies, but it is ineffective in cases of strong aperiodicity. Hence, a new high-speed technique named videokymography (VKG) has been developed, delivering images from a single line selected from the image. Despite its usefulness, until now no quantitative analysis of VKG images has been commercially available. This paper presents a new tool for tracking quantitative parameters from VKG images. It evaluates left-to-right period, amplitude and phase ratios, as well as the phase symmetry index, by means of robust edge detection techniques that reduce both noise and artefacts. The new tool is provided with a user-friendly interface for managing patients’ data and image analysis, according to a set of parameters that can be adjusted by the user. VKG images from one non-dysphonic and seven dysphonic subjects were analysed, providing objective parameters useful in diagnosis support.