Interspeech 2007
August 27-31, 2007

Antwerp, Belgium

Interspeech 2007 Session WeD.P2: Speech production II

Type: Poster
Date: Wednesday, August 29, 2007
Time: 16:00–18:00
Room: Alpaerts
Chair: Catherine Best (MARCS Auditory Laboratories, University of Western Sydney, Milperra)


Vocal Tract Length during Speech Production
Sorin Dusan, Rutgers University, CAIP

Formant frequencies are known to be inversely proportional to the vocal tract length of the speaker. Although the vocal tract length of a speaker has been observed to vary during speech production, the extent of this variability has not been fully examined in the literature. This paper presents a statistical analysis of the vocal tract length of a female speaker during the production of ten French sentences. It also examines correlations between vocal tract length, lip protrusion and larynx height, on the one hand, and the parameters of Maeda’s articulatory model, on the other. The paper proposes a linear regression model of vocal tract length as a function of eight articulatory parameters and discusses the role of lip and larynx height maneuvers in optimizing speech production in terms of achieving high phonetic contrast, high speed and minimum energy.
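
The inverse proportionality between formant frequencies and vocal tract length can be illustrated with the textbook uniform-tube approximation, in which the tract is a tube closed at the glottis and open at the lips, so that F_n = (2n − 1)c / (4L). This is a simplification for illustration only, not the Maeda-based model used in the paper; the speed of sound c = 350 m/s and the tube lengths below are assumed values.

```python
def tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances (Hz) of a uniform tube closed at one end (glottis) and open at the other (lips).

    F_n = (2n - 1) * c / (4 * L); c = 350 m/s is an assumed speed of sound in warm, humid air.
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


# Illustrative lengths only: lengthening the tract (e.g. by lip protrusion
# or larynx lowering) lowers every formant by the same factor.
short_tract = tube_formants(0.15)
long_tract = tube_formants(0.17)
```

Because every F_n scales with 1/L, a 10% increase in length lowers all formants of this idealized tube by about 9%.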

Approximation method of subglottal system using ARMA filter
Nobuhiro Miki, Future University-Hakodate
Kyohei Hayashi, Future University-Hakodate

We propose a method for approximating the subglottal impedance of the model of Fredberg and Hoenig by a rational function of s, and for realizing the subglottal system as an ARMA filter model. We use data on the structure and size of the branching network of the subglottal system and adjust them for Japanese adults using MRI data of the trachea. Our subglottal model can be coupled to the circuit model of the vocal tract through the glottal impedance. Using the model with a dummy section, we show the relation between the circuit model and the forward/backward waves at the glottis.
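
Once a rational impedance has been mapped to discrete time, the resulting ARMA filter is run with a difference equation. The abstract does not give the coefficients, and the mapping from s to discrete time (for example a bilinear transform) is an assumption; the sketch below only shows how a filter with numerator b and denominator a is evaluated, using the same convention as scipy.signal.lfilter (a[0] normalizes the output).

```python
def arma_filter(b, a, x):
    """Direct-form ARMA filter:
    a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k].
    """
    if a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Moving-average part (zeros of the transfer function).
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Autoregressive part (poles of the transfer function).
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# Hypothetical one-pole example: the impulse response decays geometrically.
impulse_response = arma_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```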

Enhancing Acoustic-to-EPG Mapping with Lip Position Information
Asterios Toutios, Dept. of Applied Informatics, Univ. of Macedonia, Thessaloniki, Greece
Konstantinos Margaritis, Dept. of Applied Informatics, Univ. of Macedonia, Thessaloniki, Greece

This paper investigates the hypothesis that cues involving the positioning of the lips can improve a system that maps acoustic parameters to electropalatographic (EPG) information, that is, to patterns of contact between the tongue and the hard palate. We adopt a multilayer perceptron as a relatively simple model of the acoustic-to-EPG mapping and demonstrate that its performance improves when parameters describing lip position, recorded by electromagnetic articulography (EMA), are added to the model's input.

A model of glottal flow incorporating viscous-inviscid interaction
Tokihiko Kaburagi, Department of Acoustic Design, Faculty of Design, Kyushu University
Yosuke Tanabe, Graduate School of Design, Kyushu University

A model of the flow through the glottis is presented that employs the boundary-layer assumption. A thin boundary layer near the glottal wall influences the flow behavior in terms of flow separation, jet formation and the pressure distribution along the channel. To analyze the boundary layer, an integral momentum relation was developed; assuming similarity of the velocity profiles, this equation can be solved for a given core flow velocity. At the same time, the boundary layer reduces the effective size of the channel and thus increases the core velocity. The boundary-layer problem therefore inherently entails viscous-inviscid interaction. This paper presents a method for solving the boundary-layer problem including this interaction. Experiments show that the method is useful for predicting the flow rate, the pressure distribution and other properties when the glottal configuration and the subglottal pressure are specified.
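
The inviscid side of the interaction can be sketched with mass conservation alone: a displacement thickness δ* on each wall shrinks the effective channel height, so the core velocity rises for the same flow rate. This assumes a two-dimensional rectangular channel of constant depth with given δ* values (all hypothetical); it is not the paper's integral momentum solver. In a full interaction scheme, δ* would be recomputed from the momentum relation using this core velocity, and the two steps iterated to convergence.

```python
def core_velocity(flow_m3s, depth_m, height_m, delta_star_m):
    """Mean core velocity in a 2-D channel whose two walls each carry displacement thickness delta_star.

    Continuity: U = u_core * depth * (height - 2 * delta_star).
    """
    effective_height = height_m - 2.0 * delta_star_m
    if effective_height <= 0:
        raise ValueError("boundary layers fill the channel")
    return flow_m3s / (depth_m * effective_height)


# Hypothetical numbers: 100 cm^3/s through a 1 cm deep, 2 mm high slit.
u_inviscid = core_velocity(1e-4, 0.01, 0.002, 0.0)     # no boundary layer
u_viscous = core_velocity(1e-4, 0.01, 0.002, 0.0005)   # 0.5 mm displacement thickness per wall
```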

Thinking Outside the Cube: Modeling Language Processing Tasks in a Multiple Resource Paradigm
Kilian Seeber, ETI, Ecole de Traduction et d'Interprétation, Université de Genève

This paper sets out to find an alternative to Wickens’ cube that better represents, visually, the different resource pools recruited by complex language processing tasks. The model’s two principal shortcomings, namely its inability to account visually for the notion of general resources and the difficulty of visually representing the tasks and their structural proximity, are addressed by redrawing the cube and eventually abandoning the three-dimensional design in favor of a two-dimensional model, the so-called cognitive resource footprint, which we believe more intuitively reflects the resources involved in these tasks.

Experimental Validation of Direct and Inverse Glottal Flow Models for Unsteady Flow Conditions
Julien Cisonni, Département Parole et Cognition, GIPSA-Lab, Grenoble, France
Annemie Van Hirtum, Département Parole et Cognition, GIPSA-Lab, Grenoble, France
Jan Willems, Fluid Dynamics Laboratory, TU/e, Eindhoven, The Netherlands
Xavier Pelorson, Département Parole et Cognition, GIPSA-Lab, Grenoble, France

The pressure drop along the glottal constriction drives the self-sustained oscillation of the vocal folds during phonation. Physical models of phonation classically take the glottal geometry and the subglottal pressure as known input parameters. Several studies, including in-vitro validation, show that simplified one-dimensional flow models predict the flow characteristics reasonably well. Applying physical modeling to phonation abnormalities and pathologies, however, requires input parameters that can be related to quantities measurable in vivo, which usually correspond to the output parameters of the physical model. This paper considers the inversion of some popular simplified flow models in order to estimate the subglottal pressure, the glottal constriction area or the flow model parameters under unsteady flow conditions. The theoretical predictions are tested against in-vitro measurements.
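
As an illustration of what "inverting" a simplified flow model means, the sketch below uses the quasi-steady Bernoulli model with flow separation at the minimum glottal area and no pressure recovery, P_sub = (ρ/2)(U/A_min)². Whether this exact variant is among the models inverted in the paper is not stated in the abstract, and the unsteady (inertial) term studied there is omitted; the air density and numbers are assumptions.

```python
import math

RHO_AIR = 1.2  # kg/m^3, assumed air density


def subglottal_pressure(flow_m3s, area_m2, rho=RHO_AIR):
    """Direct model: P_sub = rho/2 * (U / A_min)^2."""
    return 0.5 * rho * (flow_m3s / area_m2) ** 2


def constriction_area(flow_m3s, p_sub_pa, rho=RHO_AIR):
    """Inverse model: A_min = U * sqrt(rho / (2 * P_sub))."""
    return flow_m3s * math.sqrt(rho / (2.0 * p_sub_pa))


# Hypothetical values: 200 cm^3/s through a 0.1 cm^2 constriction.
p = subglottal_pressure(2e-4, 1e-5)
a = constriction_area(2e-4, p)  # recovers the constriction area
```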

Effect of Unsteady Glottal Flow on the Speech Production Process
Hideyuki Nomura, Division of Electrical Engineering and Computer Science, Kanazawa University
Tetsuo Funada, Division of Electrical Engineering and Computer Science, Kanazawa University

The purpose of the present study is to clarify the effects of unsteady glottal flow on phonation. We numerically simulate the speech production process within the larynx and the vocal tract based on our proposed glottal sound source model. The simulation shows amplitude and waveform fluctuations in the pressure within the larynx caused by unsteady fluid motion. To investigate the effects of this unsteady motion on phonation, the coefficient of variation (CV) of the amplitude and the harmonic-to-noise ratio (HNR) are estimated as measures of fluctuation. Both the CV and the HNR indicate the greatest fluctuation near the glottis, whereas neither shows such fluctuation far from the glottis.
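
The coefficient of variation is the standard deviation of the cycle amplitudes divided by their mean. The abstract does not say how amplitude is measured; the sketch assumes the peak absolute pressure in each glottal cycle, with cycle boundaries already known, and uses the population standard deviation. The helper names are hypothetical, and HNR is not covered.

```python
import statistics


def cycle_peak_amplitudes(signal, boundaries):
    """Peak absolute value of the signal within each cycle [boundaries[i], boundaries[i+1])."""
    return [max(abs(v) for v in signal[s:e]) for s, e in zip(boundaries[:-1], boundaries[1:])]


def coefficient_of_variation(amplitudes):
    """CV = population standard deviation / mean of the cycle amplitudes."""
    return statistics.pstdev(amplitudes) / statistics.fmean(amplitudes)


# Toy pressure signal with two "cycles" of different peak amplitude.
amps = cycle_peak_amplitudes([0.0, 1.0, 0.0, -3.0, 0.0, 2.0], [0, 3, 6])
cv = coefficient_of_variation(amps)
```

A perfectly periodic signal gives CV = 0; larger values indicate stronger cycle-to-cycle amplitude fluctuation.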

Word stress correlates in spontaneous child-directed speech in German
Katrin Schneider, Institute of Natural Language Processing, Experimental Phonetics Group, University of Stuttgart, Germany
Bernd Möbius, Institute of Natural Language Processing, Experimental Phonetics Group, University of Stuttgart, Germany

In this paper we focus on the use of acoustic and voice quality parameters to mark word stress in German. Our aim was to identify the speech parameters parents use to signal word stress differences to their children. To this end, mothers and their children were recorded over a period of at least one year while performing a special play task with word pairs that differ only in the position of word stress. The recorded target words were analyzed acoustically and with respect to voice quality. The results presented here concern the mothers' productions of contrastive word stress, and we discuss our findings in relation to previous studies of word stress. Our results provide further insight into the acquisition of word stress in German.

Acquisition and synchronization of multimodal articulatory data
Michael Aron, INRIA Lorraine
Nicolas Ferveur, INRIA Lorraine
Erwan Kerrien, INRIA Lorraine
Marie-Odile Berger, INRIA Lorraine
Yves Laprie, CNRS

This paper describes a setup for synchronizing the data used to track speech articulators during speech production. Our method couples an ultrasound system, an electromagnetic system and an audio system to record speech sequences. The coupling requires precise temporal synchronization: both the delay between the recording starts of the modalities and the sampling rate of each modality must be known exactly. A complete setup and methods for automatically synchronizing the data are described. The aim is a fast, low-cost and easily reproducible acquisition system for temporally aligning the data.
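
The two quantities named above, the start delay and the sampling rate of each modality, can be estimated together if shared sync events are visible in every stream. The sketch below assumes such events exist (the abstract does not describe how the delay is measured) and fits reference_time = offset + n / fs by ordinary least squares; the function names and values are hypothetical.

```python
def fit_clock(sample_indices, ref_times):
    """Least-squares fit of ref_time = offset + n / fs for one modality.

    sample_indices: sample numbers at which shared sync events were detected
    ref_times: times (s) of the same events on a reference clock
    Returns (offset_s, fs_hz).
    """
    count = len(sample_indices)
    if count < 2:
        raise ValueError("need at least two sync events")
    mean_n = sum(sample_indices) / count
    mean_t = sum(ref_times) / count
    s_nn = sum((n - mean_n) ** 2 for n in sample_indices)
    s_nt = sum((n - mean_n) * (t - mean_t) for n, t in zip(sample_indices, ref_times))
    seconds_per_sample = s_nt / s_nn  # slope of the fit = 1 / fs
    offset = mean_t - seconds_per_sample * mean_n
    return offset, 1.0 / seconds_per_sample


def to_reference_time(sample_index, offset, fs):
    """Map a sample number of this modality onto the reference time axis."""
    return offset + sample_index / fs


# Hypothetical stream that starts 0.5 s after the reference and runs at 100 Hz.
offset, fs = fit_clock([0, 100, 200], [0.5, 1.5, 2.5])
```

Using several events rather than two averages out detection jitter and also reveals clock drift as a deviation of fs from its nominal value.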

A phonetic concatenative approach of labial coarticulation
Vincent Robert, LORIA
Yves Laprie, LORIA - CNRS
Anne Bonneau, LORIA - CNRS

Predicting the effects of labial coarticulation is an important step towards developing an artificial talking head. This paper describes a concatenation approach that uses sigmoids to represent the evolution of labial parameters. The labial parameters considered are lip aperture, protrusion and stretching, and jaw aperture. First, a formal algorithm determines the relevant transitions, i.e. those corresponding to phonemes that impose constraints on one of the labial parameters. These transitions are then either retrieved or interpolated from a set of reference sigmoids trained on a speaker-specific corpus. This labial corpus is made up of isolated vowels, CV, VCV and VCCV sequences, and 100 sentences. A final stage improves the overall syntagmatic consistency of the concatenation.
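
A single transition of one labial parameter can be modeled as a logistic sigmoid between two target values. The parameterization below (center time and slope) is an assumption for illustration; in the paper the reference sigmoids are trained on a speaker-specific corpus rather than set by hand, and the lip-aperture values used here are hypothetical.

```python
import math


def _logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_transition(t, start_value, end_value, t_center, slope):
    """Value of one labial parameter at time t (s) during a transition between two targets."""
    return start_value + (end_value - start_value) * _logistic(slope * (t - t_center))


# Hypothetical lip-aperture closure (mm) for a /p/, centered at 100 ms, sampled every 20 ms.
trajectory = [sigmoid_transition(ms / 1000.0, 12.0, 0.0, 0.100, 80.0) for ms in range(0, 201, 20)]
```

At t_center the parameter is exactly halfway between the two targets, and the slope controls how abrupt the transition is.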

Visual Analysis of Lip Coarticulation in VCV Utterances
Aseel Turkmani, Centre for Vision Speech and Signal Processing, University of Surrey
Adrian Hilton, Centre for Vision Speech and Signal Processing, University of Surrey
Philip J.B. Jackson, Centre for Vision Speech and Signal Processing, University of Surrey
James Edge, Centre for Vision Speech and Signal Processing, University of Surrey

This paper presents an investigation of visual variation in the bilabial plosive /p/ in three coarticulation contexts. The aim is to provide a detailed ensemble analysis to assist coarticulation modelling in visual speech synthesis. We investigate the underlying dynamics of labeled visual speech units, represented as lip shapes, from symmetric VCV utterances. Variation in lip dynamics is analyzed both quantitatively and qualitatively. The analysis shows statistically significant differences in both lip shape and lip trajectory during coarticulation.

Comparison of Multiple Voice Source Parameters in Different Phonation Types
Matti Airas, Helsinki University of Technology
Paavo Alku, Helsinki University of Technology

A large sample of vowels produced by male and female speakers was inverse filtered and parameterized using 21 different glottal flow parameters. The ability of each parameter to express phonation type was then tested using objective statistical methods. The comparison revealed marked differences in performance between the parameters, and guidelines for parameter use and comparison were established on this basis.
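
The abstract does not list the 21 parameters. As an example of the kind of time-domain parameter involved, the sketch computes the normalized amplitude quotient, NAQ = f_ac / (d_peak · T), where f_ac is the peak-to-peak amplitude of the glottal flow, d_peak is the magnitude of the negative peak of its derivative and T is the period. It assumes the input is exactly one period of inverse-filtered flow and estimates the derivative by a first difference; the toy pulse is hypothetical.

```python
def naq(flow, fs, period_s):
    """Normalized amplitude quotient over one period of inverse-filtered glottal flow."""
    f_ac = max(flow) - min(flow)  # AC (peak-to-peak) flow amplitude
    # First-difference estimate of the flow derivative, in flow units per second.
    derivative = [(b - a) * fs for a, b in zip(flow, flow[1:])]
    d_peak = -min(derivative)  # magnitude of the negative peak of the derivative
    return f_ac / (d_peak * period_s)


# Toy pulse: slow opening, faster closing, then a closed phase (fs = 1 kHz, T = 10 ms).
value = naq([0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0], 1000.0, 0.01)
```

A more abrupt closure increases d_peak and lowers NAQ, which is why the parameter tends to separate pressed from breathy phonation.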

Acoustic and Affective Comparisons of Natural and Imaginary Infant-, Foreigner- and Adult-directed Speech
Monja Knoll, University of Portsmouth, Dept. Psychology, King Henry I Street, Portsmouth, PO1 2DY, United Kingdom
Lisa Scharrer, Psychologisches Institut, Ruprecht-Karls-Universitaet, Hauptstrasse 47-51, 69117 Heidelberg, Germany

This study evaluated the use of imagined interactions in speech research by comparing speech addressed to imaginary partners with natural speech addressed to genuine interaction partners. Speech directed to an imaginary infant (IDS), foreigner (FDS) and adult (ADS), produced by ten female students, was acoustically analysed and rated for positive vocal affect. Our vocal affect results are consistent with previous findings from natural interactions, with IDS rated higher in positive vocal affect than ADS/FDS. However, acoustic analyses revealed a much smaller vowel space for IDS than for ADS/FDS, with no difference between the latter two conditions. Also unlike findings from natural speech samples, mean pitch in our IDS was not significantly higher than in ADS/FDS. Since these results run contrary to those from interactions with genuine speech partners, speech obtained from imaginary interactions should be used with caution.
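
The abstract does not say how vowel space was quantified. A common measure is the area of the polygon spanned by the corner vowels in the (F1, F2) plane, computed with the shoelace formula; the sketch assumes that measure, and the formant values in the usage line are made up for illustration, not taken from the study.

```python
def vowel_space_area(corner_vowels):
    """Area (Hz^2) of the polygon spanned by (F1, F2) points, via the shoelace formula.

    Points must be listed in order around the polygon (clockwise or counter-clockwise).
    """
    n = len(corner_vowels)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = corner_vowels[i]
        f1_b, f2_b = corner_vowels[(i + 1) % n]
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Illustrative (F1, F2) values in Hz for /i/, /a/, /u/; not data from the study.
area = vowel_space_area([(300.0, 2300.0), (750.0, 1300.0), (320.0, 900.0)])
```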

Vowel Production in Two Occlusal Classes
André Araújo, Escola Superior de Tecnologia da Saúde do Porto, Instituto Politécnico do Porto, Porto, Portugal
Luis Jesus, Universidade de Aveiro, Portugal
Isabel Costa, Universidade de Aveiro, Portugal

The influence of occlusal class on speech production was studied using the X-ray Microbeam Speech Production Database (XRMB-SPD). The objective was to relate occlusal classes I and II to adaptations in vowel production. The “Modified A-Space” method was used to select four speakers (one class I male, one class I female, one class II male and one class II female). Articulatory and acoustic features of the vowels [i, æ, ɑ, u] were studied using different tasks and methods. Results show some structural differences related to occlusal class, as well as variance in the structures and articulatory adaptations of the class II subjects. The largest differences in vowel formants were found between the male and female groups. Occlusal class also seems to influence the acoustic features of vowels produced by female speakers. Although structural differences were found, the subjects showed a high capacity for adaptation and were able to adjust their articulators to produce all vowels.

Nepalese retroflex stops: a static palatography study of inter- and intra-speaker variability
Rajesh Khatiwada, Laboratoire de Phonétique et de Phonologie (UMR 7018), Université Sorbonne Nouvelle

Retroflex sounds are classically defined as being produced with the tongue tip curled backward and often in contact behind the alveolar ridge. These sounds, however, show great articulatory variation across languages and both between and within speakers. Retroflex stops in Nepali are described as being produced with the tongue tip at the alveolar ridge, with no backward curling movement. According to Pokharel, this is not true retroflexion but rather an apico-alveolar articulation with no backward curling of the tongue tip. The aim of this study is to verify Pokharel’s claim experimentally and to go beyond it by examining whether the retroflex stops show any coarticulation effect in different vowel contexts. We use the classical palatography method to determine the place of articulation. The majority of the retroflex stops in our data are sub-apico-post-alveolar in the context of back vowels and apico-alveolar in the context of front vowels.

Effects of testosterone levels on temporal and intonational aspects of speech: More exploratory data
Charles A. Lamoureux, Université de Montréal
Victor J. Boucher, Université de Montréal

There is a growing body of work on the effects of hormonal factors on speech and language behavior. The present research explores the links between speakers’ testosterone levels and suprasegmental aspects of speech, namely speaking rate and pitch measures for intonational phrases. Saliva samples were collected from 40 men aged 20 to 27 years to assess their testosterone levels. Subjects were recorded reading a standard text. Acoustic analyses of the readings revealed significant correlations: at phrase boundaries, speakers with low testosterone levels tended to use higher and more variable pitch than speakers with high testosterone levels. The results also showed significant relationships between salivary testosterone and speaking rate during the readings. These findings reinforce the assumption that some within-sex differences in speech and voice may be based on hormonal factors.
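
The abstract reports correlations but does not name the statistic. The sketch assumes Pearson's product-moment coefficient, computed directly from its definition; any per-speaker values fed to it (testosterone level and mean boundary pitch, say) would be hypothetical here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)
```

A negative r would correspond to the pattern described above, in which higher testosterone goes with lower boundary pitch.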
