Interspeech 2007 Session TuC.O2: Speech production I
Type
oral
Date
Tuesday, August 28, 2007
Time
13:30 – 15:30
Room
Darwin
Chair
Hiroya Fujisaki (University of Tokyo)
TuC.O2‑1
13:30
An articulatory and acoustic study of "retroflex" and "bunched" American English rhotic sound based on MRI
Xinhui Zhou, Department of Electrical and Computer Engineering, University of Maryland, College Park, USA
Carol Espy-Wilson, Department of Electrical and Computer Engineering, University of Maryland, College Park, USA
Mark Tiede, Haskins Laboratories and MIT R.L.E., USA
Suzanne Boyce, Department of Communication Sciences and Disorders, University of Cincinnati, USA
The North American rhotic liquid has two maximally distinct articulatory variants, the classic "retroflex" and the classic "bunched" tongue postures. The evidence for acoustic differences between these two variants is reexamined using magnetic resonance images of the vocal tract in this study. Two subjects with similar vocal tract dimensions but different tongue postures for sustained /r/ are used. It is shown that these two variants have similar patterns of F1-F3 and zero frequencies. However, the "retroflex" variant has a larger difference between F4 and F5 than the "bunched" one (around 1400 Hz vs. around 700 Hz). This difference can be explained by the geometry differences between these two variants, in particular, the shorter and more forward palatal constriction of the "retroflex" /r/ and the sharper transition between palatal constriction and its anterior and posterior cavities. This formant pattern difference is confirmed by measurement from acoustic data of several additional subjects.
TuC.O2‑2
13:50
An MRI study of European Portuguese nasals
Paula Martins, Escola Superior de Saúde, Universidade de Aveiro, Portugal
Inês Carbone, Departamento Electrónica Telec. Informática/IEETA, Universidade de Aveiro, Portugal
Augusto Silva, Departamento Electrónica Telec. Informática/IEETA, Universidade de Aveiro, Portugal
António Teixeira, Departamento Electrónica Telec. Informática/IEETA, Universidade de Aveiro, Portugal
In this work we present a recently acquired MRI database for European Portuguese. As a first example of possible studies, we present results on 2D and 3D analyses of European Portuguese nasals, particularly nasal vowels. This database will enable the extraction of 2D and/or 3D articulatory parameters as well as some dynamic information to include in articulatory synthesizers. It can also be useful for comparing the production of European Portuguese with that of other languages and for gaining further insight into some European Portuguese characteristics, such as nasalization and coarticulation. The MRI database and related studies were made possible by the interdisciplinary nature of the research team, comprising a radiologist, image processing specialists and a speech scientist.
TuC.O2‑3
14:10
A four-cube FEM model of the extrinsic and intrinsic tongue muscles to simulate the production of vowel /i/
Sayoko Takano, RWTH-Aachen University
Hiroki Matsuzaki, Hokkai-Gakuen University
Kunitoshi Motoki, Hokkai-Gakuen University
The roles of the extrinsic and intrinsic tongue muscles in the production of the vowel /i/ were examined using a finite element model applied to tagged cine-MRI data. It has been thought that tongue tissue deformation for /i/ is mainly due to the combined actions of the genioglossus muscle advancing the tongue root to elevate the dorsum with a mid-line grooving. A recent study with tagging-MRI revealed an independent hydrostat factor of the anterior half of the tongue during the /ei/ sequence: elevation of the tongue blade was caused by medial tissue compression with earlier, faster and greater tissue deformation. In this study, a simple four-cube model was built to examine the co-contraction effect of the genioglossus and transverse muscles using the finite element method (FEM). The simulation result with the anterior transverse muscle (Ta) showed good agreement with the tagging-MRI data, suggesting that the anterior transverse muscle also plays an important role in the production of the vowel /i/.
TuC.O2‑4
14:30
Performance evaluation of glottal quality measures from the perspective of vocal tract filter consistency
Juan Torres, Georgia Institute of Technology
Elliot Moore, Georgia Institute of Technology
The main difficulty in glottal waveform estimation is the separation of the unknown vocal tract and glottal components of the speech signal. Several glottal quality measures (GQM's) have been proposed to objectively assess the quality of source-tract separation by exploiting known properties of glottal waveforms. In this paper, we present a performance evaluation of 10 GQM's based on the consistency of estimated vocal tract filters (VTF's) on sustained vowel utterances. We compare the results obtained using GQM's to select the optimal estimates to the case where the linear prediction window is aligned exactly with the glottal closure instant (GCI). Although GCI use resulted in the most consistent VTF's, there was a significant benefit from combining several GQM's for selecting optimal estimates. In addition, the GQM-derived estimates were shown to have higher divergence than the GCI estimates across some phoneme-pairs, suggesting higher class-separability.
TuC.O2‑5
14:50
Statistical identification of critical, dependent and redundant articulators
Veena D. Singampalli, University of Surrey
Philip J.B. Jackson, University of Surrey
A compact, data-driven statistical model for identifying the roles played by articulators in the production of English phones using 1D and 2D articulatory data is presented. Articulators critical in the production of each phone were identified and were used to predict the pdfs of dependent articulators based on the strength of articulatory correlations. The performance of the model is evaluated on the MOCHA database using proposed and exhaustive search techniques, and the results of synthesised trajectories are presented.
TuC.O2‑6
15:10
An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping
Chao Qin, Dept. CSEE, OGI, OHSU
Miguel Carreira-Perpinan, Dept. CSEE, OGI, OHSU
Articulatory inversion is the problem of recovering the sequence of vocal tract shapes that produce a given acoustic speech signal. Traditionally, its difficulty has been attributed to nonuniqueness of the inverse mapping, where different vocal tract shapes can produce the same acoustics. However, evidence for the nonuniqueness has been restricted to theoretical studies, or to data from atypical speech or very specific sounds. We present a systematic large-scale study using articulatory data for normal speech from the Wisconsin XRDB. We find that nonuniqueness does exist for some sounds, but that the majority of normal speech is produced with a unique vocal tract shape.