August 27-31, 2007

Antwerp, Belgium
Interspeech 2007 Session FrB.P3a: Prosody: production

Type poster
Date Friday, August 31, 2007
Time 10:00 – 12:00
Room Keurvels
Chair Julia Hirschberg (Department of Computer Science, Columbia University)


The Influence of Vowel Quality Features on Peak Alignment
Matthias Jilka, University of Stuttgart
Bernd Möbius, University of Stuttgart

This study continues an approach that uses a unit selection corpus in order to investigate aspects of the phonetic realization of tonal categories. The focus lies on the peak position in German H*L pitch accents, specifically on the question of whether it is influenced by vowel quality. It is confirmed that neither vowel backness nor the distinction between tense and lax vowels affects peak alignment. The feature of vowel height, however, is revealed to be a significant factor (peaks are aligned latest in high vowels, earliest in low vowels). Various parameters (e.g., syllable structure, position in the phrase) are examined for interactions, but cannot account for the effect. While vowel height correlates with vowel duration, vowel duration itself does not influence peak position. The only possible explanation found involves peak height, which is intrinsically higher in high vowels, so that more time may be required to reach the peak.

Pitch Accent versus Lexical Stress: Quantifying Acoustic Measures Related to the Voice Source
Yen-Liang Shue, Department of Electrical Engineering, University of California, Los Angeles
Markus Iseli, Department of Electrical Engineering, University of California, Los Angeles
Nanette Veilleux, Department of Computer Science, Simmons College
Abeer Alwan, Department of Electrical Engineering, University of California, Los Angeles

In this paper, we explore acoustic correlates of pitch accent and main lexical stress in American English, and the interaction of these cues with other factors that affect prosody. In a controlled study, we varied presence or absence and type of pitch accent (L* vs. H*), boundary-related tone sequence (L-L% vs. H-H%) and gender of the talker, for the sentence "Dagada gave Bobby doodads". The measures were duration, F0 (fundamental frequency), H1*-H2* (related to open quotient), and H1*-A3* (related to spectral tilt). Contour approximations were used to analyze time-course movements of these measures. For "Dagada" we found that, consistent with earlier literature, a) H* and L* pitch accents showed different F0 contours, b) pitch-accented syllables were longer than unaccented ones, c) stressed "ga" syllables had lower H1*-H2* values than surrounding unstressed syllables, and for male talkers, lower H1*-A3* values, indicating lesser spectral tilt. Unexpectedly, F0 maxima associated with an H* accent occurred most of the time later in the accented syllable than F0 minima associated with L*. The cues to lexical stress were consistent with or without pitch accent (e.g. lower H1*-H2*), but they sometimes interacted with gender and/or boundary tones: for example, lower H1*-A3* in stressed "ga" syllables was only found for female talkers in unaccented cases, and some cues of both accent and stress were less pronounced in the final word "doodads", which also carried boundary-related tones.

Prosody, emotions, and... 'whatever'
Stefan Benus, Brown University
Agustin Gravano, Columbia University
Julia Hirschberg, Columbia University

We examine the role of prosody in cueing a scale of negative meanings associated with the use of whatever. The analysis of a corpus of elicited examples shows that the more negative the token, the more likely it is to have an additional pitch accent, extended duration, and expanded pitch range on the first syllable. These findings are analyzed as a link between pragmatic meaning and the strength of the prosodic boundary between the first two syllables (what#ever). The results of perception experiments show that the prosody of whatever itself is a systematic cue for the degree of negative connotation associated with the utterance in which whatever occurs. Potential applications of this result for spoken dialogue systems and synthesis of emotional speech are discussed.

Modeling Tones in Hakka on the Basis of the Command-Response Model
Wentao Gu, The Chinese University of Hong Kong
Rerrario Shui-Ching Ho, Waseda University; Universität Basel
Tan Lee, The Chinese University of Hong Kong

As one of the major Chinese dialects, Hakka typically has a tone system with six lexical tones. The traditional 5-level notation of tones in Hakka varies in previous references due to its subjective and relative nature. In order to overcome the limitations of the traditional approach, the command-response model for the process of F0 contour generation is employed to analyze quantitatively the tones in continuous speech of two varieties of Hakka, spoken in Meixian and in Shataukok, respectively. By providing both phonological descriptions of each tone type and quantitative approximations to continuous F0 contours, the model-based approach provides an efficient connection between the phonetics and phonology of Hakka tones.

Length, Ordering Preference and Intonational Phrasing: Evidence from Pauses
Gerrit Kentner, Institut für Linguistik, Universität Potsdam

This paper reports a speech production experiment in which the effects of surrounding phrase lengths and head-argument distance on intra-sentential pause duration were tested. While the results confirm an effect of phrase length on pausing, this effect is found to be distinctly stronger for long phrases preceding the pause than for long upcoming phrases. The results are discussed with respect to intonational phrasing tendencies and ordering preferences for unequal-sized constituents.

Alignment of the Second Low Target in Dutch Falling-Rising Pitch Contours
Jörg Peters, Radboud University Nijmegen
Judith Hanssen, Radboud University Nijmegen
Carlos Gussenhoven, Radboud University Nijmegen

Two production experiments were conducted to establish the anchor point for the beginning of the final rise in Dutch falling-rising pitch contours. We systematically varied the prosodic structure of the post-nuclear words by including the stress level (primary or secondary) of the penultimate syllable and the distance of the last stressed syllable to the utterance end as factors. None of the syllable types provided an anchor point for the timing of the beginning of the rise, which appeared to be most constant relative to the utterance end or to the end of the rise. Our finding is not consistent with earlier experiments, which found a tendency for the beginning of the rise to be attracted to the last stressed syllable. Additionally, we found that unaccented primary stressed syllables are somewhat longer than unaccented secondary stressed syllables, confirming earlier findings for Dutch obtained on the basis of reiterant speech.

On Filled Pauses and Prolongations in European Portuguese
Helena Moniz, INESC-ID/CLUL
Ana Isabel Mata, FLUL/CLUL, University of Lisbon
Céu Viana, CLUL/FLUL, University of Lisbon

This paper reports preliminary results from a study of disfluencies in European Portuguese, based on a corpus of prepared (non-scripted) and spontaneous oral presentations in a high school context. We focus on the contextual distribution and temporal patterns of filled pauses and segmental prolongations, as well as on the way these are rated by listeners. Results suggest that filled pauses and segmental prolongations behave alike, have similar functions and may be considered to be in complementary distribution, obeying general syntactic and prosodic constraints.

