August 27-31, 2007

Antwerp, Belgium

Interspeech 2007 Session TuD.P2a: Prosody: prosodic structure

Type poster
Date Tuesday, August 28, 2007
Time 16:00 – 18:00
Room Alpaerts
Chair Ani Nenkova (Linguistics Department, Stanford University)


Pitch Pattern Alternation in Goshogawara Japanese: Evidence for a Prosodic Phrase above the domain for Downstep
Yosuke Igarashi, Japan Society for the Promotion of Science, National Institute for Japanese Language

Lexically accented words in Goshogawara Japanese can be realized in either of two surface pitch patterns. The pitch pattern is said to alternate regularly, depending on the phrasing structure of an utterance. The organization of prosodic phrasing in this dialect, however, has been little investigated, and thus it remains unclear which prosodic phrase functions as the domain for the alternation. This work determines the level of the phrase serving as the domain for the pitch pattern alternation. Is it hierarchically higher or lower than the domain for downstep? The experimental results reveal that the alternation does not take place at the prosodic boundary where the downstep effect is blocked. The results provide evidence for a phrase one level above the downstep domain, whose existence was not evident in the model proposed for Tokyo Japanese.

Some evidence on the phonetics and phonology of the prosodic phrasing in Russian
Irina Nesterenko, Université de Provence - CNRS, Laboratoire Parole et Langage
Pavel A. Skrelin, Department of Phonetics, Saint-Petersburg State University

This paper treats the issue of prosodic segmentation into phrasing domains in Russian and is framed in the prosodic phonology paradigm. Distributions of prosodic boundaries are obtained in a perception experiment, and the results are further explored to advance hypotheses about the levels of prosodic constituency in Russian. The temporal organisation of the perceived domains and the eurhythmic constraints on phrasing are investigated. In particular, the empirical data suggest that beyond the level of intonational units, there are two other levels, a level of metrical domain and one of phonological phrase, relevant in the perception of phrasing patterns.

Temporal Downtrends in Czech Read Speech
Jan Volín, Institute of Phonetics, Charles University in Prague
Radek Skarnitzl, Institute of Phonetics, Charles University in Prague

The possible existence of a regular temporal trend superimposed over the durational pattern of individual segments is explored in read continuous speech in Czech, a West Slavonic language. A short text read by 75 speakers was used to ascertain whether the contextually conditioned temporal variation would allow any phrasal tendencies to manifest. The data were normalized against the speakers’ characteristics and against the intrinsic duration of individual phones. The results indicate that while the linear trendlines are regularly declining, the most reliable partial trend is phrase-final deceleration. Three more general non-linear trends are identified.
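
The abstract does not state how the normalization or the trendlines were computed. As a rough illustration only, the sketch below assumes a z-score of each phone's duration against the mean and standard deviation of its own phone type (a stand-in for "intrinsic duration"), followed by an ordinary least-squares slope over position in the phrase. The function names `normalize_durations` and `trend_slope` and the toy data are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_durations(tokens):
    """Z-score each duration against the statistics of its phone type.

    tokens: list of (phone_label, duration_ms) pairs (assumed input format).
    Phone types with zero variance get a z-score of 0.0.
    """
    by_phone = defaultdict(list)
    for label, dur in tokens:
        by_phone[label].append(dur)
    stats = {label: (mean(d), pstdev(d)) for label, d in by_phone.items()}

    z_scores = []
    for label, dur in tokens:
        mu, sigma = stats[label]
        z_scores.append((dur - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


def trend_slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Toy phrase: (phone, duration in ms); the values are invented for illustration.
phrase = [("a", 80), ("t", 50), ("a", 120), ("t", 70)]
slope = trend_slope(normalize_durations(phrase))
```

Under this convention a positive slope means normalized durations grow toward the end of the phrase, that is, speech slows down. Normalizing per speaker as well would amount to computing the statistics separately for each speaker.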

Empirical evidence for prosodic phrasing: pauses as linguistic annotation in Korean read speech
Hyongsil Cho, LPL CNRS Aix-Marseille Université
Daniel Hirst, LPL CNRS Aix-Marseille Université

This paper looks at the relationship between acoustic cues and judgements of prosodic boundaries. It is argued that in read speech, the presence of a silent pause can generally be taken as an indication of the presence of a prosodic boundary although in spontaneous speech the presence of a silent pause is neither a necessary nor a sufficient condition for a prosodic boundary. Two experiments are described. In the first, Korean subjects were asked to say whether extracts of speech (filtered to make them unintelligible) were taken from the same or from two different sentences. The results confirmed that in the majority of cases the listeners do not rely on the presence of a silent pause since even when the pause has been removed the boundary is correctly predicted almost as well as when it has not been removed. In the second experiment a number of acoustic cues, temporal and frequential, were used to predict the distribution of pauses without reference to the silence itself.

Exploiting Prosody for PCFGs with Latent Annotations
Markus Dreyer, CLSP
Izhak Shafran, OGI

We propose novel methods for integrating prosody in syntax using generative models. By adopting a grammar whose constituents have latent annotations, the influence of prosody on syntax can be learned from data. In one method, prosody is utilized to seed the latent annotations of a grammar which is then refined using EM iterations. In an orthogonal approach, we integrate prosody into grammar more explicitly using a model that jointly observes words and associated prosody. We evaluate the two methods by parsing speech data from the Switchboard corpus. The results are compared against baseline results from a model that does not use prosody. The experiments show that prosody improves a grammar in terms of accuracy as well as the parsimonious use of parameters.

Combining Length Distribution Model with Decision Tree in Prosodic Phrase Prediction
Qin Shi, IBM China Research Lab, Beijing, China
DanNing Jiang, IBM China Research Lab, Beijing, China
FanPing Meng, IBM China Research Lab, Beijing, China
Yong Qin, IBM China Research Lab, Beijing, China

In Text-to-Speech (TTS) systems, prosodic phrase prediction is important for the naturalness and intelligibility of the synthesized voice. Statistical methods such as dynamic programming (DP), decision trees (DT), and maximum entropy (ME) have been considered for the task. Features based on syntactic and lexical information are widely used. However, the predicted prosodic phrases are often observed to have unrealistic lengths because the length distribution is not modeled. This paper proposes a novel algorithm to incorporate a length distribution model in prosodic phrase prediction. Rather than directly using phrase length as a feature of the DT or ME model, the algorithm exploits the correlation between the length and the probability given by a decision tree. Experiments show that recall and precision improve by 16.37% and 14.05% relative, respectively, with the proposed algorithm.
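
The abstract does not describe the algorithm in detail. One plausible way to combine a per-word break probability with a phrase-length prior is a dynamic-programming search over segmentations, sketched below. This is an assumption for illustration, not the authors' method: the function `best_phrasing`, the independence of the two scores, and the example numbers are all hypothetical.

```python
import math


def best_phrasing(boundary_probs, length_prior):
    """Pick phrase boundaries maximizing tree score plus length-prior score.

    boundary_probs[i]: assumed decision-tree probability of a break after word i.
    length_prior[k]:   assumed probability of a phrase that is k words long.
    Returns the sorted list of phrase end positions (exclusive word indices);
    the last entry is always the sentence length.
    """
    n = len(boundary_probs)
    eps = 1e-12
    log_break = [math.log(max(p, eps)) for p in boundary_probs]
    log_no_break = [math.log(max(1.0 - p, eps)) for p in boundary_probs]

    # prefix[k] = sum of log_no_break[0..k-1], for fast range sums.
    prefix = [0.0]
    for value in log_no_break:
        prefix.append(prefix[-1] + value)

    best = [-math.inf] * (n + 1)   # best[e]: best score for words 0..e-1
    back = [0] * (n + 1)           # back[e]: start of the last phrase
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            prior = length_prior.get(end - start, 0.0)
            if prior <= 0.0 or best[start] == -math.inf:
                continue
            # No break inside the phrase, then a break after its last word.
            score = best[start] + prefix[end - 1] - prefix[start]
            if end < n:            # the sentence-final boundary is certain
                score += log_break[end - 1]
            score += math.log(prior)
            if score > best[end]:
                best[end], back[end] = score, start

    if best[n] == -math.inf:
        raise ValueError("no segmentation is allowed by the length prior")

    ends, pos = [], n
    while pos > 0:
        ends.append(pos)
        pos = back[pos]
    return sorted(ends)


# Invented example: the tree alone (threshold 0.5) would predict no internal
# break, but the length prior rules out a six-word phrase.
ends = best_phrasing([0.1, 0.1, 0.4, 0.1, 0.1, 0.1], {2: 0.2, 3: 0.6, 4: 0.2})
```

In this toy case the search places a break after the third word, where the tree probability is highest, because two three-word phrases fit the prior best. Treating length as a separate score in the search, rather than as a tree feature, matches the abstract's description only loosely.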

Duration and Pauses as Boundary-markers In Speech: A Cross-linguistic Study
Li-chiung Yang, Tunghai University, Taiwan

Duration variation is a key element that provides perceptual clues to phrasal organization, focus, and interactive communication of the status of idea units in speech. In this study, we provide a cross-linguistic comparison of pauses and durational patterns in spontaneous speech of English and Mandarin Chinese. Our results show that pause type, incidence, and final lengthening function similarly to indicate phrase boundary-status, and that specific contextual influences and speech venues are reflected in similar realizations in both English and Mandarin. Our findings provide evidence that durational elements are critical to discourse organization and have an underlying basis arising from universal features of human language.
