Interspeech 2007 Session WeC.P1: Phonetics and phonology

Type poster
Date Wednesday, August 29, 2007
Time 13:30 – 15:30
Room Foyer
Chair Grazyna Demenko (Institute of Linguistics, Adam Mickiewicz University, Poznan)


The phonetics and phonology of high and low tones in two falling f0-contours in Standard German
Tamara Rathcke, Institute of Phonetics and Speech Processing, University of Munich, Germany
Jonathan Harrington, Institute of Phonetics and Speech Processing, University of Munich, Germany

The present paper reports the results of an imitation experiment developed to evaluate empirically the validity of the AM-analyses given for two falling f0-patterns in German by different researchers. We look at the phonetic realisations of temporal alignment and frequency scaling of high and low tonal targets in varying syllabic environments. The effects of two phonetic factors were tested: (1) syllable structure of the postnuclear part of a phrase and (2) syllabic structure of the nuclear syllable. The results show that scaling and alignment are affected by the investigated factors in an unexpected way, so that predictions from different AM-analyses could not be confirmed by the data. We discuss the implications of the results in the light of the proposed analyses.

Temporal alignment of creaky voice in neutralised realisations of an underlying, post-nasal voicing contrast in German
Tina John, Institute of Phonetics and Speech Processing, Ludwig-Maximilians-University, Munich, Germany
Jonathan Harrington, Institute of Phonetics and Speech Processing, Ludwig-Maximilians-University, Munich, Germany

The aim of the present experiment was to investigate the acoustic phonetic cues that could underlie a post-stress voicing distinction which, when considered on a segmental basis, appears to be neutralised. The data concern the difference in German between minimal pairs such as 'Enten' and 'enden' which in more casual speaking styles appear to show schwa and oral stop deletion and a surface realisation as a neutralised creaky voice nasal. We extracted from the Kiel Corpus all such contrasts that were judged by trained transcribers to have been neutralised in this way. We measured the spectral slope over the first two harmonics and the time at which the spectral slope first changed significantly. Our results show that, contrary to segmentally-based assumptions, /t, d/ were distinguished depending on the onset time of creaky voice relative to the preceding vowel. These data are consistent with a model in which cues to segmental contrasts may be distributed non-segmentally in time.
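
As an illustration of the kind of measurement described above, the following minimal sketch computes a spectral slope over the first two harmonics as the level difference H1-H2 in dB. It is not the authors' procedure: the function names, the use of a known f0 per frame, and the direct projection onto each harmonic frequency are all assumptions made for this example.

```python
import cmath
import math


def harmonic_amplitude(frame, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz) in `frame`.

    Assumption: the frame spans a whole number of periods of `freq`,
    so a direct projection onto a complex exponential is adequate.
    """
    n = len(frame)
    acc = sum(
        sample * cmath.exp(-2j * math.pi * freq * k / sample_rate)
        for k, sample in enumerate(frame)
    )
    return 2.0 * abs(acc) / n


def h1_minus_h2_db(frame, sample_rate, f0):
    """Hypothetical helper: level difference (dB) between the first two harmonics."""
    h1 = harmonic_amplitude(frame, sample_rate, f0)
    h2 = harmonic_amplitude(frame, sample_rate, 2.0 * f0)
    return 20.0 * math.log10(h1 / h2)


# Synthetic frame: f0 = 100 Hz, second harmonic at half the amplitude.
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 100 * k / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * k / SAMPLE_RATE)
    for k in range(800)  # 10 full periods of f0
]
slope_db = h1_minus_h2_db(frame, SAMPLE_RATE, 100.0)
```

Tracking such a value frame by frame would give a slope contour in which a change point could then be located; how the abstract's "significant change" was determined is not stated, so it is left out here.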

The Duration of Speech Pauses in a Multilingual Environment
Mike Demol, Vrije Universiteit Brussel, dept. ETRO-DSSP, Pleinlaan 2, 1050 Brussels, Belgium
Werner Verhelst, Vrije Universiteit Brussel, dept. ETRO-DSSP, Pleinlaan 2, 1050 Brussels, Belgium
Piet Verhoeve, Corporate R&D dept., TELEVIC nv, Leo Bekaertlaan 1, B-8870 Izegem, Belgium

In this paper we present a study of speech pauses at three different speaking rates, based on the analysis of four hours of read speech in six European languages. Our results confirm earlier observations that the logarithmic duration of the pauses can be well approximated by a bi-Gaussian distribution. We found this also to be true at slow and fast speaking rates. Our analysis further shows that, as far as the long speech pauses are concerned, similar strategies are used in all languages considered. For speaking slowly, speakers increase the total number of pauses and they use a wider range of pause durations. Overall, there appeared to be no striking change in either the average or the variance of the distribution of the pause durations. For speaking rapidly, speakers decrease the number of pauses used and they refrain from using the longest pauses that occur in their normal speech. Overall, this results in a lower average and a smaller variance of the pause durations.
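
To make the bi-Gaussian description concrete, here is a minimal sketch of a two-component Gaussian mixture density over log pause durations. The parameter names and the example values are purely illustrative assumptions, not figures from the paper, and the natural logarithm is chosen arbitrarily since the abstract does not specify a base.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bi_gaussian_pdf(x, weight, mean_short, sd_short, mean_long, sd_long):
    """Mixture of two Gaussians; `weight` is the share of the 'short pause' component."""
    return (
        weight * gaussian_pdf(x, mean_short, sd_short)
        + (1.0 - weight) * gaussian_pdf(x, mean_long, sd_long)
    )


def log_duration(duration_s):
    """Map a pause duration in seconds to the log domain (natural log assumed)."""
    return math.log(duration_s)


# Illustrative (made-up) parameters in the log-seconds domain.
PARAMS = dict(weight=0.4, mean_short=-2.5, sd_short=0.5, mean_long=-0.5, sd_long=0.4)

# Density of a 0.6 s pause under the illustrative model.
density = bi_gaussian_pdf(log_duration(0.6), **PARAMS)
```

Under such a model, a slower or faster speaking rate would show up as changes in the component weights, means, and spreads.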

Syllable timing patterns in Polish: results from annotation mining
Dafydd Gibbon, Universität Bielefeld
Jolanta Bachan, Institute of Linguistics, Adam Mickiewicz University, Poznan
Grazyna Demenko, Institute of Linguistics, Adam Mickiewicz University, Poznan

Previous studies of duration variation in syllable constituents have yielded results for Polish which are clear outliers in relation to those for other languages. We report on a study of this issue in the context of TTS development, using a large annotated database. Global and local duration distance measures are applied to phoneme and syllable level units, and generalised iambic and trochaic duration patterns are compared with grammatical structure. The study suggests that Polish is more syllable-timed than previously thought, and reveals a close relationship between syllable duration patterns and word sequences.

Minimal Pairs and Functional Loads of Sound Contrasts Obtained from a List of Modern Greek Words
Constandinos Kalimeris, Institute for Language and Speech Processing (ILSP)
Stelios Bakamidis, Institute for Language and Speech Processing (ILSP)

This paper reports on the initial results of our investigation into the distribution of speech sounds across the lexicon of Modern Greek (MG). The data we discuss ultimately derive from the list of orthographic word-types of a large general corpus of written MG. The orthographic word-types were automatically transcribed into their respective citation forms. Minimal pairs were automatically extracted from the resultant list of citation forms. The Functional Load (FL) of each sound opposition was computed as a function of (a) the length of citation forms, (b) the position of each sound contrast within citation forms and (c) the number of minimal pairs pertinent to each opposition in question. The body of data yielded by this study will be used for further research in MG phonology as well as for the improvement of the performance of Automatic Speech Recognition applications.
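
The extraction step can be sketched as follows: two citation forms of equal length that differ in exactly one position form a minimal pair, and pairs are tallied per sound opposition. This is only an illustration under stated assumptions. The function names and the toy transcriptions are hypothetical, the input forms are assumed to be distinct, and the paper's actual Functional Load formula is not given in the abstract, so only the pair counts and contrast positions it would draw on are produced.

```python
from collections import Counter
from itertools import combinations


def minimal_pairs(forms):
    """Return (form_a, form_b, position, opposition) for every minimal pair.

    `forms` is an iterable of distinct citation forms, each a tuple of
    phoneme symbols.
    """
    pairs = []
    for a, b in combinations(forms, 2):
        if len(a) != len(b):
            continue
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:
            i = diffs[0]
            pairs.append((a, b, i, frozenset((a[i], b[i]))))
    return pairs


def pairs_per_opposition(forms):
    """Count minimal pairs for each sound opposition (unordered phoneme pair)."""
    return Counter(opposition for _, _, _, opposition in minimal_pairs(forms))


# Toy, made-up transcriptions (not Modern Greek data).
toy_forms = [("p", "a", "t"), ("b", "a", "t"), ("p", "a", "d"), ("m", "a")]
counts = pairs_per_opposition(toy_forms)
```

The pairwise comparison is quadratic in the number of forms; for a large word list one would more likely index forms by length and by a "wildcarded" position, but the simple version keeps the definition visible.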

More on acoustic correlates of stress
Daan Wissing, North-West University, Potchefstroom, South Africa

The descriptive and explanatory power of various relatively unknown parameters of stress [1, 2] was investigated. They were derived either from the latter or from the physiological process of phonation. Two stimulus Afrikaans words in and out of focal accented sentence position were read by three Afrikaans female participants. The stressed and unstressed vowel /A/ was investigated in the two contexts. Effect sizes and multiple regression analysis results were used in determining the power of the parameters as acoustic correlates of stress. The results substantiate the groundbreaking work of [1, 2] in this regard, but in some instances contradict it. Many of the parameters proved to be quite powerful constructs, in some cases surpassing the known ones in strength. The successful derivation of such parameters from the physiological basis of the phonation process is demonstrated. Special attention is paid to the description of vowel reduction.

Comparing Praat and Snack formant measurements on two large corpora of northern and southern French
Cécile Woehrling, LIMSI-CNRS
Philippe Boula de Mareüil, LIMSI-CNRS

We compare formant frequency measurements between two authoritative tools (Praat and Snack), two large corpora (of face-to-face and telephone speech) and two French varieties (northern and southern). The study comprises both an evaluation of formant tracking (as well as related filtering techniques) and an application to identifying salient pronunciation traits. Despite differences between Praat and Snack with regard to telephone speech (Praat yielding greater F1 values), the results converge in suggesting that northern and southern French varieties mainly differ in the second formant of the open /O/. /O/ fronting in northern French (with F2 values greater than 1100 Hz for males and 1200 Hz for females) is by far the most discriminating feature provided by decision trees applied to oral vowel formants.

The Phonetic Exponency of Phrasal Accentuation in French and German
William J. Barry, Institute of Phonetics, Saarland University, Germany
Bistra Andreeva, Institute of Phonetics, Saarland University, Germany
Ingmar Steiner, Institute of Phonetics, Saarland University, Germany

The acoustic-phonetic properties of words spoken with three different levels of accentuation (de-accented, pre-nuclear and nuclear accented in broad-focus and nuclear accented in narrow-focus) are examined in question-answer elicited sentences and iterative imitations (on the syllable da) produced by six French and six German speakers. Normalised parameter values allow a comparative weighting of the properties employed in differentiating the three levels of accentuation. Clear differences are found between French and German in the weighting hierarchy of the acoustic properties.

Phonetic Geminates in Cypriot Greek: the Case of Voiceless Plosives
Christiana Christodoulou, Department of Linguistics, University of British Columbia

The research presented in this paper provides evidence for the existence of geminates in Cypriot Greek (hereinafter, CyG). To this end, statistical analysis reveals significant differences in closure duration, the cross-linguistic correlate of gemination. However, since the language maintains an audible distinction between the two phonemic categories and since closure duration cannot be measured in utterance-initial environments, an alternative correlate is necessary. Contrary to previous studies (Arvaniti & Tserdanelis, 2000: 562; Arvaniti, 1999b: 602), this paper argues, on the basis of highly significant durational differences, that VOT is more likely to be the primary correlate of gemination for CyG geminate plosives, because this correlate is available utterance-initially. Preliminary statistical analysis suggests an effect of the vowel following the target segment, a fact that could help resolve the phonological representation of utterance-initial geminates.

Predicting Vowel Duration in Spontaneous Canadian French Speech
Darcie Williams, The University of Western Ontario
François Poiré, The University of Western Ontario

This study examines variables influencing vowel duration of French spoken in Windsor, Ontario, in order to see whether their respective effects on vowel duration are organised hierarchically. We first consider the data distribution of four female speakers before carrying out a statistical principal components analysis. Our results show that the variables are classified into three underlying factors: syllable structure, syllable position and vowel properties. This last factor group includes the factors phonological vowel class and diphthong status, and always explains the majority of the variability in vowel duration. Syllable position also accounts for some of this variation in certain cases. The consistent hierarchy of these factors across the statistical analyses confirms that a vowel’s properties are the most important in determining its duration, followed first by the syllable’s position in the utterance, and second by the syllabic structure.

Rhotic variation and schwa epenthesis in Windsor French
Ivan Chow, The University of Western Ontario
François Poiré, The University of Western Ontario

This study investigates two idiosyncratic phenomena found in the French dialect of the Windsor area in SW Ontario. (1) Rhotics in this dialect are pronounced in three phonetic varieties: the dorsal fricative, the alveolar approximant, and the alveolar tap. (2) An epenthetic schwa is also found in the phonetic realization in certain phonemic contexts. Through phonetic analysis of recorded speech and statistical data analyses, we explore whether the phonological context is a good predictor of the phonetic realization of the rhotic and of schwa epenthesis. Amongst the independent factors, the manner and place of articulation of the phonemes preceding and following the rhotic are the best predictors for the type of rhotic, the presence/absence of schwa epenthesis, as well as for the duration of the epenthetic schwa. The type of rhotic realization is a pre-condition for schwa epenthesis. Finally, phonological schwa is significantly longer than epenthetic schwa in schwa-rhotic sequences.

On the Categorical Nature of the Process Involved in Schwa Elision in French
Audrey Burki, Laboratoire de Psycholinguistique Expérimentale, Université de Genève, Suisse
Cécile Fougeron, Laboratoire de Phonétique et Phonologie, UMR7018, CNRS-Paris3/Sorbonne Nouvelle, France
Cédric Gendrot, Laboratoire de Phonétique et Phonologie, UMR7018, CNRS-Paris3/Sorbonne nouvelle, France

This paper examines the nature of the process involved in optional schwa elision in French. More specifically, it aims at testing whether this process is gradual or categorical, on the basis of an analysis of the distribution of the duration of over 4000 schwas extracted from a large corpus of continuous speech. The distribution observed is bimodal, with absent schwas (at 0 ms duration) on one side, and realized schwas on the other side, the two groups being separated by a small gap in the distribution. Realized schwas present cases of strong temporal reduction, but this reduction does not show a continuous pattern toward zero duration, as would be predicted if schwa elision were the end-point of a gradual reduction process. Different interpretations of this distribution are discussed.

Exploring Tonal Variations via Context-Dependent Tone Models
Yue-Ning Hu, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072, China
Min Chu, Microsoft Research Asia, Beijing, 100080, China
Chao Huang, Microsoft Research Asia, Beijing, 100080, China
Yan-Ning Zhang, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072, China

In this paper, we study tonal variations by training context-dependent tone models on a large speech corpus. Each model represents a tone in context and can output a stylized f0 pattern for it. With these tone models, it becomes feasible to investigate how f0 varies with many factors. Six contextual factors are investigated in this work. We find that the impact of a factor varies across tones as well as across the three states of a tone. Normally, the onset pitch of a tone is determined jointly by syllable position and left tone, while the offset pitch is mainly determined by syllable position. For a neutral tone, the pitch level depends mainly on the left tone, and syllable position affects its offset pitch. Both current vowel and right tone influence the pitch level, yet their impacts are weaker than those of syllable position and left tone, except for T3, the low tone.

Acoustic Analysis of the Neutral Tone in Mandarin
Philippe Martin, Université Paris Diderot
Jun Li, Université Paris Diderot

East Asian languages such as Mandarin have lexical tones in their phonological system. Pronounced in isolation, the fundamental frequency contours produced by these tones are relatively stable and their shapes are well described phonetically. However, modifications can occur, not only in the well-known case where two consecutive third tones are realized as a tone two - tone three sequence, but in other contexts as well, producing either one of the three other tones available in the phonological system or a so-called neutral tone. In this paper, specific acoustic characteristics of neutral tones resulting from sequences of three or more T3 tones are investigated. In particular, values of the fundamental frequency (F0) glissando were evaluated and compared to a perception threshold. Other melodic features were considered as well.

F0 analysis of perceptual distance among Cantonese level tones
Rerrario Shui-Ching Ho, Global Information and Telecommunication Institute, Waseda University, Tokyo, Japan & Englisches Seminar, Universität Basel, Basel, Switzerland
Yoshinori Sagisaka, Global Information and Telecommunication Institute, Waseda University, Tokyo, Japan

This paper presents an acoustical analysis of the pitch height of the four level tones of Cantonese in search of a quantitative relationship for their perceptual distance. Our preliminary measurements and calculations give the first evidence that the conventional representations were mostly mistaken.

