August 27-31, 2007

Antwerp, Belgium

Interspeech 2007 Session ThC.O2: First and second language learning

Type oral
Date Thursday, August 30, 2007
Time 13:30 – 15:30
Room Darwin
Chair Mirjam Broersma (Max Planck Institute, Nijmegen)

Tone Production of the Speakers of Different Age-and-Sex Groups
Wai-Sum Lee, Department of Chinese, Translation and Linguistics, City University of Hong Kong

This paper presents an acoustic analysis of the pitch (F0) of the six long tones [55 33 22 21 25 23] in Cantonese, as produced by male and female adult speakers and male and female child speakers. Results show that (i) the F0 patterns of the Cantonese tones are similar across the age-and-gender groups, but the absolute F0 values differ; (ii) the difference in F0 between the adult and child speakers is large, though it is smaller between the child and female adult speakers; (iii) the difference in F0 is noticeable between adult speakers of different genders, but not between the male and female child speakers; (iv) the difference in F0 across the speaker groups is uniformly scaled across tone types; and (v) for all six tones, the F0 of the child speakers is approximately an octave higher than that of the male adult speakers and 1.2 to 1.3 times higher than that of the female adult speakers, while the F0 of the female adult speakers is slightly over half an octave higher than that of the male adult speakers.
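
The intervals in result (v) can be checked against each other by converting frequency ratios to octaves. The following is a minimal sketch, not taken from the paper. It assumes that "approximately an octave" means a ratio of exactly 2 and that "1.2 to 1.3 times higher" means a child-to-female F0 ratio of 1.2 to 1.3; the function names are illustrative.

```python
import math


def ratio_to_octaves(ratio):
    """Convert a frequency ratio to an interval in octaves (log base 2)."""
    return math.log2(ratio)


def ratio_to_semitones(ratio):
    """Convert a frequency ratio to an interval in equal-tempered semitones."""
    return 12 * math.log2(ratio)


# Assumption: child F0 is exactly one octave (ratio 2.0) above adult male F0.
CHILD_OVER_MALE = 2.0

# Assumption: child F0 is 1.2 to 1.3 times adult female F0.
for child_over_female in (1.2, 1.3):
    # The implied female/male ratio follows by division.
    female_over_male = CHILD_OVER_MALE / child_over_female
    print(
        f"child/female = {child_over_female}: "
        f"female/male = {female_over_male:.2f} "
        f"({ratio_to_octaves(female_over_male):.2f} octaves)"
    )
```

Under these assumptions the implied female-to-male interval falls between half an octave and three quarters of an octave, which is in line with the abstract's "slightly over half an octave".
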
Vowels and Tones in Infant Directed Speech: Hyperarticulation for Both, but Different Developmental Patterns
Nan Xu, MARCS Auditory Laboratories, UWS, Australia
Denis Burnham, MARCS Auditory Laboratories, UWS, Australia
Christine Kitamura, MARCS Auditory Laboratories, UWS, Australia

Mothers hyperarticulate vowels in their Infant Directed Speech (IDS). We investigate whether similar hyperarticulation occurs for lexical tones in tone language IDS, and chart the development of the two hyperarticulations (if any) in IDS across the infant’s first year. A total of 22 native Cantonese-speaking mothers were recorded, 11 when their infants were 3, 6, and 9 months old, and another 11 when their infants were 6, 9, and 12 months old. Analysis focused on nine target words: three for vowels (one for each of the corner vowels /i/, /a/ and /u/), and another six, one for each of the Cantonese tones on the vowel /i/. Vowel hyperarticulation was investigated using first and second formant values, and tone hyperarticulation using fundamental frequency onset and offset [1]. Preliminary results indicate that both vowel and tone hyperarticulation occur, with vowel hyperarticulation emerging around 6 months and increasing to 9 and 12 months, while tone hyperarticulation occurs only at 6 and 9 months.
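
One common way to quantify vowel hyperarticulation from first and second formant values is the area of the triangle spanned by the corner vowels /i/, /a/ and /u/ in F1-F2 space. The abstract does not say which measure the authors used, so the sketch below is only an illustration of that general approach. All formant values are hypothetical placeholders, not data from the study.

```python
def triangle_area(p1, p2, p3):
    """Area of a triangle from three (F1, F2) points, via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def vowel_space_area(formants):
    """Vowel triangle area in Hz^2 for a dict with keys 'i', 'a', 'u'."""
    return triangle_area(formants["i"], formants["a"], formants["u"])


# Hypothetical mean (F1, F2) values in Hz, for illustration only.
adult_directed = {"i": (300, 2300), "a": (800, 1300), "u": (350, 900)}
infant_directed = {"i": (280, 2600), "a": (900, 1400), "u": (330, 800)}

ads_area = vowel_space_area(adult_directed)
ids_area = vowel_space_area(infant_directed)

# A ratio above 1 would indicate an expanded (hyperarticulated) vowel space.
print(f"IDS/ADS vowel space ratio: {ids_area / ads_area:.2f}")
```
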
Acquisition of Vowel Duration in Children Speaking American English
Eon-Suk Ko, Department of Cognitive and Linguistic Sciences, Brown University

This study is an acoustic investigation of the acquisition of vowel duration in children speaking American English. The primary goal was to find out when and how children begin to produce different vowel durations as a function of postvocalic voicing. A total of 803 longitudinal data points extracted from the Providence Corpus were analyzed. The age range covered by the data was from 0;11 to 4;0. The findings are summarized as follows: (1) Children control the vowel duration conditioned by voicing before the age of 2. (2) They also make the durational distinction between tense and lax vowels before the age of 2. (3) There is no developmental trend in the acquisition of the vowel duration conditioned by postvocalic voicing. The results suggest that children learn the phonetic implementation of temporal parameters so thoroughly, from a very early stage of speech production, that it appears to be an automatic process.
F0 models show Chinese speakers of Japanese insert intonational boundaries and drop pitch
Hiroko Hirano, University of Tokyo
Keikichi Hirose, University of Tokyo
Goh Kawai, Hokkaido University
Wentao Gu, Chinese University of Hong Kong
Nobuaki Minematsu, University of Tokyo

We used a command-response additive F0 model to analyze F0 patterns of Japanese spoken by native speakers of Mandarin Chinese. Compared to native speakers of Japanese, we found that Chinese speakers exhibit the following characteristics: (a) higher pitch, (b) more phrases, (c) bunsetsu decomposition, and (d) utterance-final plunging. These characteristics physically manifest themselves as: (a) higher baseline F0, (b) more phrase commands, (c) more accent commands, and (d) negative commands. These characteristics may be subjectively perceived as: (a) tinnier speech (possible L1 marker but does not degrade communication), (b) disjoint phrases (requires mental consolidation), (c) choppy prosodic words (requires reconstruction), and (d) abrupt utterance termination (possibly misconstrued as emphatic or rude). We believe these difficulties probably arose from tonal and syllable-timed interference, which can be overcome by prosodic control and planning.
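
The abstract does not spell out the model equations. A command-response additive F0 model is usually formulated as the Fujisaki model, in which log F0 is the sum of a baseline, phrase components, and accent components. The sketch below follows that standard formulation as an assumption about the approach, not as the authors' implementation. The constants (alpha = 3, beta = 20, gamma = 0.9) are commonly cited defaults, and the command timings and amplitudes are hypothetical.

```python
import math

# Commonly cited default constants (assumed, not taken from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0(t, baseline, phrase_cmds, accent_cmds):
    """F0 in Hz at time t (seconds).

    phrase_cmds: list of (onset, amplitude) impulses.
    accent_cmds: list of (start, end, amplitude) pedestals.
    """
    log_f0 = math.log(baseline)
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset)
    for start, end, amp in accent_cmds:
        log_f0 += amp * (accent_response(t - start) - accent_response(t - end))
    return math.exp(log_f0)


# Hypothetical utterance: one phrase command, one positive accent command,
# and one negative accent command near the end (an utterance-final drop).
phrases = [(0.0, 0.5)]
accents = [(0.2, 0.5, 0.4), (1.0, 1.2, -0.3)]
for t in (0.0, 0.3, 0.6, 1.1):
    print(f"t = {t:.1f} s: F0 = {f0(t, 120.0, phrases, accents):.1f} Hz")
```

In these terms, the four characteristics listed in the abstract correspond to a larger `baseline`, more entries in `phrase_cmds`, more entries in `accent_cmds`, and commands with negative amplitude. Treating the negative commands as accent commands is an assumption made here for illustration.
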
Formal modelling of L1 and L2 perceptual learning: Computational linguistics versus machine learning
Paola Escudero, Institute of Phonetic Sciences, University of Amsterdam
Jelle Kastelein, Department of Informatics and Mathematics, University of Amsterdam
Klara Weiand, Department of Informatics and Mathematics, University of Amsterdam
Rob van Son, Institute of Phonetic Sciences, University of Amsterdam

In this paper, we evaluate the adequacy of two widely used machine learning algorithms and a computational linguistic proposal to model L2 perceptual development. The three proposals are, in order, Nearest Neighbor and Naive Bayesian (the machine learning algorithms) and Stochastic OT with the Gradual Learning Algorithm (the computational linguistic proposal). We compared the three models’ outputs to those of Spanish learners of Dutch who were asked to categorize synthetic stimuli as one of the 12 Dutch vowels. The empirical results of the human learners show that L2 learners differ significantly from native listeners, but also that their perceptual spaces tend to become more native-like with L2 proficiency. The results of the simulations show that all three algorithms are able to model listeners’ data to a certain extent, but that Stochastic OT with the Gradual Learning Algorithm, i.e. the linguistic model, best reproduces L1 and L2 data.
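
As a rough illustration of the simplest of the three approaches, a nearest-neighbor categorizer assigns each stimulus the label of the closest stored exemplar. The abstract does not state which acoustic features or distance measure were used, so the sketch below assumes (F1, F2) values in Hz and Euclidean distance. The exemplars and labels are hypothetical and cover only three vowel categories rather than the 12 Dutch vowels.

```python
import math


def nearest_neighbor(exemplars, stimulus):
    """Return the label of the stored exemplar closest to the stimulus (1-NN)."""
    _, label = min(
        ((math.dist(point, stimulus), label) for point, label in exemplars),
        key=lambda pair: pair[0],
    )
    return label


# Hypothetical (F1, F2) exemplars in Hz with vowel labels, for illustration only.
exemplars = [
    ((280, 2250), "i"),
    ((750, 1300), "a"),
    ((310, 800), "u"),
]

for stimulus in [(300, 2100), (700, 1200), (330, 950)]:
    print(stimulus, "->", nearest_neighbor(exemplars, stimulus))
```

With raw Hz values the larger F2 range dominates the distance, so a practical model would likely rescale the features (for example to Bark or mel) before comparing.
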
Kettle Hinders Cat, Shadow Does Not Hinder Shed: Activation of ‘Almost Embedded’ Words in Nonnative Listening
Mirjam Broersma, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands

A Cross-Modal Priming experiment investigated Dutch listeners’ perception of English words. Target words were embedded in a carrier word (e.g., ‘cat’ in ‘catalogue’) or ‘almost embedded’ in a carrier word except for a mismatch in the perceptually difficult /æ/-/ε/ contrast (e.g., ‘cat’ in ‘kettle’). Previous results showed a bias towards perception of /ε/ over /æ/. The present study shows that presentation of carrier words containing either an /æ/ or an /ε/ led to long-lasting inhibition of embedded or ‘almost embedded’ words with an /æ/, but not of words with an /ε/. Thus, both ‘catalogue’ and ‘kettle’ hindered recognition of ‘cat’, whereas neither ‘schedule’ nor ‘shadow’ hindered recognition of ‘shed’.
