Interspeech 2007 Session ThB.O2: Speech perception II
Thursday, August 30, 2007
10:00 – 12:00
Tim Bunnell (Nemours Biomedical Research)
Time-Compressed Speech Perception with Speech and Noise Maskers
Douglas S. Brungart, Air Force Research Laboratory
Nandini Iyer, Air Force Research Laboratory
Many researchers have shown that speech signals can be time compressed (TC) by a factor of two or more without a significant loss in intelligibility. However, most previous studies with TC speech have been conducted either in quiet or, in a very small number of cases, with noise maskers. In this experiment, we examine the effect that TC has on the perception of a speech signal in the presence of a speech or noise masker. The results show that normal speech can be accelerated a modest amount without increased susceptibility to masking, but that higher TC ratios can lead to dramatically worse performance in the presence of an interfering sound. The results also indicate that time-expansion can, in some cases, lead to improved performance when a listener is attending to the quieter of two talkers in an auditory mixture. These results suggest that there are some important practical limitations on how TC should be used to enhance communications efficiency in auditory speech displays.
L2 Consonant Identification in Noise: Cross-language Comparisons
Anne Cutler, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
Martin Cooke, Department of Computer Science, University of Sheffield, United Kingdom
Maria Luisa Garcia Lecumberri, Department of English Philology, University of the Basque Country, Spain
Dennis Pasveer, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
The difficulty of listening to speech in noise is exacerbated when the speech is in the listener’s L2 rather than L1. In this study, Spanish and Dutch users of English as an L2 identified American English consonants in a constant intervocalic context. Their performance was compared with that of L1 (British English) listeners, under quiet conditions and when the speech was masked by speech from another talker or by noise. Masking affected performance more for the Spanish listeners than for the L1 listeners, but not for the Dutch listeners, whose performance was worse than the L1 case to about the same degree in all conditions. There were, however, large differences in the pattern of results across individual consonants, which were consistent with differences in how consonants are identified in the respective L1s.
Effects of Non-native Dialects on Spoken Word Recognition
Jennifer Le, MARCS Auditory Laboratories, University of Western Sydney, Milperra, Australia
Catherine Best, MARCS Auditory Laboratories, University of Western Sydney, Milperra, Australia; Haskins Laboratories, New Haven, CT, U.S.A.
Michael Tyler, MARCS Auditory Laboratories, University of Western Sydney, Milperra, Australia
Christian Kroos, MARCS Auditory Laboratories, University of Western Sydney, Milperra, Australia
The present study examined the premise that lexical information (top-down factors) interacts with phonetic detail (bottom-up, episodic traces) by assessing the impact of dialect variation and word frequency on spoken word recognition. Words were spoken either in the listeners’ native dialect (Australian English: AU) or in one of two non-native English dialects differing in phonetic similarity to Australian: South African (SA: more similar) and Jamaican Mesolect (JA: less similar). It was predicted that low-frequency English words spoken in non-native dialects, especially the less similar dialect, would require more information to be recognised due to systematic phonological and/or phonetic differences from native-dialect versions. A gating task revealed that more gates were required for JA than for SA dialect words, with this effect even more pronounced for low- than for high-frequency words. This suggests that word recognition is contingent both on detailed phonetic properties within the mental lexicon, as evident in the effects of goodness of fit between native and non-native dialect pronunciations, and on lexical information.
Identification of natural whistled vowels by non-whistlers
Julien Meyer, Laboratoire DDL, CNRS UMR 5596, France; LAB, Universitat Politecnica de Catalunya, Spain
Fanny Meunier, Laboratoire DDL, CNRS UMR 5596
Laure Dentel, Centro Politecnico Superior, Universidad de Zaragoza
Whistled speech is a phonetic emulation of the sounds produced in the spoken voice. This style of speech results from the adaptation of human productive and perceptive intelligence to a language behavior. In the typology of whistled forms of languages, Spanish is among the languages for which the whistled strategy primarily emulates the segmental acoustic cues of vowels and consonants. The present study tests the perception of four whistled Spanish vowels by French non-whistlers. The results show that the French non-whistlers were able to categorize these vowels without any training, although not as accurately as native whistlers.
Prelexical Adjustments to Speaker Idiosyncrasies: Are they Position-specific?
Alexandra Jesse, Max Planck Institute for Psycholinguistics
James M. McQueen, Max Planck Institute for Psycholinguistics
Listeners use lexical knowledge to adjust their prelexical representations of speech sounds in response to the idiosyncratic pronunciations of particular speakers. We used an exposure-test paradigm to investigate whether this type of perceptual learning transfers across syllabic positions. No significant learning effect was found in Experiment 1, where exposure sounds were onsets and test sounds were codas. Experiments 2-4 showed that there was no learning even when both exposure and test sounds were onsets. But a trend was found when exposure sounds were codas and test sounds were onsets (Experiment 5). This trend was smaller than the robust effect previously found for the coda-to-coda case. These findings suggest that knowledge about idiosyncratic pronunciations may be position specific: Knowledge about how a speaker produces sounds in one position, if it can be acquired at all, influences perception of sounds in that position more strongly than of sounds in another position.
Top-down effects on compensation for coarticulation are not replicable
Holger Mitterer, Max Planck Institute for Psycholinguistics
Listeners use lexical knowledge to judge what speech sounds they heard. I investigated whether such lexical influences are truly top-down or just reflect a merging of perceptual and lexical constraints. This was done by testing whether the lexically determined identity of a phone exerts the appropriate context effects on surrounding phones. The current investigation focuses on compensation for coarticulation in vowel-fricative sequences, where the presence of a rounded vowel (/y/ rather than /i/) leads fricatives to be perceived as /s/ rather than /ʃ/. This result was consistently found in all three experiments. A vowel was also more likely to be perceived as rounded /y/ if that led listeners to perceive words rather than nonwords (Dutch: menu, English id., vs. meni, a Dutch nonword). This lexical influence on the perception of the vowel had, however, no consistent influence on the perception of the following fricative.