Interspeech 2007
August 27-31, 2007

Antwerp, Belgium

Interspeech 2007 Session TuC.P2b: Education and training


Type: Poster
Date: Tuesday, August 28, 2007
Time: 13:30 – 15:30
Room: Alpaerts
Chair: Mari Ostendorf (University of Washington)

TuC.P2b‑1

Syllable Lattices as a Basis for a Children’s Speech Reading Tracker
Daniel Bolanos, PhD student at CSLR
Wayne Ward, Research Professor and director of the CSLR
Sarel Van Vuuren, Research Associate at the CSLR
Javier Garrido, Associate Professor

In this paper we present an algorithm that uses information contained in syllable lattices to significantly reduce the classification error rate of a children’s speech reading tracker. The task is to verify whether each word in a reference string was actually spoken. A syllable graph is generated from the reference word string to represent acceptable pronunciation alternatives. A syllable-based continuous speech recognizer is used to generate a syllable lattice. The best alignment between the reference graph and the syllable lattice is determined using a dynamic programming algorithm. The speech vectors aligned with each syllable are used as features for Support Vector Machine classifiers that accept or reject each syllable in the aligned path. Experimental results on three children’s speech corpora show that this algorithm substantially reduces the classification error rate compared with both the standard word-based tracker and a simple best-path syllable-based tracker.
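The abstract does not spell out the alignment recursion. As a rough illustration only, the sketch below aligns a reference syllable sequence against a single recognized syllable sequence using a standard edit-distance dynamic program. This is a simplification we are assuming for clarity: the paper aligns a reference graph against a full lattice, and its accept/reject decisions come from SVM classifiers, not exact string matches. The function name, unit costs and syllable strings are hypothetical.

```python
from typing import List


def align_syllables(reference: List[str], hypothesis: List[str],
                    sub_cost: int = 1, ins_cost: int = 1, del_cost: int = 1) -> List[bool]:
    """For each reference syllable, report whether it was aligned to an identical
    hypothesis syllable on a minimum-cost edit-distance path (hypothetical helper)."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j]: cheapest alignment of reference[:i] with hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * del_cost
    for j in range(1, m + 1):
        cost[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if reference[i - 1] == hypothesis[j - 1] else sub_cost
            cost[i][j] = min(cost[i - 1][j - 1] + match,  # match or substitution
                             cost[i - 1][j] + del_cost,   # reference syllable not spoken
                             cost[i][j - 1] + ins_cost)   # extra recognized syllable
    # Backtrace: a reference syllable is accepted only on an exact diagonal match.
    accepted = [False] * n
    i, j = n, m
    while i > 0 and j > 0:
        match = 0 if reference[i - 1] == hypothesis[j - 1] else sub_cost
        if cost[i][j] == cost[i - 1][j - 1] + match:
            accepted[i - 1] = match == 0
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + del_cost:
            i -= 1
        else:
            j -= 1
    return accepted


# The child read "ba na" for the reference "ba na na": one syllable is rejected.
print(align_syllables(["ba", "na", "na"], ["ba", "na"]))  # [True, False, True]
```

When two alignments tie, this backtrace prefers the diagonal move, so which of the two identical "na" syllables is marked as missing is an arbitrary tie-break rather than something the data decides.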
TuC.P2b‑2

Mandarin Vowel Pronunciation Quality Evaluation by Using Formant Pattern Recognition
Fuping Pan, ThinkIT laboratory, Institute of Acoustics, Chinese Academy of Sciences
Qingwei Zhao, ThinkIT laboratory, Institute of Acoustics, Chinese Academy of Sciences
Yonghong Yan, ThinkIT laboratory, Institute of Acoustics, Chinese Academy of Sciences

In this paper we propose applying formant pattern recognition to Mandarin vowel pronunciation assessment. We devise a novel pitch cycle detection method and propose estimating formant frequencies from frequency-domain observations obtained by pitch-synchronous analysis. Statistical classifiers are trained to discriminate formant patterns for vowel pronunciation assessment. Five confusable Mandarin vowels are selected for the experiments. On average, the new method improves the human-machine score correlation by 6.10% over an ASR-based technique and by 6.37% over the traditional LPC analysis method.
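Pitch-synchronous analysis needs an estimate of the pitch period before formants can be measured cycle by cycle. The abstract's pitch cycle detection method is novel and not described here, so the sketch below substitutes a generic autocorrelation estimator purely to show where a period estimate fits. The function name, the 16 kHz sampling rate and the assumed 50–400 Hz F0 search range are illustrative choices, not the authors' settings.

```python
import math
from typing import Sequence


def estimate_pitch_period(signal: Sequence[float], sample_rate: int,
                          f0_min: float = 50.0, f0_max: float = 400.0) -> int:
    """Return the lag (in samples) that maximizes the unnormalized autocorrelation
    within the assumed F0 range. Generic stand-in, not the paper's detector."""
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(signal) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag; fewer overlapping samples at larger lags
        # biases the search toward the first period rather than its multiples.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic 200 Hz tone sampled at 16 kHz: one pitch cycle spans 80 samples.
sr = 16000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1600)]
period = estimate_pitch_period(tone, sr)
print(period, sr / period)  # 80 200.0
```

Once cycle boundaries are known, each analysis window can be aligned to one (or a few) pitch cycles before the spectrum is taken, which is the general idea behind pitch-synchronous formant estimation.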
TuC.P2b‑3

Automatic Detection and Classification of Disfluent Reading Miscues in Young Children's Speech for the Purpose of Assessment
Matthew Black, Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
Joseph Tepperman, Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
Sungbok Lee, Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
Patti Price, PPrice Speech and Language Technology, Menlo Park, CA, USA
Shrikanth Narayanan, Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA

This paper explores the importance of disfluent reading miscues (sounding-out, hesitations, whispering, elongated onsets, question intonations) in automating the assessment of children’s oral word reading tasks. Analysis showed that a significant portion (21%) of the speech obtained from grade K-2 children from predominantly Spanish-speaking families contained at least one disfluent reading miscue. We found that human evaluators rated fluency nearly as important as accuracy when judging a child’s overall reading ability. We devised a lexical method for automatically detecting sounding-out, hesitation, and whispering disfluencies, which achieved a 14.9% missed-detection rate and an 8.9% false-alarm rate. We were also able to discriminate 69.4% of the sound-outs from other disfluencies with a 28.5% false-alarm rate, a promising and novel result.
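The detection results are reported as missed-detection and false-alarm rates. The abstract does not define them, so the sketch below assumes the usual definitions: the miss rate is the fraction of truly disfluent items the detector fails to flag, and the false-alarm rate is the fraction of fluent items it wrongly flags. The function name and the labels are hypothetical.

```python
from typing import Sequence, Tuple


def miss_and_false_alarm_rates(truth: Sequence[bool],
                               predicted: Sequence[bool]) -> Tuple[float, float]:
    """truth[i] / predicted[i] is True when item i is (judged) disfluent."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    positives = sum(truth)                 # truly disfluent items
    negatives = len(truth) - positives     # truly fluent items
    misses = sum(t and not p for t, p in zip(truth, predicted))
    false_alarms = sum(p and not t for t, p in zip(truth, predicted))
    miss_rate = misses / positives if positives else 0.0
    fa_rate = false_alarms / negatives if negatives else 0.0
    return miss_rate, fa_rate


# Hypothetical labels for 8 utterances (True = contains a disfluency).
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
print(miss_and_false_alarm_rates(truth, predicted))  # (0.25, 0.25)
```

Normalizing false alarms by the number of fluent items (rather than by all items) is one common convention; other papers use different denominators, so reported figures are only comparable when the definitions match.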
TuC.P2b‑4

Structural Assessment of Language Learners' Pronunciation
Nobuaki Minematsu, The University of Tokyo
Kei Kamata, The University of Tokyo
Satoshi Asakawa, The University of Tokyo
Takehiko Makino, Chuo University
Tazuko Nishimura, The University of Tokyo
Keikichi Hirose, The University of Tokyo

A speaker-invariant structural representation of speech was proposed in which only the phonic contrasts between speech sounds are extracted to form their external structure. By considering a mapping function between speaker A's acoustic space and speaker B's, speech dynamics were mathematically proven to be invariant between the two. This structural and dynamic representation was applied to describing pronunciation. Because non-linguistic factors are removed, the representation focuses purely on non-nativeness. For vowel learning, the vowels each learner should correct first were estimated automatically. Unlike the conventional approach, this estimation did not directly use sound substances such as spectra. In this paper, using learners' vowel charts plotted by a phonetician, we examine the validity of this contrastive, or relative, approach by comparing it with the conventional absolute approach. The results show that the proposed approach is highly valid.
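The abstract does not say how contrasts are measured. One way to make the "contrasts only" idea concrete, assumed here rather than taken from the abstract, is to model each vowel as a Gaussian in an F1/F2 plane and describe the vowel set by the matrix of pairwise Bhattacharyya distances. That matrix is unchanged by any invertible affine mapping x → Ax + b of the feature space, which stands in for a speaker difference if one assumes such differences can be modelled that way. All formant values, covariances and the mapping below are invented for illustration, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float]
Mat = Tuple[Tuple[float, float], Tuple[float, float]]


def det(m: Mat) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv(m: Mat) -> Mat:
    d = det(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def matvec(m: Mat, v: Vec) -> Vec:
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def matmul(a: Mat, b: Mat) -> Mat:
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def transpose(m: Mat) -> Mat:
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))


def bhattacharyya(mu1: Vec, cov1: Mat, mu2: Vec, cov2: Mat) -> float:
    """Bhattacharyya distance between two 2-D Gaussians."""
    cov = tuple(tuple((cov1[i][j] + cov2[i][j]) / 2 for j in range(2)) for i in range(2))
    diff = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    w = matvec(inv(cov), diff)
    mahal = diff[0] * w[0] + diff[1] * w[1]
    return mahal / 8 + 0.5 * math.log(det(cov) / math.sqrt(det(cov1) * det(cov2)))


def structure(vowels: Dict[str, Tuple[Vec, Mat]]) -> List[List[float]]:
    """Matrix of pairwise distances: only contrasts, no absolute positions."""
    names = sorted(vowels)
    return [[bhattacharyya(*vowels[a], *vowels[b]) for b in names] for a in names]


def transform(vowels: Dict[str, Tuple[Vec, Mat]], A: Mat, b: Vec) -> Dict[str, Tuple[Vec, Mat]]:
    """Map every Gaussian through x -> A x + b (a stand-in for a speaker change)."""
    return {name: (tuple(x + y for x, y in zip(matvec(A, mu), b)),
                   matmul(matmul(A, cov), transpose(A)))
            for name, (mu, cov) in vowels.items()}


# Invented F1/F2 means (Hz) and covariances for three vowels of speaker A.
speaker_a = {
    "a": ((750.0, 1300.0), ((2500.0, 300.0), (300.0, 10000.0))),
    "i": ((300.0, 2300.0), ((900.0, -200.0), (-200.0, 22500.0))),
    "u": ((320.0, 800.0), ((1000.0, 150.0), (150.0, 6400.0))),
}
speaker_b = transform(speaker_a, ((1.1, 0.05), (0.02, 0.9)), (30.0, -100.0))
same = all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9)
           for ra, rb in zip(structure(speaker_a), structure(speaker_b))
           for x, y in zip(ra, rb))
print(same)  # True
```

The absolute vowel positions of the two "speakers" differ, yet their distance matrices agree, which is the property a relative description relies on when it compares a learner's vowel system with a reference.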
TuC.P2b‑5

Enhancing usability of CAPL system for Qur’an recitation learning
Abd El-Rahman Abd El-Rahman, Katholieke Universiteit Leuven - Dept. ESAT, Belgium
Sherif Abdou, Research & Development International company (RDI®)
Ahmed Khalil, Cairo University, Egypt
Mohsen Rashwan, Research & Development International company (RDI®)

This paper describes several enhancements to HAFSS©, a speech-enabled Computer Aided Pronunciation Learning (CAPL) system developed to teach Holy Qur'an recitation rules and Arabic pronunciation to non-native speakers. User enrolment time is critical for any practical language learning system that exploits ASR technology. We introduce the modifications made to the baseline system to reduce enrolment time while keeping system accuracy at the same level. We also report experiments measuring the correlation between the judgments of the HAFSS system and those of human experts. Finally, we measured the usefulness of the HAFSS system for beginners by assessing their proficiency before and after using it.
TuC.P2b‑6

Automatic large-scale oral language proficiency assessment
Febe de Wet, Stellenbosch University Centre for Language and Speech Technology
Christa van der Walt, Department of Curriculum Studies, Stellenbosch University
Thomas Niesler, Department of Electrical and Electronic Engineering, Stellenbosch University

We describe the first results obtained during the development of an automatic system for assessing the spoken English proficiency of university students. The ultimate aim of this system is to allow fast, consistent and objective assessment of oral proficiency in order to place students in courses appropriate to their language skills. Rate of speech (ROS) was chosen as an indicator of fluency for a number of oral language exercises. In a test involving 106 students, the assessments of 5 human raters are compared with evaluations based on automatically derived ROS scores. Although the ROS is estimated accurately, the correlation between the human assessments and the ROS scores ranges from 0.5 to 0.6. However, the results also indicate that only two of the five human raters were consistent in their appraisals and that inter-rater agreement was only mild.
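The comparison rests on two quantities: an ROS score per student and its correlation with each rater's marks. The abstract does not define ROS precisely, so the sketch below assumes ROS is the number of phones per second of speech (silence excluded), computed from a hypothetical phone-level alignment, and uses the Pearson correlation coefficient. The segment format, the "sil" label and the helper names are all assumptions.

```python
import math
from typing import Sequence, Tuple

# (label, start_s, end_s), as a hypothetical forced aligner might produce
Segment = Tuple[str, float, float]


def rate_of_speech(segments: Sequence[Segment], silence: str = "sil") -> float:
    """Phones per second of speech, ignoring silence segments (assumed definition)."""
    phones = [(start, end) for label, start, end in segments if label != silence]
    speech_time = sum(end - start for start, end in phones)
    return len(phones) / speech_time if speech_time > 0 else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between, e.g., ROS scores and one rater's marks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented alignment: 4 phones in 0.7 s of speech, plus leading/trailing silence.
alignment = [("sil", 0.0, 0.5), ("h", 0.5, 0.6), ("e", 0.6, 0.8),
             ("l", 0.8, 0.9), ("ou", 0.9, 1.2), ("sil", 1.2, 1.5)]
print(rate_of_speech(alignment))  # about 5.71 phones per second
```

Excluding silences makes ROS reflect articulation speed, while including them would also fold in pausing behaviour; which variant tracks human fluency judgments better is an empirical question the sketch does not settle. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.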
