Interspeech 2007 Session TuC.P1a: Discourse, dialogue and conversation
Tuesday, August 28, 2007
13:30 – 15:30
Yoshinori Sagisaka (Waseda Univ.)
Voice Source and Vocal Tract Variations as Cues to Emotional States Perceived from Expressive Conversational Speech
Hiroki Mori, Faculty of Engineering, Utsunomiya University, Japan
Hideki Kasuya, International University of Health and Welfare, Japan
Speech parameters originating from the voice source and vocal tract were analyzed to find acoustic correlates of dimensional descriptions of emotional states. For this purpose, we adopted the Utsunomiya University Spoken Dialogue Database, which was designed for studies on paralinguistic information in expressive conversational speech. Analyses for four female and two male speakers showed that: (i) prosodic parameters were highly correlated, especially with the activation dimension; (ii) the aperiodicity-related voice source parameter indicated that breathy phonation was mainly used in unpleasant utterances by three of the female speakers; (iii) formant frequencies were higher in pleasant utterances for one female speaker, owing to a smiling facial expression.
Exploring Initiative Strategies using Computer Simulation
Fan Yang, OGI-OHSU
Peter A. Heeman, OGI-OHSU
We envision that next-generation spoken dialogue systems will be mixed-initiative. However, it is unclear how exactly a mixed-initiative strategy should be designed: under what circumstances the system should take the initiative, and under what circumstances it should let the user do so. The initiative strategies used in human-human conversation are a good starting point, because they are natural for the user to follow. Studying human-human conversation, however, only gives a descriptive account of human strategies. In this paper, we explore the use of computer simulation to better understand human conventions and give an explanatory account. We have two software agents solve a collaborative task using different initiative strategies: one derived from analysis of human-human dialogues, and two alternatives based on proposals in the literature. Our simulation results show that the strategy derived from human-human dialogues is more efficient than the two alternatives. This supports the explanation that people use an initiative strategy that minimizes collaborative effort.
From One Base Form to Multiple Output Styles - Predicting Stylistic Dynamics of Discourse Prosody
Chiu-yu Tseng, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Zhao-yu Su, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
We hypothesize that various prosody output styles can be predicted and simulated from one default base form by accounting for contributions from higher-level information to cross-phrase prosodic relationships. Speech materials of four prosody styles were selected: (1) Han and Tang poetry, (2) Tang ballads and Song poetry, (3) Qin, Tang and Song classic prose, and (4) contemporary TV weather forecasts. F0 contours were analyzed using the Fujisaki model, while quantitative analyses of predictions from the layered and cumulative contributions specified by the HPG (Hierarchical Prosodic phrase Grouping) framework [Tseng et al., 2004; 2005; 2006] were performed across styles and speakers. Results confirmed that higher-level contribution is significant across styles; contribution distribution patterns are style specific; more regular prosodic formats require more contribution from higher levels; stylistic dynamics are predictable; and the HPG base form is indeed the default.
Topic in dialogue: prosodic and syntactic features
Claudia Crocco, Vakgroep Romaanse Talen (andere dan het Frans) - Universiteit Gent (Belgium)
Renata Savy, Dipartimento di Studi Linguistici e Letterari - Università degli Studi di Salerno (Italy)
We investigate the relationship between phonetic phrasing, tonal pattern and phrase structure in left-peripheral sentence topics. Our corpus consists of three task-oriented Italian dialogues. The results of the prosodic analysis show that topics are usually associated with the highest pitch values in the Tone Unit, regardless of their actual syntactic position. The syntactic analysis shows that, while topic phrase structure is rather variable, topic function is quite stable: topics mostly have a circumstantial-locative function and, less frequently, a subject function. Finally, phonetic phrasing, prominence placement and phrase structure show clearly regular relationships.
Features of Pauses and Conjunctions at Syntactic and Discourse Boundaries in Japanese Monologues
Michiko Watanabe, Graduate School of Frontier Sciences, The University of Tokyo
Yasuharu Den, Faculty of Letters, Chiba University
Keikichi Hirose, Graduate School of Information Science and Technology, The University of Tokyo
Shusaku Miwa, Graduate School of Information Science and Technology, The University of Tokyo
Nobuaki Minematsu, Graduate School of Frontier Sciences, The University of Tokyo
Syntactic and discourse boundaries are signalled in speech by prosodic cues as well as linguistic cues. We investigated whether there is a correspondence between prosodic or linguistic cues and boundary strength. We measured the rates of filled pauses (FPs) and conjunctions, and the durations of silent pauses, filled pauses and conjunctions, at four types of boundaries in casual presentations in Japanese. The results showed that the rates of FPs and conjunctions and the durations of silent pauses correspond to boundary strength. However, no significant correspondence was found between the durations of FPs or conjunctions and boundary strength. The results suggest that how long the speaker pauses and whether he or she utters an FP or a conjunction are relevant to boundary strength. The durations of FPs and conjunctions, however, are likely to be affected by other factors, such as difficulties in planning the following parts of speech.