Towards a Model of the Mapping Between English and Spanish Prosody

Jonathan Avila, University of Texas at El Paso


Current speech-to-speech translation systems struggle to translate the nuances of prosody, which plays a pivotal role in conveying speaker intent and stance in dialog. This limitation restricts cross-lingual communication, especially in situations demanding deeper interpersonal understanding. To address this, this research examines the relationships between prosody and its pragmatic functions in English and Spanish. First, I discuss a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, then describe an English-Spanish corpus of 3816 matched utterance pairs. Second, I describe a prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. I then use the corpus and the metric to investigate cross-language prosodic differences, and build three simple models for mapping prosody from one language to the other, in order to identify phenomena that will require more powerful modeling. These findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody translation.
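
As a rough illustration of the dissimilarity metric mentioned above, the sketch below computes a Euclidean distance between two prosodic feature vectors. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `prosodic_dissimilarity`, the three-element vectors, and their values are hypothetical, and the features are assumed to have been normalized to comparable scales beforehand.

```python
import math
from typing import Sequence


def prosodic_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length prosodic feature vectors.

    Assumption: both vectors hold the same features in the same order and
    have already been normalized so that no single feature dominates.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical, made-up feature values for one matched utterance pair.
english_utterance = [0.5, -1.0, 2.0]
spanish_utterance = [0.5, 2.0, -2.0]

print(prosodic_dissimilarity(english_utterance, spanish_utterance))  # 5.0
```

A smaller value indicates that the two utterances are prosodically more alike under the chosen features. Euclidean distance weights every feature equally, so the result depends on which features are included and on how they are normalized.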

