Date of Award

2025-12-01

Degree Name

Master of Science

Department

Computer Science

Advisor(s)

Nigel G. Ward

Abstract

Pragmatic fidelity in speech-to-speech translation (S2ST) has been largely understudied, leaving communication tools inadequate to support non-superficial dialog. We aim to improve pragmatic faithfulness in English-Spanish translation by developing machine learning models that predict a corresponding pragmatic representation in the other language. To evaluate performance, we developed a pipeline that uses a recently developed pragmatic similarity metric to compare models. Further, we developed models that exploit HuBERT features, as these have been found suitable for various prosody- and pragmatics-related tasks. Our models outperformed human and state-of-the-art predictions, although the methodology is limited to pragmatic representations rather than generated audio. Through a qualitative analysis of the best and worst utterances produced by human re-enactors and by Seamless Expressive, a state-of-the-art model, we observed patterns in their strengths and weaknesses. Human re-enactors tended to do well on simple utterances involving pauses, lengthening, and emphasized words, and struggled with utterances that conveyed hesitance. Seamless Expressive performed well on utterances with one or no salient prosodic qualities, but had difficulty conveying a non-neutral tone of voice and handling utterances with extreme prosodic qualities. These observations can guide future S2ST research toward pragmatically focused translation tools that improve communication between people.

Language

en

Provenance

Received from ProQuest

File Size

76 p.

File Format

application/pdf

Rights Holder

Javier Vazquez
