Date of Award
2025-12-01
Degree Name
Master of Science
Department
Computer Science
Advisor(s)
Nigel G. Ward
Abstract
Pragmatic fidelity in speech-to-speech translation (S2ST) has been largely understudied, leaving communication tools inadequate to support non-superficial dialog. We aim to improve pragmatic faithfulness in English-Spanish translation by developing machine learning models that predict the corresponding pragmatic representation in the other language. To evaluate performance, we developed a pipeline that uses a recently developed pragmatic similarity metric to compare models. Further, we developed models that exploit HuBERT features, as these have been found suitable for various prosody- and pragmatics-related tasks. Our models outperformed human and state-of-the-art predictions, although the methodology is limited to pragmatic representations rather than generated audio. Through a qualitative analysis of the best and worst utterances from human re-enactments and from Seamless Expressive, a state-of-the-art model, we observed patterns in their strengths and weaknesses. Human re-enactors tended to do well on simple utterances involving pauses, lengthening, and emphasized words, and struggled with utterances that conveyed hesitance. Seamless Expressive performed well on utterances with one or no salient prosodic qualities, but had difficulty conveying a non-neutral tone of voice and handling utterances with extreme prosodic qualities. These observations can guide future S2ST research toward pragmatically focused translation tools that improve communication between people.
Language
en
Provenance
Received from ProQuest
Copyright Date
2025-12
File Size
76 p.
File Format
application/pdf
Rights Holder
Javier Vazquez
Recommended Citation
Vazquez, Javier, "HuBERT-Based Models and Evaluation Strategies for Pragmatically-Faithful Speech to Speech Translation" (2025). Open Access Theses & Dissertations. 4603.
https://scholarworks.utep.edu/open_etd/4603