Technical Report: UTEP-CS-23-27


To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for:

  • people using this corpus
  • people extending this corpus
  • people designing similar collections of bilingual dialog data.

Change Notes. This version supersedes UTEP-CS-22-108. It adds some new information and numerous clarifications, most of which arose from our experience diversifying our corpus and helping a vendor use this protocol.