On the Predictability of Appropriate Prosody of Dialog Markers Directly From the Local Context

Anindita Nath, University of Texas at El Paso


Today’s state-of-the-art spoken dialog systems lack context-appropriate prosody in their responses, which often makes them sound unnatural. Better modeling of this contextual dependency would enable natural prosodic responsiveness. Accordingly, this dissertation explores the extent to which the prosody of a dialog marker can be predicted directly from the prosody of its local context. Prediction performance was evaluated by the similarity between the predicted and observed prosodic features, measured as the reduction in root mean square error relative to a baseline. The prediction task was carried out for multiple combinations of context feature sets and machine learning algorithms. Simple machine learning models, without any knowledge of pragmatic intent or phonetic structure, could predict prosody to a certain extent for each of the twelve most common types of dialog markers in a corpus of unstructured American English dialogs. A simple feed-forward multilayer artificial neural network model performed best, with an overall average reduction in prediction error of 42%. The proposed prosody prediction approach also has value for a task-oriented dialog domain.
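
The evaluation measure described above, reduction in root mean square error relative to a baseline, can be illustrated with a minimal sketch. The abstract does not specify the baseline or the feature values, so everything here is an assumption made for illustration: the baseline is taken to be a constant prediction of the mean, the numbers are invented, and the function names `rmse` and `rmse_reduction` are hypothetical rather than taken from the dissertation.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction(model_pred, baseline_pred, observed):
    """Fractional reduction in RMSE relative to the baseline.

    0.0 means no improvement over the baseline; 1.0 means a perfect prediction.
    """
    base = rmse(baseline_pred, observed)
    if base == 0:
        raise ValueError("baseline RMSE is zero; reduction is undefined")
    return 1.0 - rmse(model_pred, observed) / base


# Invented values for a single prosodic feature (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
baseline = [2.5] * 4            # assumed baseline: always predict the mean
model = [1.5, 2.5, 3.5, 4.5]    # made-up model output
reduction = rmse_reduction(model, baseline, observed)
print(f"RMSE reduction: {reduction:.1%}")
```

Under this reading, the reported figure of 42% would correspond to a value of 0.42 from `rmse_reduction`, averaged over features and dialog marker types; how that averaging was actually carried out is not stated in the abstract.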

Subject Area

Computer science|Information technology|Artificial intelligence

Recommended Citation

Nath, Anindita, "On the Predictability of Appropriate Prosody of Dialog Markers Directly From the Local Context" (2023). ETD Collection for University of Texas, El Paso. AAI30490101.