Date of Award

2023-05-01

Degree Name

Doctor of Philosophy

Department

Computer Science

Advisor(s)

Nigel Ward

Abstract

Today's state-of-the-art spoken dialog systems lack context-appropriate prosody in their responses, which often makes them sound unnatural. Better modeling of this contextual dependency would enable natural prosodic responsiveness. Accordingly, this dissertation explores the extent to which the prosody of a dialog marker can be predicted directly from the prosody of its local context. Prediction performance was evaluated in terms of the similarity between the predicted and the observed prosodic features, measured as the reduction in root mean square error (RMSE) relative to a baseline. The prediction task was carried out for multiple combinations of context feature sets and machine learning algorithms. Simple machine learning models, without any knowledge of pragmatic intent or phonetic structure, could predict prosody to a certain extent for each of the twelve most common types of dialog markers in a corpus of unstructured American English dialogs. A simple feed-forward multilayer artificial neural network model performed best, with an overall average reduction in prediction error of 42%. The proposed prosody prediction approach is also of value in a task-oriented dialog domain.
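As a rough illustration of the evaluation metric named in the abstract, the sketch below computes the relative reduction in RMSE of a model's predictions compared with a baseline's predictions for a single prosodic feature. The abstract does not say what the baseline is; always predicting the mean of the observed values is an assumption made here, and the feature values are invented. The function names `rmse` and `rmse_reduction` are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between two non-empty, equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def rmse_reduction(model_pred: Sequence[float],
                   baseline_pred: Sequence[float],
                   observed: Sequence[float]) -> float:
    """Fraction by which the model's RMSE falls below the baseline's RMSE.

    1.0 means a perfect prediction, 0.0 means no better than the baseline.
    Assumes the baseline RMSE is non-zero.
    """
    base = rmse(baseline_pred, observed)
    return (base - rmse(model_pred, observed)) / base


# Hypothetical values of one prosodic feature for four dialog-marker tokens.
observed = [1.0, 2.0, 3.0, 4.0]
mean = sum(observed) / len(observed)
baseline = [mean] * len(observed)  # assumed baseline: always predict the mean
model = [1.5, 2.0, 3.0, 3.5]       # invented model predictions
print(f"RMSE reduction: {rmse_reduction(model, baseline, observed):.1%}")
# prints "RMSE reduction: 68.4%"
```

An overall figure such as the reported average could then be obtained by averaging per-feature reductions, though how the dissertation aggregates across features and marker types is not stated here.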

Language

en

Provenance

Received from ProQuest

File Size

p.

File Format

application/pdf

Rights Holder

Anindita Nath
