Date of Award


Degree Name

Master of Science


Computer Science


Nigel G. Ward


Previous studies show that immediate and long-range prosodic context provides beneficial information when applied to a language model. However, some features contribute more information to the prediction task than others, and this should be taken into account. If the information contribution of each feature can be determined, then a well-crafted feature set can be built to improve the performance of a language model. In this study, I measure the contribution of different prosodic features to a baseline trigram model. With these measurements, it should be possible to build a language model that uses the most informative features and ultimately performs better than a language model that includes prosodic information naively. Guided by the measurements, I build a feature set of 103 prosodic features from past and future context, computed for both speaker and interlocutor. Principal component analysis is applied to this feature set to build a model that achieves a 25.9% perplexity reduction relative to the trigram model. However, this model falls 1.2% short of the performance improvement achieved by a similar model built without proper feature selection.
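
The perplexity reduction reported in the abstract is a relative measure against the trigram baseline. The following is a minimal sketch of how such a figure is conventionally computed, assuming perplexity is the exponential of the average negative log-probability per word. The function names and the per-word probabilities are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def relative_reduction(baseline_ppl, model_ppl):
    """Percentage by which model_ppl is lower than baseline_ppl."""
    return 100.0 * (baseline_ppl - model_ppl) / baseline_ppl

# Hypothetical probabilities assigned to the same four words by two models.
baseline_probs = [0.01, 0.02, 0.005, 0.01]  # stand-in for a trigram model
model_probs = [0.02, 0.02, 0.01, 0.01]      # stand-in for a prosody-aware model

baseline_ppl = perplexity([math.log(p) for p in baseline_probs])
model_ppl = perplexity([math.log(p) for p in model_probs])
reduction = relative_reduction(baseline_ppl, model_ppl)

print(f"baseline perplexity: {baseline_ppl:.2f}")
print(f"model perplexity:    {model_ppl:.2f}")
print(f"relative reduction:  {reduction:.1f}%")
```

Under this definition, a positive value means the model assigns higher average probability to the observed words than the baseline does.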




Received from ProQuest

File Size

89 pages

File Format


Rights Holder

Alejandro Vega