Date of Award

2012-01-01

Degree Name

Master of Science

Department

Computer Science

Advisor(s)

Nigel G. Ward

Abstract

Previous studies show that immediate and long-range prosodic context provide beneficial information when applied to a language model. However, the fact that some features contribute more information to the prediction task than others should be considered. If the information contribution of each feature can be determined, then a well-crafted feature set can be built to improve the performance of a language model. In this study, I measure the contribution of different prosodic features to a baseline trigram model. Using this information, it should be possible to build a language model that uses the most informative resources and ultimately performs better than a language model that includes prosodic information naively. Accordingly, I build a feature set of 103 prosodic features drawn from past and future context, computed for both the speaker and the interlocutor. Applying principal component analysis to this feature set yields a model that achieves a 25.9% perplexity reduction relative to a trigram model. However, this model falls 1.2% short of the improvement achieved by a similar model built without deliberate feature selection.
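The perplexity figures above are measured against a baseline trigram model. As a minimal sketch of how that baseline quantity is computed, the toy example below evaluates add-one smoothed trigram perplexity on a hypothetical corpus; the corpus, smoothing choice, and vocabulary here are illustrative assumptions, not the thesis's actual data or setup.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for real training/test data.
train = "the cat sat on the mat the cat ran".split()
test = "the cat sat".split()
vocab = set(train)

# Trigram and bigram counts from the training data.
trigrams = Counter(zip(train, train[1:], train[2:]))
bigrams = Counter(zip(train, train[1:]))

def prob(w1, w2, w3):
    # Add-one (Laplace) smoothed trigram probability: one assumed
    # smoothing scheme; the thesis may use a different one.
    return (trigrams[(w1, w2, w3)] + 1) / (bigrams[(w1, w2)] + len(vocab))

# Perplexity is 2 to the negative average log2 probability per trigram.
log_prob = sum(math.log2(prob(*t)) for t in zip(test, test[1:], test[2:]))
perplexity = 2 ** (-log_prob / max(len(test) - 2, 1))
print(perplexity)  # → 4.0 on this toy data
```

A lower perplexity means the model assigns higher probability to the held-out text, which is why a 25.9% relative reduction indicates the prosody-augmented model predicts upcoming words better than the plain trigram baseline.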

Language

en

Provenance

Received from ProQuest

File Size

89 pages

File Format

application/pdf

Rights Holder

Alejandro Vega
