Date of Award

2011-01-01

Degree Name

Master of Science

Department

Statistics

Advisor(s)

Ming-Ying Leung

Abstract

Hidden Markov models (HMMs) are a class of Markov models in which, unlike in ordinary Markov chains, the observer does not know which state the model was in when each symbol was emitted. As in Markov chains, HMMs assume that the next state of a sequence depends only on its current state. An HMM is parameterized by transition and emission probabilities: a transition probability is the probability of moving from one state to another, and an emission probability is the probability of observing a symbol given that it was emitted from a specific state.
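As a minimal sketch of how these two parameter sets combine, the Python snippet below computes the joint log-probability of a symbol sequence and a state path. The two-state layout ("I" for island, "N" for non-island) and every probability in it are hypothetical values chosen for illustration; they are not taken from the thesis.

```python
import math

# Hypothetical two-state model: "I" (CpG island) and "N" (non-island).
# All probabilities are illustrative assumptions, not estimates from the thesis.
start = {"I": 0.5, "N": 0.5}
trans = {
    "I": {"I": 0.90, "N": 0.10},
    "N": {"I": 0.05, "N": 0.95},
}
emit = {
    "I": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "N": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}

def joint_log_prob(symbols, states):
    """Return log P(symbols, states) for equal-length, non-empty inputs."""
    # Initial state probability times the first emission.
    logp = math.log(start[states[0]]) + math.log(emit[states[0]][symbols[0]])
    for t in range(1, len(symbols)):
        # One transition and one emission per subsequent position.
        logp += math.log(trans[states[t - 1]][states[t]])
        logp += math.log(emit[states[t]][symbols[t]])
    return logp

print(joint_log_prob("CG", "II"))
```

In an HMM the state path passed as `states` is not observed, which is why algorithms that infer it or sum over it are needed.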

DNA sequences are made up of the nucleotides adenine (A), cytosine (C), guanine (G), and thymine (T). CpG islands are regions of a DNA sequence in which the CG dinucleotide occurs more frequently than elsewhere in the sequence.
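To make "higher occurrence of the CG dinucleotide" concrete, the sketch below measures the fraction of adjacent nucleotide pairs in a sequence that are CG. The function name `cg_fraction` and the example strings are assumptions made for illustration, and this simple statistic is not presented as the criterion used in the thesis.

```python
def cg_fraction(seq):
    """Fraction of adjacent nucleotide pairs in seq that are exactly 'CG'."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0  # no dinucleotides to count
    pairs = len(seq) - 1
    hits = sum(1 for i in range(pairs) if seq[i:i + 2] == "CG")
    return hits / pairs

print(cg_fraction("ACGCGT"))  # pairs: AC, CG, GC, CG, GT
print(cg_fraction("AATT"))
```

Comparing this fraction across windows of a longer sequence gives a rough picture of where CG-rich regions lie.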

The HMM algorithms used to analyze the DNA sequences were the Viterbi, Baum-Welch, and Viterbi training algorithms. The Viterbi algorithm determines the state sequence that is most likely to have produced an observed sequence under a given model. The Baum-Welch and Viterbi training algorithms estimate the parameters of an HMM.
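The Viterbi step described above can be sketched as a short dynamic program in log space. This is a generic textbook-style implementation under stated assumptions, not the thesis's code: the two states ("I" for island, "N" for non-island) and all probabilities are hypothetical, and the input sequence is assumed to be non-empty.

```python
import math

# Hypothetical two-state model; all values are illustrative assumptions.
STATES = ("I", "N")
START = {"I": 0.5, "N": 0.5}
TRANS = {"I": {"I": 0.90, "N": 0.10}, "N": {"I": 0.05, "N": 0.95}}
EMIT = {
    "I": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "N": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}

def viterbi(symbols, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most likely state path for a non-empty symbol sequence."""
    # scores[s] = best log-probability of any path ending in state s so far.
    scores = {s: math.log(start[s]) + math.log(emit[s][symbols[0]]) for s in states}
    backpointers = []
    for x in symbols[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            # Best predecessor state for arriving in s at this position.
            best = max(states, key=lambda r: scores[r] + math.log(trans[r][s]))
            new_scores[s] = scores[best] + math.log(trans[best][s]) + math.log(emit[s][x])
            ptr[s] = best
        scores = new_scores
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path)

print(viterbi("CGCGCG"))
print(viterbi("ATATAT"))
```

Working with log-probabilities avoids numerical underflow on long sequences, where products of many small probabilities would otherwise round to zero.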

Specifically, we assessed the accuracy of the Viterbi algorithm at predicting the locations of CpG islands within DNA sequences, and we determined how well the parameter-estimation algorithms recover the model parameters.

Language

en

Provenance

Received from ProQuest

File Size

166 pages

File Format

application/pdf

Rights Holder

Roberto Angel Ortega
