Identifying transcription factor binding sites using mixed Markov model

Bereket Weldeslassie, University of Texas at El Paso

Abstract

Identifying transcription factor binding sites (TFBSs) using experimental techniques is time-consuming, labor-intensive, and expensive. The purpose of this thesis is therefore to use Markov models to identify TFBSs. After introducing the basic theory of Markov chains and variable length Markov chains, the leading Markov models used to identify potential TFBSs are briefly presented. The models are: (1) the Position Optimized Markov Model (POMM), which uses a chi-square test to bring any non-adjacent dependent positions of the binding sequences adjacent or within close proximity and then trains a third-order Markov chain to capture the dependencies; (2) the Permuted Variable Length Markov Model (PVLMM), which orders the positions as POMM does and then fits a variable length Markov chain to the permuted positions; and (3) Optimized Mixed Markov models (OMiMa), which fit a mixture of fixed-order Markov models to the position-optimized TFBSs. In this study, the fixed-order Markov chain used in OMiMa to model the dependencies is replaced by a variable length Markov chain. An Optimized Mixture of Zero-Order and Variable Length Markov Models is created for binding sites bound by a specific transcription factor, SOX9. The results show that the model is successful, with a success rate of 87.67%.
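
The chi-square step mentioned for POMM can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `chi_square_statistic` and `most_dependent_pair` are hypothetical, the toy sites are invented for illustration, and the sketch only computes the Pearson chi-square statistic for each pair of positions in a set of aligned sites, without the significance testing or position reordering a full model would need.

```python
from collections import Counter


def chi_square_statistic(sites, i, j):
    """Pearson chi-square statistic for independence of positions i and j.

    `sites` is a list of equal-length aligned sequences (hypothetical input).
    """
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    left_counts = Counter(s[i] for s in sites)
    right_counts = Counter(s[j] for s in sites)
    stat = 0.0
    for a, n_a in left_counts.items():
        for b, n_b in right_counts.items():
            # Expected count if the two positions were independent.
            expected = n_a * n_b / n
            observed = pair_counts.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def most_dependent_pair(sites):
    """Return the pair of positions (i, j), i < j, with the largest statistic."""
    width = len(sites[0])
    pairs = [(i, j) for i in range(width) for j in range(i + 1, width)]
    return max(pairs, key=lambda p: chi_square_statistic(sites, *p))


# Invented toy alignment: positions 0 and 3 always co-vary (A..T or C..G),
# while every other pair of positions is independent.
toy_sites = ["AACT", "AGCT", "ACGT", "ATGT", "CAGG", "CGGG", "CCCG", "CTCG"]

print(most_dependent_pair(toy_sites))
```

In this toy alignment the strongest dependence is between two non-adjacent positions, which is the situation where reordering positions before fitting a Markov chain becomes useful.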

Subject Area

Statistics

Recommended Citation

Weldeslassie, Bereket, "Identifying transcription factor binding sites using mixed Markov model" (2007). ETD Collection for University of Texas, El Paso. AAI1444099.
https://scholarworks.utep.edu/dissertations/AAI1444099
