Prediction of ribonucleic acid secondary structures using a heuristic backtracking search

Christopher Roman Cuellar, University of Texas at El Paso


Ribonucleic acid (RNA) is essential for all forms of life. RNA is made up of a long chain of nucleotide bases: Guanine (G), Uracil (U), Cytosine (C), and Adenine (A). An RNA strand can fold back on itself so that G-C, A-U, and G-U bases form hydrogen bonds; the resulting pattern of base pairs is known as its secondary structure. Knowing the secondary structure of an RNA chain is important because it helps researchers better understand the chain's specific functions. RNA tends to fold into secondary structures that minimize free energy. RNA secondary structure prediction is the task of predicting the physical folding of an RNA molecule from its linear sequence. A common approach to RNA secondary structure prediction is dynamic programming, which is based on the assumption that a given problem can be solved optimally by recursively solving its subproblems optimally. Dynamic programming approaches to secondary structure prediction have running times of O(n^3), where n is the length of the RNA sequence. There are two main problems with the dynamic programming approach. First, for very long chains, computing a prediction can take a substantial amount of time. Second, some foldings contain secondary structures that violate the assumption of optimal substructure. In this thesis, I propose an approach to RNA secondary structure prediction that attempts to overcome these limitations of dynamic programming. The approach is based on depth-first search in combination with a set of heuristics. I use a preprocessing stage, first proposed by Weise for his genetic algorithms, to find palindromic sequences, which are helical regions of RNA pairings. I then use depth-first search to find a subset of these structures that are mutually compatible and minimize the free energy. The search is further sped up by a set of heuristics that take into consideration palindrome length and likely compatibility with other potential structures.
Two advantages of this depth-first search approach are that it does not rely on optimal substructure and that it is easily parallelizable. Experiments show that the proposed methodology is promising, both because of these advantages and because its results are competitive with those of MFOLD, a well-established secondary structure prediction algorithm.
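
The two stages described above (enumerating candidate helices, then a backtracking search for a compatible low-energy subset) can be illustrated with a minimal Python sketch. This is not the thesis implementation. Several parts are assumptions made for illustration only: the per-pair weights in `PAIR_ENERGY` are a toy stand-in for a real free energy model, the compatibility rule forbids crossing helices (pseudoknots), the `min_len` and `min_loop` defaults are arbitrary, and the pruning is a plain branch-and-bound on the best energy still reachable rather than the heuristics on palindrome length and likely compatibility mentioned in the abstract. All function names are hypothetical.

```python
from typing import List, Tuple

# Toy per-pair energies (assumption: NOT a real thermodynamic model).
PAIR_ENERGY = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}

# A stem (i, j, k) pairs base i+t with base j-t for t in range(k).
Stem = Tuple[int, int, int]


def find_stems(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Stem]:
    """Preprocessing: enumerate maximal runs of stacked complementary pairs."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] + seq[j] not in PAIR_ENERGY:
                continue
            # Keep only maximal stems: skip if the pair just outside also matches.
            if i > 0 and j + 1 < n and seq[i - 1] + seq[j + 1] in PAIR_ENERGY:
                continue
            k = 0
            # Extend inward while leaving at least min_loop unpaired bases.
            while i + k < j - k - min_loop and seq[i + k] + seq[j - k] in PAIR_ENERGY:
                k += 1
            if k >= min_len:
                stems.append((i, j, k))
    return stems


def stem_energy(seq: str, stem: Stem) -> float:
    i, j, k = stem
    return sum(PAIR_ENERGY[seq[i + t] + seq[j - t]] for t in range(k))


def compatible(a: Stem, b: Stem) -> bool:
    """True if the stems share no base and do not cross (no pseudoknots)."""
    if a[0] > b[0]:
        a, b = b, a
    ai, aj, ak = a
    bi, bj, _ = b
    nested = bi > ai + ak - 1 and bj < aj - ak + 1  # b lies inside a's loop
    disjoint = bi > aj                              # b lies entirely after a
    return nested or disjoint


def fold(seq: str) -> Tuple[float, List[Stem]]:
    """Depth-first search with backtracking over include/exclude decisions."""
    stems = find_stems(seq)
    energies = [stem_energy(seq, s) for s in stems]
    # Try the most stabilizing stems first so good solutions are found early.
    order = sorted(range(len(stems)), key=lambda s: energies[s])
    # suffix[k]: the most energy that stems order[k:] could still contribute.
    suffix = [0.0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + energies[order[k]]

    best_energy, best = 0.0, []

    def dfs(k: int, chosen: List[int], energy: float) -> None:
        nonlocal best_energy, best
        if energy < best_energy:
            best_energy, best = energy, list(chosen)
        # Prune: even taking every remaining stem cannot beat the incumbent.
        if k == len(order) or energy + suffix[k] >= best_energy:
            return
        s = order[k]
        if all(compatible(stems[s], stems[c]) for c in chosen):
            chosen.append(s)
            dfs(k + 1, chosen, energy + energies[s])
            chosen.pop()  # backtrack
        dfs(k + 1, chosen, energy)

    dfs(0, [], 0.0)
    return best_energy, [stems[s] for s in best]
```

For example, `fold("GGGAAAACCC")` finds the single hairpin stem `(0, 9, 3)`. Because each include/exclude branch is explored independently, the top levels of the search tree could in principle be handed to separate workers, which is one way to read the abstract's remark about parallelizability.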

Subject Area

Bioinformatics|Computer science

Recommended Citation

Cuellar, Christopher Roman, "Prediction of ribonucleic acid secondary structures using a heuristic backtracking search" (2011). ETD Collection for University of Texas, El Paso. AAI1503709.