Improved efficiency of RNA secondary structure prediction using distributed computing
The rapidly growing volume of available biomolecular sequence data, ranging from small gene fragments to large complete genomes, has created a great need for powerful computational resources for data analysis and storage. With the decoding of the human and other genomes, RNA secondary structure prediction has become an important area of interest in biology and medicine because it helps in understanding the mechanisms of many biological processes, such as gene regulation and viral replication, and in designing RNA-based therapies to treat various diseases. Because of the complexity of their algorithms, many existing and upcoming computational tools for predicting RNA secondary structures require large amounts of memory and processing time, and can therefore handle only RNA sequences of limited length. For example, the pknotsRG program, which can predict RNA secondary structures with pseudoknots, can handle no more than 800 nucleotides at one time. However, many RNAs, such as RNA viral genomes, contain thousands of nucleotides, making secondary structure prediction impractical if not impossible. I will present an alternative approach in which a cutting method is first applied to generate chunks of the RNA sequence, the pknotsRG program is then used for prediction, and finally a high-throughput distributed batch computing system called HTCondor is used to reduce the waiting time for the RNA secondary structure prediction.
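The cutting step can be illustrated with a minimal Python sketch. The abstract does not specify how the cutting method chooses its cut points, so this example assumes the simplest possible rule: fixed-length windows no longer than the 800-nucleotide limit cited for pknotsRG, with an optional overlap between neighbouring chunks. The function name `cut_sequence` and its parameters are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Length limit the abstract cites for pknotsRG.
MAX_LEN = 800


def cut_sequence(seq: str, max_len: int = MAX_LEN,
                 overlap: int = 0) -> List[Tuple[int, str]]:
    """Split seq into chunks of at most max_len nucleotides.

    Returns (start_offset, chunk) pairs so that per-chunk predictions
    can later be mapped back to positions in the full sequence.
    Assumption: fixed-length windows; the thesis may cut differently.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(seq):
        chunks.append((start, seq[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(seq):
            break
        start += step
    return chunks


if __name__ == "__main__":
    genome = "ACGU" * 500  # toy 2000-nucleotide sequence
    for offset, chunk in cut_sequence(genome):
        print(offset, len(chunk))
```

Each chunk produced this way is independent of the others, which is what would allow the per-chunk predictions to be queued as separate jobs on a batch system such as HTCondor.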
Cardenas James, Gerardo Alberto, "Improved efficiency of RNA secondary structure prediction using distributed computing" (2013). ETD Collection for University of Texas, El Paso. AAI1539924.