Date of Award
2025-05-01
Degree Name
Doctor of Philosophy
Department
Mathematical Sciences
Advisor(s)
Jonathon Mohl
Abstract
Machine learning applied across multiple domains has changed how complex biological phenomena are understood and predicted. This dissertation presents computational methods for two problems: classifying mallards from single-nucleotide polymorphisms (SNPs), and predicting protein function with interpretable, topic-aware peptide embeddings. The first study distinguishes mallard populations using SNP data, which are inherently ultra-high-dimensional. It combines feature-selection and dimensionality-reduction strategies with machine learning classification algorithms to identify minimal yet highly predictive SNP sets for accurate breed differentiation. The resulting framework performs robustly and is computationally efficient, supporting conservation and breed management efforts. The second study applies natural language processing techniques to biological sequences: sequences are fragmented by enzyme-based cleavage (e.g., trypsin digestion), and the resulting peptides are embedded with Word2Vec models. Topic modeling (BERTopic) of these peptide embeddings supports functional classification (Gene Ontology term prediction), achieving ROC-AUC scores comparable to those of full-sequence models (98.9% vs. 99%). Notably, topic-derived peptides frequently align with known functional motifs, including ligand-binding sites, underscoring their biological significance and interpretability. Together, these studies illustrate the power of machine learning for handling diverse biological datasets, yielding accurate predictive models and interpretable insights for practical biological discovery and decision-making.
Language
en
Provenance
Received from ProQuest
Copyright Date
2025-05
File Size
87 p.
File Format
application/pdf
Rights Holder
Tolulope Samuel Adeyina
Recommended Citation
Adeyina, Tolulope Samuel, "Multi-Domain Machine Learning For Biological Classification: Mallard Classification And Protein Function Prediction" (2025). Open Access Theses & Dissertations. 4320.
https://scholarworks.utep.edu/open_etd/4320