Open Access Theses & Dissertations

Exploiting The In-Distribution Embedding Space With Deep Learning And Gaussian Discriminant Analysis For An Out-Of-Distribution Malware Attach Detection

Tosin Olusola Ige, University of Texas at El Paso

Date of Award

2025-12-01

Degree Name

Doctor of Philosophy

Department

Computer Science

Advisor(s)

Christopher Kiekintveld

Second Advisor

Aritran Piplai

Abstract

State-of-the-art machine learning and deep learning models generally perform well on previously seen data, but they rest on the incorrect closed-world assumption that all real-world data come from the same distribution as the training and validation samples, hence their poor performance when exposed to data that deviate from the training and validation sets. This is clearly evident in cybersecurity, where the world continues to experience high-profile malware attacks despite advances in state-of-the-art research. The constant evolution of the tools and methods used to carry out attacks has given hackers and other cybercriminals significant leverage, enabling them to carry out more intelligent and robust attacks, because (i) the evolution of such tools makes it easier to create new malware variants; (ii) the rate at which new malware variants are developed significantly outpaces state-of-the-art research, with an average of over 1,500 brand-new malware variants created daily according to SonicWall statistics; and (iii) cybercriminals are aware of the vulnerability of current state-of-the-art machine learning and deep learning models to new malware variants.

To address the vulnerability of state-of-the-art machine learning and deep learning approaches to the out-of-distribution problem, several approaches, such as adversarial training, input transformation, self-adaptive training, adversarial purification, and zero-shot, one-shot, and few-shot learning, have been proposed and applied to an array of benchmark datasets in various research domains, but none of them had been applied to an actual out-of-distribution malware attack problem. During our initial investigation, we implemented these approaches on four benchmark malware datasets in an out-of-distribution setting, and all of them performed poorly. This leads to our assertion that the poor performance of current state-of-the-art approaches on out-of-distribution malware classification is connected to the variation among malware variants within the same family, which is not seen in datasets from other domains. Current state-of-the-art out-of-distribution approaches do not address this intra-family variation in dynamic and static behavior, as evidenced by the dismal performance of such models when exposed to out-of-distribution malware.

We propose a two-stage framework that addresses this limitation by incorporating Gaussian discriminant embeddings into deep neural networks to model spherical decision boundaries around malware families in the embedding space. The first stage employs unsupervised cluster analysis to determine whether a test sample is in-distribution or out-of-distribution, using z-score-based statistical analysis for reliable outlier detection. The second stage introduces a deep learning model trained on refined embeddings from the first stage, using predictions from both the cluster analysis and a primary classifier to improve final prediction accuracy. Evaluation on a dataset comprising 25 malware families and novel OOD samples demonstrates superior performance against softmax confidence and Mahalanobis distance baselines, achieving an AUC of 0.911 for OOD detection. This approach significantly improves the distinguishability of OOD samples and offers a scalable and statistically grounded method for robust malware classification and anomaly detection in cybersecurity contexts.

We address the problem of intra-family variation within the same malware family by: 1) exploiting the in-dimensional embedding space between variants from the same malware family to account for all variations; 2) exploiting the inter-dimensional space between different malware families; 3) building, from scratch, a deep learning-based model with a shallow neural network containing a maximum of two connected layers to overcome overfitting; and 4) building a Bayesian-inference-based algorithm that is intertwined with the connected network and can create new data points and adjust existing ones in response to exposure to out-of-distribution variants of an existing family or of a new malware family, which determines the extent to which the weights should be adjusted, thereby triggering the gradient.
Finally, we evaluate our approach using various statistical measures and compare it with various baselines.
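
As a rough illustration of the first stage described above, the following minimal sketch flags a test embedding as out-of-distribution when its distance to the nearest family centroid is unusually large relative to that family's training distances, measured as a z-score. It is a simplified stand-in and not the dissertation's implementation: the function names (fit_family_stats, ood_z_score, is_ood), the threshold of 3.0, the use of plain Euclidean distance, and the synthetic 8-dimensional embeddings are all assumptions made for this example.

```python
import numpy as np

def fit_family_stats(embeddings, labels):
    """For each family, store its centroid and the mean/std of the
    training distances to that centroid (a spherical model)."""
    stats = {}
    for family in np.unique(labels):
        points = embeddings[labels == family]
        centroid = points.mean(axis=0)
        dists = np.linalg.norm(points - centroid, axis=1)
        # Small constant guards against a zero standard deviation.
        stats[family] = (centroid, dists.mean(), dists.std() + 1e-12)
    return stats

def ood_z_score(sample, stats):
    """Return (family, z) for the family whose distance z-score is lowest."""
    best_family, best_z = None, np.inf
    for family, (centroid, mu, sigma) in stats.items():
        z = (np.linalg.norm(sample - centroid) - mu) / sigma
        if z < best_z:
            best_family, best_z = family, z
    return best_family, best_z

def is_ood(sample, stats, z_threshold=3.0):
    """Flag the sample as OOD if even its best-fitting family is too far away."""
    _, z = ood_z_score(sample, stats)
    return bool(z > z_threshold)

# Synthetic stand-in for learned embeddings of two malware families.
rng = np.random.default_rng(0)
family_a = rng.normal(loc=0.0, scale=1.0, size=(200, 8))
family_b = rng.normal(loc=10.0, scale=1.0, size=(200, 8))
embeddings = np.vstack([family_a, family_b])
labels = np.array([0] * 200 + [1] * 200)

stats = fit_family_stats(embeddings, labels)
near_sample = np.zeros(8)        # sits at the center of family 0
far_sample = np.full(8, 50.0)    # far from both families

print(is_ood(near_sample, stats), is_ood(far_sample, stats))
```

A one-sided test is used here, so only samples farther from a centroid than is typical for that family are flagged. A fuller treatment would replace the Euclidean distance with a per-family Gaussian likelihood and feed the resulting scores to the second-stage classifier.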

Language

Provenance

Received from ProQuest

Copyright Date

2025-12

File Size

154 p.

File Format

application/pdf

Rights Holder

Tosin Olusola Ige

Recommended Citation

Ige, Tosin Olusola, "Exploiting The In-Distribution Embedding Space With Deep Learning And Gaussian Discriminant Analysis For An Out-Of-Distribution Malware Attach Detection" (2025). Open Access Theses & Dissertations. 4561.
https://scholarworks.utep.edu/open_etd/4561

Included in

Artificial Intelligence and Robotics Commons
