Open Access Theses & Dissertations

From Text to Utility: Distance-Aware Contrastive Learning for Detection-Ready and Shareable Malware Descriptions

Ivan Alejandro Montoya Sanchez, University of Texas at El Paso

Date of Award

2025-05-01

Degree Name

Master of Science

Department

Computer Science

Advisor(s)

Aritran Piplai

Abstract

The rapid rise of sophisticated malware variants poses significant challenges for cybersecurity analysts, particularly because data on newly emerging threats is scarce. Privacy, legal, and operational constraints often prevent malware samples from being shared; instead, organizations publish cyber threat intelligence (CTI) in natural language. However, these reports are typically unstructured and inconsistent, which limits their utility in machine learning (ML) models. This thesis explores whether high-fidelity, shareable threat intelligence can be automatically generated from structured malware behaviors to supplement ML models when direct access to malware samples is limited. Two central questions are addressed: (i) How can descriptions be made both representative and shareable, avoiding personally identifiable information (PII) or sensitive traits? (ii) Can these descriptions support downstream tasks such as few-shot malware classification in low-data conditions? To address these questions, a distance-aware contrastive loss improves alignment between behavioral data and text, while a privacy-aware penalty reduces sensitive content. The generated descriptions are used in a Model-Agnostic Meta-Learning (MAML) pipeline, with distilled knowledge improving downstream performance. Evaluations on CIC-AndMal-2020 and BODMAS show up to 42% improvement over pre-trained LLMs in few-shot classification, and a 10-20% gain through multimodal fusion. Gains are also reflected in semantic metrics such as RAGAS Answer Correctness and Similarity. By enabling the automated generation of task-relevant CTI, this work facilitates secure sharing of anonymized behavioral profiles, thereby advancing collaborative threat detection, improving integration into real-world security systems, and empowering organizations to crowdsource effective defense strategies against emerging threats.

Language

Provenance

Received from ProQuest

Copyright Date

2025-05

File Size

63 p.

File Format

application/pdf

Rights Holder

Ivan Alejandro Montoya Sanchez

Recommended Citation

Montoya Sanchez, Ivan Alejandro, "From Text to Utility: Distance-Aware Contrastive Learning for Detection-Ready and Shareable Malware Descriptions" (2025). Open Access Theses & Dissertations. 4418.
https://scholarworks.utep.edu/open_etd/4418

Included in

Computer Sciences Commons