Modeling and Predicting Emerging Threats Using Disparate Data

Ismael Villanueva-Miranda, University of Texas at El Paso


Early detection is crucial to mitigating the impact of emerging threats. This work proposes four frameworks that build machine learning and deterministic epidemiological models from multiple domain-specific datasets to detect the onset of emerging threats in two domains: infectious diseases and cybersecurity. Our models are designed to detect infectious disease outbreaks, model their spread, detect malware activity, and analyze the relationship between software/hardware weaknesses and attack techniques. First, we present a novel framework to detect multiple infectious disease outbreaks by integrating standardized disease-specific domain knowledge and public search trend data. Using people's search data, our framework showed high performance in identifying outbreaks of infectious diseases that are among the leading causes of illness and death in the United States. In addition to detecting outbreaks, studying their spread within a region is equally important. Therefore, we present the SEIRD+m model, which integrates human mobility data into the classical deterministic SEIR epidemiological model to model epidemics more accurately. We demonstrated its efficacy using COVID-19 as a case study, showing that restricting mobility only in COVID-19 hotspots can effectively reduce predicted infections and deaths among at-risk populations, including those defined by race, income, and age. Both infectious diseases and computer malware require timely and accurate detection to minimize their impact. Therefore, we extended our disease outbreak detection framework to detect malware activity over a geographic region. We use natural language processing (NLP) approaches to connect disparate cybersecurity datasets, enabling the development of a machine learning model that detects malware activity based on people's search trends in a specific location.
Our model proved effective in identifying malware activity in four real-world attack case studies. Beyond detecting malware activity, preventing cyberattacks and mitigating their impact requires investigating the properties of software vulnerabilities and how those properties are used to compromise systems. Thus, we propose a framework that leverages NLP techniques to find connections between attack techniques and software vulnerabilities. The effectiveness of our framework is demonstrated through three case studies, which highlight its potential to identify the possible exploitation of multiple software weaknesses. The approaches presented in this work provide evidence that integrating domain-specific datasets with user-generated dynamic data can enable the development of highly effective computational models for detecting emerging threats. By leveraging these models, decision-makers can rapidly identify and respond to potential threats, leading to a more efficient allocation of resources. Our work opens up opportunities for further research in this area.
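
To make the idea behind a mobility-aware compartmental model concrete, the following is a minimal sketch of a deterministic SEIRD simulation in which a mobility index scales the transmission rate. It is not the SEIRD+m formulation from the dissertation, whose equations are not given in this abstract. The function name, the multiplicative use of the mobility index, the forward-Euler integration, and all parameter values are assumptions chosen only for illustration.

```python
def simulate_seird(population, initial_infected, beta, sigma, gamma, mu, mobility):
    """Forward-Euler SEIRD simulation with a daily time step.

    ASSUMPTION (not from the source): mobility[t] is a value in [0, 1]
    that multiplies the transmission rate beta on day t.
    beta: transmission rate, sigma: incubation rate (E -> I),
    gamma: recovery rate (I -> R), mu: death rate (I -> D).
    """
    s = float(population - initial_infected)
    e, i, r, d = 0.0, float(initial_infected), 0.0, 0.0
    history = [(s, e, i, r, d)]
    for m in mobility:
        living = s + e + i + r            # deceased individuals do not mix
        new_exposed = beta * m * s * i / living
        new_infectious = sigma * e
        new_recovered = gamma * i
        new_deaths = mu * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_deaths
        r += new_recovered
        d += new_deaths
        history.append((s, e, i, r, d))
    return history


# Illustrative parameters only; none of these values are fitted to real data.
DAYS = 120
PARAMS = dict(population=1_000_000, initial_infected=10,
              beta=0.5, sigma=0.2, gamma=0.1, mu=0.01)

baseline = simulate_seird(mobility=[1.0] * DAYS, **PARAMS)
restricted = simulate_seird(mobility=[0.5] * DAYS, **PARAMS)

print("deaths, full mobility:  ", round(baseline[-1][4]))
print("deaths, halved mobility:", round(restricted[-1][4]))
```

Under these assumed parameters, halving the mobility index lowers the simulated death count over the same horizon, which is the qualitative effect the abstract describes for mobility restrictions. A region-level version would need one set of compartments per region and real mobility flows between them, which this sketch does not attempt.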

Subject Area

Computer science|Computer Engineering

Recommended Citation

Villanueva-Miranda, Ismael, "Modeling and Predicting Emerging Threats Using Disparate Data" (2023). ETD Collection for University of Texas, El Paso. AAI30521685.