Date of Award
2023-05-01
Degree Name
Doctor of Philosophy
Department
Computer Science
Advisor(s)
Christopher Kiekintveld
Abstract
Most vulnerabilities discovered in cybersecurity can be traced to a single piece of software. I investigate complex vulnerabilities, which may require multiple pieces of software to be present. These complex vulnerabilities represent 16.6% of all documented vulnerabilities and are more dangerous on average than their simple counterparts. Because they often appear only when specific combinations of software are present, they are also harder to identify.
I consider the motivating scenario where an attacker repeatedly deploys exploits that use complex vulnerabilities into an airport Wi-Fi network. The network administrator attempts to detect these exploits through the use of cyber decoys. The network administrator chooses a fake device, which holds no valuable data, to insert into the network in the hope that the attacker mistakes the decoy for a legitimate device worth exploiting. For each interaction, however, the network administrator must actively select which decoy device to insert into the network. I investigate this motivating scenario as a learning problem in which the network administrator wishes to learn which decoy settings capture the most exploits at a given time and, in turn, reveal the most complex vulnerabilities.
Because I model the network administrator as discovering new complex vulnerabilities over time, this problem carries considerable uncertainty. I use the Multi-Armed Bandit (MAB) reinforcement learning framework to address some of this uncertainty. The MAB problem epitomizes the exploration-exploitation dilemma. The network administrator wishes to capture the most exploits, so it makes sense to frequently use the decoy settings that have worked well in the past (exploitation), but there may be better-performing decoy settings that are insufficiently explored. Furthermore, the MAB framework emphasizes the decisions themselves and deliberately ignores some complexities of the problem structure, as is the case with the adversarial MAB.
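The exploration-exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch. This is not the dissertation's algorithm: the function name `epsilon_greedy_decoys`, the fixed per-decoy capture probabilities, and the parameter values are all assumptions chosen for illustration, and the simulated attacker here is stationary rather than adversarial.

```python
import random


def epsilon_greedy_decoys(capture_probs, rounds=5000, epsilon=0.1, seed=0):
    """Choose one decoy setting per round with an epsilon-greedy rule.

    capture_probs is a hypothetical list giving, for each decoy setting,
    the chance that it captures an exploit in a round (an assumption made
    only to simulate feedback).
    Returns (pull counts, estimated mean capture rate) per decoy setting.
    """
    rng = random.Random(seed)
    n = len(capture_probs)
    counts = [0] * n
    means = [0.0] * n
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: try a decoy setting uniformly at random.
            arm = rng.randrange(n)
        else:
            # Exploitation: reuse the setting with the best estimate so far.
            arm = max(range(n), key=lambda i: means[i])
        # Simulated feedback: 1 if the decoy captured an exploit, else 0.
        reward = 1.0 if rng.random() < capture_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen setting.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


counts, means = epsilon_greedy_decoys([0.1, 0.2, 0.8])
print(counts, [round(m, 2) for m in means])
```

A larger `epsilon` spends more rounds on under-explored decoy settings at the cost of fewer rounds on the current best one, which is the tension the paragraph above describes.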
I first investigate how basic MAB solutions adapt to a highly dynamic, adversarial cyber deception environment with complex vulnerabilities. Vulnerabilities are not static; new ways to exploit them are discovered, patches are developed, and new mitigation techniques emerge. I demonstrate that basic MAB defender solutions are capable of adapting to a slow-learning attacker with an evolving arsenal of exploits. In this evolving exploits model, exploits decrease in detection value over time and the attacker gains new hidden exploits to use over the course of the interaction. Furthermore, due to the problem structure that arises from the inclusion of complex vulnerabilities, basic MAB solutions do not fully utilize the contextual information of the environment. Despite all this, I show that the basic MAB solutions do adapt and begin to approach optimal decision-making. With minor extensions to these algorithms, I also show that improvements can be made to better exploit the complicated problem structure formed by the complex vulnerabilities.
The evolving exploits work analyzed basic MAB strategies against a slow-learning attacker; however, this is not always an appropriate model of an attacker. In many cases, the attackers may be intelligent human hackers, so it is important to understand how humans learn and make decisions in adversarial cyber environments. I look to a simplified adversarial interaction to better understand how humans learn against various defenses. I detail an experiment we ran in which over 300 human participants played a 50-round game against three different simple defender algorithms. We found that human attackers quickly and efficiently discover static defenses, including fixed random defenses. The basic adaptive defender proved much harder for the human participants to attack.
If we wish to develop new defensive solutions and understand how they might perform against human attackers, it may be infeasible to repeatedly run an extensive experiment with human participants. Instead, it would be ideal to have a realistic model that acts in these cyber deception environments as a human would. We developed a cognitive model, based on the Instance-Based Learning (IBL) framework, that exhibits the same cognitive biases found in humans. With parameter tuning, we demonstrated that the IBL agent matched the performance of the human participants in the experiment more closely than other predictive models from reinforcement learning.
Having demonstrated that basic MAB solutions are capable of adapting in these adversarial cyber environments with complex vulnerabilities, and having developed a cognitive model that performs as humans do, I finally address the problem structure that forms from complex vulnerabilities. I formalize the decoy feature selection problem with complex vulnerabilities and introduce a new MAB variant that directly addresses this problem structure, titled the Bipartite Edge Detection problem. I show that the Bipartite Edge Detection problem appears in multiple applications and I provide an initial solution with contextual MABs. I introduce a novel approach to modeling experts that directly addresses the Bipartite Edge Detection problem via the well-known contextual MAB algorithm: Exponentially-weighted algorithm for Exploration and Exploitation with Expert advice (EXP4). My version of EXP4 outperforms all the previously seen basic MAB solutions and an IBL defender, and it approaches optimal play relatively quickly. The IBL attacker proves a challenging adversary, and no particular defensive algorithm performed notably well against it. I leave room for future work to improve my version of EXP4, introduce new bandit algorithms, investigate IBL further for these problems, and implement this work in a realistic cyber environment.
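For readers unfamiliar with EXP4, the following is a minimal sketch of the textbook algorithm, not the modified version or the expert model introduced in the dissertation. The three fixed experts, the two decoy settings, the reward probabilities, and the names `exp4`, `fixed_advice`, and `decoy_reward` are all hypothetical and exist only to make the example runnable.

```python
import math
import random


def exp4(advice_fn, reward_fn, n_experts, n_arms, rounds, gamma=0.1, seed=0):
    """Textbook EXP4: mix expert advice by exponential weights.

    advice_fn(t) returns a list of n_experts probability vectors over arms.
    reward_fn(t, arm, rng) returns a reward in [0, 1] for the chosen arm.
    Returns the final expert weights (rescaled so the largest is 1) and
    the total reward collected.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_experts
    total_reward = 0.0
    for t in range(rounds):
        advice = advice_fn(t)
        total_w = sum(weights)
        # Weighted mixture of expert advice plus uniform exploration.
        probs = [
            (1 - gamma)
            * sum(weights[i] * advice[i][j] for i in range(n_experts))
            / total_w
            + gamma / n_arms
            for j in range(n_arms)
        ]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm, rng)
        total_reward += reward
        # Importance-weighted estimate of the chosen arm's reward.
        estimate = reward / probs[arm]
        # Each expert is credited in proportion to the advice it gave.
        for i in range(n_experts):
            weights[i] *= math.exp(gamma * advice[i][arm] * estimate / n_arms)
        # Rescaling avoids overflow and leaves the probabilities unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return weights, total_reward


def fixed_advice(t):
    # Hypothetical experts: always decoy 0, always decoy 1, and uniform.
    return [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]


def decoy_reward(t, arm, rng):
    # Hypothetical capture probabilities for the two decoy settings.
    return 1.0 if rng.random() < (0.2, 0.8)[arm] else 0.0


weights, total = exp4(fixed_advice, decoy_reward, n_experts=3, n_arms=2, rounds=2000)
print([round(w, 4) for w in weights], total)
```

In this toy setup the weight mass is expected to shift toward the expert that recommends the higher-reward decoy setting; how experts should be constructed for the Bipartite Edge Detection problem is the contribution of the dissertation itself and is not reproduced here.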
Language
en
Provenance
Received from ProQuest
Copyright Date
2023-05
File Size
p.
File Format
application/pdf
Rights Holder
Marcus Gutierrez
Recommended Citation
Gutierrez, Marcus, "Detecting Complex Cyber Attacks Using Decoys with Online Reinforcement Learning" (2023). Open Access Theses & Dissertations. 3878.
https://scholarworks.utep.edu/open_etd/3878