Detecting Complex Cyber Attacks Using Decoys with Online Reinforcement Learning

Marcus Gutierrez, University of Texas at El Paso


Most vulnerabilities discovered in cybersecurity can be attributed to a single piece of software. I investigate complex vulnerabilities, which may require multiple pieces of software to be present. These complex vulnerabilities represent 16.6% of all documented vulnerabilities and are, on average, more dangerous than simple vulnerabilities. Because they often depend on specific combinations of software, they are also harder to identify. I consider a motivating scenario in which an attacker repeatedly deploys exploits that use complex vulnerabilities into an airport Wi-Fi network. The network administrator attempts to detect these exploits using cyber decoys: the administrator inserts a fake device, which holds no valuable data, into the network in the hope that the attacker mistakes it for a legitimate device and attempts to exploit it. For each interaction, however, the network administrator must actively select which decoy device to insert. I treat this scenario as a learning problem in which the network administrator wishes to learn which decoy settings capture the most exploits at a given time, and thereby to learn about the most complex vulnerabilities. Because I model the network administrator as discovering new complex vulnerabilities over time, the problem involves considerable uncertainty. I use the Multi-Armed Bandit (MAB) reinforcement learning framework to address some of this uncertainty. The MAB problem epitomizes the exploration-exploitation dilemma: the network administrator wishes to capture the most exploits, so it makes sense to frequently use the decoy settings that have worked well in the past (exploitation), but there may be better-performing decoy settings that have not yet been sufficiently explored (exploration). Furthermore, the MAB framework emphasizes the decisions themselves and deliberately ignores some complexities of the problem structure, as is the case with the adversarial MAB.
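To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of an epsilon-greedy bandit choosing among decoy settings. It is an illustration only and not the method used in the dissertation: the function name `epsilon_greedy_decoys`, the three capture probabilities, and the fixed-probability (stochastic) reward model are all assumptions made for the example. An adversarial MAB setting would typically call for a different algorithm.

```python
import random


def epsilon_greedy_decoys(capture_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate picking one decoy setting per round with epsilon-greedy.

    capture_probs: hypothetical probability that each decoy setting
        captures an exploit in a given round (assumed fixed over time).
    Returns (counts, estimates): how often each setting was chosen and
        its estimated capture rate.
    """
    rng = random.Random(seed)
    n = len(capture_probs)
    counts = [0] * n
    estimates = [0.0] * n

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: try a decoy setting chosen uniformly at random.
            arm = rng.randrange(n)
        else:
            # Exploitation: reuse the setting with the best estimate so far.
            arm = max(range(n), key=lambda i: estimates[i])

        # Simulated outcome: 1 if the decoy captured an exploit, else 0.
        reward = 1.0 if rng.random() < capture_probs[arm] else 0.0

        # Incremental update of the running mean for the chosen setting.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


# Three hypothetical decoy settings with made-up capture probabilities.
counts, estimates = epsilon_greedy_decoys([0.05, 0.20, 0.60])
```

In this sketch, `epsilon` controls how often the administrator tries a setting at random rather than reusing the best one seen so far. A larger value spends more rounds exploring, while a smaller value commits earlier to whatever has worked in the past.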

Subject Area

Computer Science|Computer Engineering

Recommended Citation

Gutierrez, Marcus, "Detecting Complex Cyber Attacks Using Decoys with Online Reinforcement Learning" (2023). ETD Collection for University of Texas, El Paso. AAI30527843.