MITRE ATLAS
When it comes to cyber-threat intelligence, MITRE's tactics, techniques, and procedures (TTPs) are a valuable resource, offering a standardized way to analyze each stage of the kill chain. By extending the ATT&CK framework to AI systems, ATLAS makes it possible to characterize specific campaigns against them. Although ATLAS does not address Rogue AI directly, ATLAS techniques such as Prompt Injection, Jailbreak, and Model Poisoning can all be used to subvert AI systems and give rise to Rogue AI.
In practice, a compromised Rogue AI functions as a TTP in itself: an autonomous system can carry out ATT&CK and ATLAS tactics (for example, Reconnaissance, Resource Development, Initial Access, ML Model Access, and Execution) with a range of impacts. Today, only sophisticated actors have the capability to compromise AI systems for their own ends, but the mere fact that they are probing these systems for vulnerabilities should raise alarms.
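To make the mapping concrete, here is a minimal Python sketch of how an incident involving a compromised AI might be tagged with the ATLAS techniques named above. The technique IDs and tactic assignments shown are our assumptions for illustration and should be verified against the live matrix at https://atlas.mitre.org.

```python
from dataclasses import dataclass

@dataclass
class AtlasTechnique:
    """One ATLAS technique observed in an incident."""
    technique_id: str  # e.g. "AML.T0051"; verify against atlas.mitre.org
    name: str
    tactic: str        # the tactic this technique serves in the incident

# Illustrative tagging of a compromised-AI incident. The IDs and
# tactic assignments here are assumptions, not authoritative mappings.
observed = [
    AtlasTechnique("AML.T0051", "LLM Prompt Injection", "Initial Access"),
    AtlasTechnique("AML.T0054", "LLM Jailbreak", "Defense Evasion"),
    AtlasTechnique("AML.T0020", "Poison Training Data", "Resource Development"),
]

for t in observed:
    print(f"{t.technique_id}: {t.name} ({t.tactic})")
```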
While MITRE ATLAS and ATT&CK cover Compromised Rogue AI, they have not yet addressed Malevolent Rogue AI. There are no reported instances of adversaries planting malicious AI systems in target environments, but it seems only a matter of time: as organizations adopt autonomous AI, threat actors will follow suit. Using AI offensively is a distinct tactic in its own right. Operating it remotely resembles AI-based malware; going further, using proxies with AI services as attackers resembles an AI botnet, with additional nuances.
MIT AI Risk Repository
Lastly, there is MIT's AI Risk Repository, an online database of hundreds of AI risks, together with a topic map of the most recent literature on the subject. As an extensible store of community knowledge on AI risk, the repository is highly valuable. Its catalog of risks supports more thorough analysis, and in particular it introduces the notion of causality, captured along three fundamental dimensions (sketched in code after this list):
- Who caused the risk (human, AI, or unknown)
- How it arose in the deployment of the AI system (unintentionally or intentionally)
- When it occurred (before deployment, after deployment, or unknown)
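As a minimal sketch, these three dimensions can be modeled as a causal tag attached to each cataloged risk. The type and field names below are our own illustration, not part of MIT's schema.

```python
from dataclasses import dataclass
from enum import Enum

class Entity(Enum):          # who caused the risk
    HUMAN = "human"
    AI = "ai"
    UNKNOWN = "unknown"

class Intent(Enum):          # how it arose in deployment
    INTENTIONAL = "intentional"
    UNINTENTIONAL = "unintentional"

class Timing(Enum):          # when it occurred relative to deployment
    PRE_DEPLOYMENT = "pre-deployment"
    POST_DEPLOYMENT = "post-deployment"
    UNKNOWN = "unknown"

@dataclass
class CausalTag:
    """Causal context for a single cataloged AI risk."""
    entity: Entity
    intent: Intent
    timing: Timing

# Example: an accidental Rogue AI caused by a human mistake that only
# surfaced after the system was deployed.
tag = CausalTag(Entity.HUMAN, Intent.UNINTENTIONAL, Timing.POST_DEPLOYMENT)
print(tag)
```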
Understanding intent is useful for making sense of Rogue AI, though among these frameworks it is addressed only in the OWASP Security and Governance Checklist. Accidental risk often stems from a bug rather than from a MITRE ATLAS attack technique or an OWASP vulnerability.
Identifying the entity that caused the risk also helps in analyzing Rogue AI threats. Both humans and AI systems can create Rogue AI unintentionally, whereas Malevolent Rogues are explicitly designed to cause harm. In theory, a Malevolent Rogue could attempt to subvert existing AI systems to turn them rogue, or could be programmed to replicate itself, but for now humans are the primary intentional creators of Rogue AI.
Knowing when a risk can arise should be standard practice for any risk analyst, and it demands vigilance across the entire AI system lifecycle. That means assessing systems both before and after deployment, and verifying alignment to uncover malevolent, compromised, or accidental Rogue AIs.
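One way to operationalize this, purely as a sketch, is a deployment gate that runs related checks before release and again continuously in production. The check names below are hypothetical; in practice each would be backed by real provenance verification, red-teaming, and monitoring.

```python
# Illustrative lifecycle gate; check names are our own invention.
PRE_DEPLOYMENT_CHECKS = [
    "model_provenance_verified",   # rule out upstream compromise
    "goals_match_specification",   # catch accidental misalignment early
]
POST_DEPLOYMENT_CHECKS = [
    "behavior_matches_goals",      # detect drift toward rogue behavior
    "resource_use_within_policy",  # flag power-seeking or self-propagation
]

def gate_passes(results, required):
    """Return True only if every required check reported success."""
    return all(results.get(name, False) for name in required)

results = {"model_provenance_verified": True, "goals_match_specification": True}
print("cleared for deployment:", gate_passes(results, PRE_DEPLOYMENT_CHECKS))
```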
MIT organizes risks into seven primary domains and 23 subdomains, with Rogue AI directly highlighted in the domain of "AI System Safety, Failures, and Limitations," which defines the risk as follows:
“AI systems that exhibit behaviors contrary to ethical standards or human objectives, specifically the designers’ or users’ objectives. These misaligned behaviors could stem from human actions during design and development, such as through reward manipulation and goal misinterpretation, potentially leading AI to leverage dangerous capabilities like manipulation, deceit, or situational awareness for power-seeking, self-propagation, or other purposes.”
Defense Mechanisms Emphasizing Causality and Risk Context
The crux of the matter is that incorporating AI systems expands the enterprise attack surface, potentially substantially. Risk frameworks need updating to account for the threat of Rogue AI. Intent plays a pivotal role: accidental Rogue AI poses many risks even without any malice involved. And when harm is intentional, knowing who is attacking whom, and with what resources, is crucial context. Are threat actors or Malevolent Rogue AIs targeting your AI systems to create compromised Rogue AIs? Are they targeting your organization at large? And are they using your resources, their own, or those of a third party whose AI has been subverted?
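These questions can be captured, again as a sketch with names of our own invention, in a small attack-context record that complements the causal tag above.

```python
from dataclasses import dataclass
from enum import Enum

class Attacker(Enum):
    THREAT_ACTOR = "human threat actor"
    MALEVOLENT_ROGUE_AI = "malevolent rogue AI"

class AiResources(Enum):
    VICTIM_OWNED = "the victim's own AI systems"
    ATTACKER_OWNED = "the attacker's own AI"
    SUBVERTED_THIRD_PARTY = "a subverted third party's AI"

@dataclass
class AttackContext:
    """Who is attacking whom, with what AI resources."""
    attacker: Attacker
    target: str          # e.g. "our AI systems" or "the organization at large"
    resources: AiResources

ctx = AttackContext(Attacker.THREAT_ACTOR, "our AI systems",
                    AiResources.SUBVERTED_THIRD_PARTY)
print(ctx)
```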
These are enterprise-level risks, both before and after deployment. The security community has made strides in profiling these threats, but it still lacks a comprehensive approach to Rogue AI, one that accounts for causality and attack context. Closing that gap will enable a holistic strategy for planning against and mitigating Rogue AI risks.
