AI threat research at Cisco plays a vital role in shaping how we assess and safeguard models. In a rapidly evolving field, this work is essential to protecting our customers against emerging vulnerabilities and adversarial tactics.
Below is a summary of key insights from external threat research shared with the wider AI security community. Please note that this is not an exhaustive list of AI threats, but a selection we find particularly significant.
Key Threats and Progress: February 2025
Adversarial Reasoning during Jailbreak
Adversarial Reasoning, an automated jailbreaking technique developed by Cisco’s AI security researchers in collaboration with the University of Pennsylvania, uses advanced model reasoning to exploit feedback signals from a target large language model (LLM), circumventing its safety measures to advance harmful objectives.
Building on a recent Cisco blog post assessing the security of DeepSeek R1 and OpenAI o1-preview, the researchers achieved a 100% attack success rate against the DeepSeek model. This highlights significant security vulnerabilities and emphasizes the need for robust defenses that account for complete reasoning paths in AI systems.
MITRE ATLAS: AML.T0054 – LLM Jailbreak
Reference: arXiv
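To illustrate the general shape of such an attack, here is a minimal conceptual sketch in Python of a feedback-guided jailbreak loop. It is not the researchers' actual method or code: query_target, score_progress, and revise_prompt are hypothetical stubs standing in for the target LLM, a judge model, and an attacker-side reasoning model.

```python
import random

# Conceptual sketch of a feedback-guided jailbreak loop; not the paper's method.
# query_target, score_progress, and revise_prompt are hypothetical stubs.

def query_target(prompt: str) -> str:
    """Placeholder for a call to the target LLM."""
    return "refusal or partial answer from the target model"

def score_progress(objective: str, response: str) -> float:
    """Placeholder feedback signal in [0, 1] (e.g. a judge model's score)."""
    return random.random()

def revise_prompt(objective: str, prompt: str, response: str) -> str:
    """Placeholder for an attacker-side reasoning step that uses the feedback."""
    return prompt + " [revised using the target's response]"

def adversarial_reasoning_loop(objective: str, max_turns: int = 10) -> str | None:
    prompt = objective
    for _ in range(max_turns):
        response = query_target(prompt)
        # Key idea: the target's own responses serve as a feedback signal that
        # guides the next prompt revision along a reasoning path.
        if score_progress(objective, response) > 0.95:
            return prompt  # guardrail considered bypassed
        prompt = revise_prompt(objective, prompt, response)
    return None  # attack failed within the turn budget
```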
Voice-Powered Jailbreaks for Multimodal LLMs
The Flanking Attack, a new methodology introduced by researchers from the University of Sydney and the University of Chicago, targets multimodal LLMs with voice-based jailbreaks. By combining voice modulation with context obfuscation, the attack evades model defenses and poses a significant risk in scenarios involving audio inputs.
The Flanking Attack demonstrated high attack success rates across various harm scenarios, indicating a substantial threat to models supporting audio inputs. This underscores the critical need for stringent security measures in multimodal AI systems.
MITRE ATLAS: AML.T0054 – LLM Jailbreak
Reference: arXiv
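As a rough illustration of the context-obfuscation idea, the sketch below assembles a "flanked" prompt by embedding a sensitive query in the middle of benign ones. The build_flanked_prompt helper is a simplification for illustration, not the paper's implementation, and the text-to-speech step that would feed the multimodal model is assumed and omitted.

```python
def build_flanked_prompt(target_query: str, benign_queries: list[str]) -> str:
    """Embed the target query in the middle of benign ones so audio-level
    safety filters see mostly harmless context (illustrative only)."""
    midpoint = len(benign_queries) // 2
    ordered = benign_queries[:midpoint] + [target_query] + benign_queries[midpoint:]
    return " Next question: ".join(ordered)

print(build_flanked_prompt(
    "<sensitive query>",
    ["What's a good pasta recipe?", "How do tides work?", "Suggest a hiking trail."],
))
```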
Terminal DiLLMa: LLM Terminal Hijack
Security researcher Johann Rehberger explored how LLM applications can be used to hijack terminals, building on a vulnerability identified by Leon Derczynski. The issue, which affects terminal-based services and CLI tools, arises when ANSI escape codes in output from LLMs such as GPT-4 are rendered unfiltered, enabling malicious actions such as altering terminal state and exfiltrating data.
Protections must be implemented to prevent manipulation by adversaries, particularly where LLM output is displayed directly in a terminal interface; a minimal sanitization sketch follows below.
MITRE ATLAS: AML.T0050 – Command and Scripting Interpreter
Reference: Embrace the Red; Inter Human Agreement (Substack)
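As a starting point for the kind of protection described above, here is a minimal Python sketch that strips ANSI escape sequences and control characters from model output before it is echoed to a terminal. The regular expressions cover common CSI and OSC sequences but are an illustrative baseline, not a complete defense.

```python
import re

# Strips CSI sequences (e.g. "\x1b[31m") and OSC sequences (e.g. "\x1b]0;title\x07"),
# plus stray control characters, so model output cannot restyle the terminal,
# rewrite the window title, or hide content from the user.
ANSI_ESCAPE = re.compile(r"\x1b(\[[0-?]*[ -/]*[@-~]|\][^\x07\x1b]*(\x07|\x1b\\))")
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def sanitize_llm_output(text: str) -> str:
    """Remove escape and control sequences before echoing LLM output to a terminal."""
    return CONTROL_CHARS.sub("", ANSI_ESCAPE.sub("", text))

print(sanitize_llm_output("harmless text\x1b[2J\x1b]0;pwned\x07 still harmless"))
```

Stripping rather than escaping keeps the output readable; an application that needs colored output could instead allowlist specific formatting sequences.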
ToolCommander: Manipulating LLM Tool Systems
Researchers from Chinese universities developed ToolCommander, an attack framework that injects malicious tools into LLM applications to carry out privacy breaches, denial of service, and unscheduled tool calls. Operating in two stages, the framework exposes vulnerabilities in LLM tool-calling systems and emphasizes the importance of stringent security controls for tool-calling capabilities.
Evaluations revealed vulnerabilities in various LLM systems, underscoring the need for enhanced security measures as LLM applications become more widespread.
MITRE ATLAS: AML.T0029 – Denial of ML Service; AML.T0053 – LLM Plugin Compromise
Reference: arXiv
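One practical mitigation is to gate tool registration behind an allowlist so that injected tools are rejected before they ever reach the model. The sketch below assumes a simple Python registry; ToolSpec, ALLOWED_TOOL_NAMES, and register_tool are illustrative names, not part of the ToolCommander paper or any specific agent framework.

```python
from dataclasses import dataclass

# Illustrative allowlist; names here are hypothetical, not from the paper
# or any particular agent framework.
ALLOWED_TOOL_NAMES = {"get_weather", "search_docs"}

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str

def register_tool(tool: ToolSpec, registry: dict[str, ToolSpec]) -> None:
    """Register only vetted tools and refuse silent overrides of existing ones."""
    if tool.name not in ALLOWED_TOOL_NAMES:
        raise ValueError(f"Tool '{tool.name}' is not on the allowlist")
    if tool.name in registry:
        raise ValueError(f"Tool '{tool.name}' is already registered; refusing override")
    registry[tool.name] = tool

registry: dict[str, ToolSpec] = {}
register_tool(ToolSpec("get_weather", "Returns a weather forecast"), registry)
```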