Roundup of AI Threat Intelligence: February 2025

AI threat research at Cisco plays a vital role in shaping our approach to assessing and safeguarding models.

AI threat research at Cisco plays a vital role in shaping our approach to assessing and safeguarding models. In a field that is constantly changing, these endeavors are crucial in ensuring the security of our customers against new vulnerabilities and adversarial tactics.

Here is a summary of important highlights and key insights from external threat research shared with the wider AI security community. Please note that this is not an exhaustive list of AI threats, but rather a selection that we find particularly significant.

Key Threats and Progress: February 2025

Adversarial Reasoning during Jailbreak

An innovative Adversarial Reasoning technique for automated model jailbreaking, developed by Cisco’s AI security researchers in collaboration with the University of Pennsylvania, uses advanced model reasoning to exploit feedback signals from a large language model (LLM). This approach circumvents security measures and enables harmful objectives.

Following a recent Cisco blog post on DeepSeek R1 and OpenAI o1-preview security assessment, researchers achieved a 100% attack success rate against the DeepSeek model. This highlights significant security vulnerabilities and emphasizes the need for robust defenses that consider complete reasoning paths in AI systems.

MITRE ATLAS: AML.T0054 – LLM Jailbreak

Reference: arXiv

Voice-Powered Jailbreaks for Multimodal LLMs

A new attack methodology, the Flanking Attack, introduced by researchers from the University of Sydney and the University of Chicago, focuses on voice-based jailbreaks for multimodal LLMs. By utilizing voice modulation and context obfuscation, this approach evades model defenses and poses a significant risk, especially in scenarios involving audio inputs.

The Flanking Attack demonstrated high attack success rates across various harm scenarios, indicating a substantial threat to models supporting audio inputs. This underscores the critical need for stringent security measures in multimodal AI systems.

MITRE ATLAS: AML.T0054 – LLM Jailbreak

Reference: arXiv

Terminal DiLLMa: LLM Terminal Hijack

Security researcher Johann Rehberger explored the potential for LLM applications to hijack terminals, building on a vulnerability identified by Leon Derczynski. This vulnerability, affecting terminal services or CLI tools, arises from the misuse of ANSI escape codes in LLM outputs like GPT-4, enabling malicious actions like terminal state alteration and data exfiltration.

Protections must be implemented to prevent manipulation by adversaries, particularly in cases where LLM outputs are directly displayed on terminal interfaces.

MITRE ATLAS: AML.T0050 – Command and Scripting Interpreter

Reference: Embrace the Red; Inter Human Agreement (Substack)

ToolCommander: Manipulating LLM Tool Systems

Researchers from Chinese universities developed ToolCommander, an attack framework that inserts malicious tools into LLM applications to carry out privacy breaches, denial of service, and unscheduled tool calls. This framework, operating in two stages, exposes vulnerabilities in LLM systems, emphasizing the importance of stringent security measures for tool-calling capabilities.

Evaluations revealed vulnerabilities in various LLM systems, underscoring the need for enhanced security measures as LLM applications become more widespread.

MITRE ATLAS: AML.T0029 – Denial of ML Service; AML.T0053 – LLM Plugin Compromise

Reference: arXiv


Share your thoughts, ask questions, and stay connected with Cisco Secure on social media!

Cisco Security Social Media:

Instagram
Facebook
Twitter
LinkedIn

About Author

Subscribe To InfoSec Today News

You have successfully subscribed to the newsletter

There was an error while trying to send your request. Please try again.

World Wide Crypto will use the information you provide on this form to be in touch with you and to provide updates and marketing.