20% of Artificial Intelligence Jailbreak Incursions Succeed, With 90% Revealing Confidential Information

Jailbreak incursions on artificial intelligence, where models are directed to bypass their protections, achieve a success rate of 20%, as per research findings. Typically, adversaries require just 42 seconds and five interactions on average to infiltrate.

In certain instances, breaches occur in as little as four seconds. These results underscore the significant weaknesses in current GenAI algorithms and the challenges in preempting exploitations in real time.

Among the successful breaches, 90% result in leaks of confidential data, based on the insights from the “State of Attacks on GenAI” report by AI security firm Pillar Security. The researchers studied assaults on over 2,000 operational AI applications over the past three months.

The primary targets of AI applications — encompassing a quarter of all attacks — are those utilized by customer support units, owing to their “widespread use and essential function in customer interactions.” Nonetheless, AIs utilized in other pivotal infrastructure sectors, such as energy and engineering software, also encountered heightened attack frequencies.

Disrupting crucial infrastructure can prompt widespread turmoil, making it an attractive objective for cyber attacks. A recent report from Malwarebytes disclosed that the services industry bears the brunt of ransomware, constituting nearly one-fourth of global attacks.

SEE: 80% of Essential National Infrastructure Enterprises Faced an Email Security Violation in the Last Year

The most targeted commercial model is OpenAI’s GPT-4, mainly due to its widespread adoption and cutting-edge capabilities that are enticing to attackers. Meta’s Llama-3 is among the most targeted open-source model.

The Escalation of Attacks on GenAI

“Over time, we have observed a surge in both the frequency and intricacy of [prompt injection] attacks, with adversaries deploying more sophisticated tactics and persistently striving to bypass protections,” mentioned the authors of the report.

During the onset of the AI enthusiasm, security experts cautioned that it might lead to an influx of cyber attacks in general, given that it reduces the entry barrier. Prompts can be penned in natural language, obviating the need for coding or technical proficiency to leverage them for generating malicious content.

SEE: Analysis Exposes the Influence of AI on Cyber Security Setup

Indeed, anyone can orchestrate a prompt injection assault without specialized utilities or expertise. As malicious agents hone their skills with such attacks, their frequency is anticipated to surge. Presently, such attacks are categorized as the chief security vulnerability on the OWASP Top 10 for LLM Applications.

Researchers from Pillar discovered that assaults could unfold in any language the LLM has been tutored to comprehend, rendering them internationally accessible.

Malevolent agents were observed attempting to jailbreak GenAI apps frequently, often making numerous attempts, with some leveraging specialized tools to inundate models with high volumes of attacks. Vulnerabilities were being exploited at all stages of the LLM interaction life cycle, encompassing the prompts, Retrieval-Augmented Generation, tool outputs, and model responses.

“Unchecked AI perils can yield dire repercussions for entities,” elucidated the authors. “Financial setbacks, legal entanglements, damaged reputations, and security breaches compose a few of the potential outcomes.”

The jeopardy of GenAI security breaches is anticipated to escalate as companies embrace more sophisticated models, replacing rudimentary conversational chatbots with autonomous agents. Agents extend the attack surface for malicious actors due to their enhanced capabilities and system access through the AI application,” noted the researchers.

Primary Jailbreaking Tactics

The leading jailbreaking methods employed by culprits include the Neglect Previous Instructions and Coercion-Tactic Prompt Infiltrations along with Base64 encryption.

By employing the Neglect Previous Instructions approach, the offender instructs the AI to disregard its original programming, including any barricades that prevent the generation of harmful material.

Coercion-Tactic Attacks involve inputting a series of commanding, authoritative demands such as “ADMIN OVERRIDE” that coerce the model to disregard its original programming and generate outputs that would ordinarily be blocked. For instance, it might expose confidential data or undertake unauthorized actions leading to system compromise.

Base64 encoding pertains to the encoding of malicious prompts with the Base64 encoding method by malevolent actors. This scheme can deceive the model into decoding and processing content typically blocked by its security filters, such as malicious code or instructions to extract sensitive data.

Other identified assaults encompass the Formatting Instructions technique, where the model is duped into producing restricted outputs by instructing it to format responses in a specified manner, such as utilizing code blocks. The DAN, or Do Anything Now, technique operates by prodding the model to adopt a fabricated persona that disregards all constraints.

Motives Behind Jailbreaking AI Models

The study unveiled four primary incentives for jailbreaking AI models:

Heisting sensitive data. For instance, business confidential information, user inputs, and personally identifiable data.
Producing malicious content. This could involve fabrication, hate speech, phishing communications for social engineering attacks, and malevolent code.
Degenerating AI performance. This could disrupt operations or provide the attacker with access to computational resources for illicit activities. This is attained by inundating systems with malformed or excessive inputs.
Scrutinizing the system’s vulnerabilities. Either playing the role of an “ethical hacker” or out of curiosity.

Fortifying AI Systems for Enhanced Security

According to Pillar’s specialists, bolstering system prompts and instructions alone does not suffice to entirely shield an AI model from assaults. The intricacy of language and the disparities across models offer leeway for attackers to circumvent these measures.

Therefore, entities deploying AI applications should contemplate the following for ensuring security:

Give precedence to commercial providers when integrating LLMs in critical applications, as they offer superior security attributes compared to open-source models.
Supervise prompts at the session level to unearth evolving attack trends that might not be conspicuous when analyzing individual inputs in isolation.
Conduct customized red-teaming and resilience drills, tailored to the AI application and its multi-turn interactions, to pinpoint security loopholes early on and reduce future expenditures.
Embrace security solutions that adapt instantaneously by utilizing context-aware measures that are model-neutral and align with the enterprise’s policies.

In a statement, Dor Sarig, CEO, and co-founder of Pillar Security, remarked, “As we transition towards AI agents adept in executing intricate tasks and decision-making, the security panorama becomes progressively sophisticated. Organizations must brace for a surge in AI-targeted attacks by implementing tailored red-teaming maneuvers and adopting a ‘secure by design’ tactic in their GenAI development process.”

Jason Harison, CRO at Pillar Security, added, “Static controls are no longer adequate in this dynamic AI-facilitated realm. Entities need to invest in AI security solutions capable of anticipating and countering emerging threats instantaneously, while upholding their governance and cyber guidelines.”

About Author

AndyC

Andy Curtis is an award-winning security consultant, researcher and public speaker. He has been working in the computer security industry since the early 1990s, having been employed by state and federal government, leading healthcare and banking providers across three continents. He has given talks about computer security for some of the world’s largest companies, worked with law enforcement agencies on investigations into hacking groups, and is a regular voice on TV and radio explaining IT security threats.

See author's posts

Tags: AI, artificial intelligence, attacks, breaches, cyber security, Cybersecurity, GenAI, Generative, hackers, ignore, instructed, jailbreak, models, report, Safeguards, Security, succeed, their, where

20% of Artificial Intelligence Jailbreak Incursions Succeed, With 90% Revealing Confidential Information

The Escalation of Attacks on GenAI

Primary Jailbreaking Tactics

Motives Behind Jailbreaking AI Models

Fortifying AI Systems for Enhanced Security

About Author

AndyC

Italian Ferry Malware Attack Sparks International Probe

I am not a robot: ClickFix used to deploy StealC and Qilin

Game of clones: Sophos and the MITRE ATT&CK Enterprise 2025 Evaluations

A big finish to 2025 in December’s Patch Tuesday

React2Shell flaw (CVE-2025-55182) exploited for remote code execution

I am not a robot: ClickFix used to deploy StealC and Qilin

LongNosedGoblin tries to sniff out governmental affairs in Southeast Asia and Japan

ESET Threat Report H2 2025

Black Hat Europe 2025: Was that device designed to be on the internet at all?

Black Hat Europe 2025: Was that device designed to be on the internet at all?

Black Hat Europe 2025: Was that device designed to be on the internet at all?

Black Hat Europe 2025: Was that device designed to be on the internet at all?

Donate Bitcoin to this address

Donate Ethereum to this address

NCC Group Taps Qualys to Extend Managed Security Service into Shadow IT Realm

NCC Group Taps Qualys to Extend Managed Security Service into Shadow IT Realm

NCC Group Taps Qualys to Extend Managed Security Service into Shadow IT Realm

NCC Group Taps Qualys to Extend Managed Security Service into Shadow IT Realm

NCC Group Taps Qualys to Extend Managed Security Service into Shadow IT Realm

The Escalation of Attacks on GenAI

Primary Jailbreaking Tactics

Motives Behind Jailbreaking AI Models

Fortifying AI Systems for Enhanced Security

About Author

More Stories

Donate Bitcoin to this address

Donate Ethereum to this address

You may have missed