The AI Agent Identity Crisis: 80% of Agents Don’t Properly Identify Themselves, 80% of Sites Don’t Verify
AI agents are reshaping product discovery and commerce, but there’s a fundamental problem with AI agent identity: most agents don’t prove who they are, and most websites don’t bother checking. 38% of consumers now use tools like ChatGPT for product research, and 21% have used agents that make decisions or purchases automatically. This surge in agent traffic exposes a broken trust model on both sides of the equation—how agents identify themselves, and how websites verify those identities.
DataDome’s Galileo threat research team analyzed both problems. The findings—included in the Future of Search and Discovery Report in collaboration with AWS, Botify, and Retail Economics—reveal why fraudsters find AI agent impersonation so effective: the system is designed to be spoofed.
80% of AI agents don’t properly identify themselves
When an AI agent visits a website, it should declare itself through signals that can't be easily faked: published IP range lists, reverse DNS records, or authentication protocols like Web Bot Auth. IP lists and reverse DNS tie an agent's identity to infrastructure only its operator controls; Web Bot Auth goes further and provides cryptographic proof of identity.
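Forward-confirmed reverse DNS is the simplest of these checks and needs only the standard library. A minimal sketch, assuming the agent operator publishes a hostname suffix (the `openai.com` suffix below is illustrative; consult each operator's verification docs for the real values):

```python
import socket

def verify_agent_ip(ip: str, allowed_suffixes: tuple) -> bool:
    """Forward-confirmed reverse DNS: resolve the PTR record for the
    client IP, check the hostname ends in a domain the agent operator
    controls, then resolve that hostname forward and confirm it maps
    back to the same IP. A spoofed user-agent string alone cannot
    pass this check."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # PTR lookup
    except (socket.herror, OSError):
        return False
    if not hostname.endswith(allowed_suffixes):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # A lookup
    except (socket.gaierror, OSError):
        return False
    return ip in forward_ips
```

The forward-confirmation step matters: an attacker can set an arbitrary PTR record on an IP they control, but they cannot make the operator's real domain resolve back to it.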
But 80% of AI agents don’t use these methods. Instead, they rely on user-agent strings—HTTP headers that declare “I’m ChatGPT” or “I’m Perplexity”—which anyone can spoof by copying a single line of code. Some agents publish IP lists, but leave them incomplete or unmaintained, creating gaps that attackers exploit.
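The "single line of code" is not an exaggeration. A sketch of what spoofing looks like in practice (the user-agent string is the one discussed in this post; the target URL is a placeholder):

```python
import urllib.request

# The entire "disguise": one self-reported header, copied verbatim.
SPOOFED_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
              "compatible; ChatGPT-User/1.0; +https://openai.com/bot")

def spoofed_request(url: str) -> urllib.request.Request:
    # Nothing here is authenticated; any client can claim to be ChatGPT.
    return urllib.request.Request(url, headers={"User-Agent": SPOOFED_UA})

req = spoofed_request("https://example.com/products")
```

To a site that trusts user-agent strings, this request is indistinguishable from real ChatGPT traffic.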
Why do legitimate agents do this? Sometimes they run client-side and inherit the user’s IP address. Sometimes they use shared infrastructure that makes verification difficult. The reason doesn’t matter to security teams facing the practical reality: you can’t distinguish an AI agent helping a customer from a scraper stealing your catalog.
This creates two immediate problems. First, fraudsters can clone poorly declared AI agents and inherit whatever trust or access privileges websites grant them. Second, website owners can’t make informed decisions about their traffic because they can’t tell which “AI referrals” are legitimate and which are manufactured.
80% of websites are not protected against AI agent spoofing
To understand how websites handle unverifiable agent traffic, DataDome tested 698,214 reachable sites using a spoofed ChatGPT-style user-agent string ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot") and measured whether sites blocked, challenged, or allowed it through. Notably, ChatGPT-User is one of the AI agents that does provide publicly documented IP ranges, making its traffic verifiable. Sites had the tools to catch this impersonation; they just didn't use them.
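The report doesn't publish its exact classification logic, but the blocked/challenged/allowed bucketing can be sketched as a simple heuristic over the response (the status codes and challenge markers below are assumptions, not DataDome's methodology):

```python
def classify_response(status: int, body: str) -> str:
    """Bucket a site's response to a spoofed-agent request.
    Heuristic only: a real measurement pipeline would also inspect
    JavaScript challenges, CAPTCHAs, and redirect chains."""
    if status in (401, 403, 429):
        return "blocked"
    # Common interstitial markers (CAPTCHA / verification pages).
    markers = ("captcha", "challenge", "verify you are human")
    if any(m in body.lower() for m in markers):
        return "challenged"
    if 200 <= status < 300:
        return "allowed"
    return "other"
```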
79.7% of websites allowed the request through without blocking or challenging it. Only 17.8% stopped the spoofed agent. The takeaway: nearly 80% of websites can’t tell a legitimate AI agent from an attacker pretending to be one.
Most websites treat user-agent strings as if they’re verified credentials. They’re not. They’re self-reported claims with no authentication layer—the digital equivalent of accepting someone’s word that they work for a trusted company without checking their ID.
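Checking the claim against published infrastructure takes only a few lines of standard-library code. A sketch, with placeholder CIDR blocks (fetch the live list from the agent operator's published endpoint rather than hard-coding it):

```python
import ipaddress

# Illustrative placeholders only -- real deployments should refresh
# these from the provider's published IP-range feed.
CHATGPT_USER_RANGES = [
    ipaddress.ip_network("23.98.142.176/28"),
    ipaddress.ip_network("40.84.180.224/28"),
]

def claim_matches_infrastructure(client_ip: str) -> bool:
    """True only if the connecting IP falls inside the agent
    operator's published ranges. A spoofed user-agent sent from a
    residential proxy fails this check immediately."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in CHATGPT_USER_RANGES)
```

This is the "checking their ID" step most of the 698,214 tested sites skipped.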
This verification gap matters because attackers understand how defenses work. Many security systems apply different rules based on traffic source. “Good AI” traffic often receives lighter scrutiny, more permissive rate limits, and exemptions from challenges that would stop suspicious behavior.
Spoofing a reputable AI agent works like a hall pass. It helps unauthorized automation slip through defenses designed to catch it. Aggressive scrapers steal product catalogs and pricing data. Account takeover attempts probe authentication endpoints. Fraud operations test stolen payment credentials. All while pretending to be ChatGPT or Claude.
This isn’t theoretical. DataDome’s Galileo threat research team has observed attackers systematically abusing AI agent infrastructure to bypass traditional security controls. In documented cases, ChatGPT’s infrastructure was used to perform an SQL injection attack, Perplexity for reflected XSS, and Meta’s crawler as a vulnerability scanner. When websites can’t verify agent identity, attackers inherit whatever trust those agents receive and use it to probe for weaknesses.
Why this creates both security and measurement problems
The same weakness that enables fraud also pollutes analytics. If attackers can manufacture “AI referral traffic” by spoofing user-agent strings, your data lies to you. Traffic attribution breaks down. You can’t measure AI’s real impact on conversions, so you can’t optimize for the agentic commerce shift that’s already reshaping product discovery.
DataDome’s Galileo team recently documented this exact scenario: a surge in “ChatGPT referral traffic” that turned out to be aggressive scrapers. What looked like hundreds of thousands of legitimate ChatGPT users was actually bots making 22 requests per second, routing through residential proxies to appear authentic. One bot session hit 109 product pages in five seconds. The spoofing required a single line of code.
Decisions about where to invest in AI optimization, which products to feature, and how to structure data for agent consumption all depend on knowing which agent traffic is legitimate. Without verification, you’re optimizing for noise while fraudsters operate in the signal.
How to fix the broken trust model
As AI agents become a primary channel for product discovery, verification becomes foundational. Authentication protocols establish cryptographic proof of identity using signatures that only the legitimate provider can generate. Protocols like Web Bot Auth, Visa TAP, and Mastercard Agent Pay make impersonation mathematically difficult instead of trivially easy.
But authentication alone isn’t enough. Compromised agents with valid credentials will pass every identity check. That’s where behavioral analysis matters. Real-time intent detection catches when a verified agent starts acting maliciously: unusual request patterns, suspicious data extraction, or signs that it’s working as part of a coordinated attack.
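One ingredient of behavioral analysis, a sliding-window request-rate check, can be sketched in a few lines. The threshold below is illustrative (production systems combine many signals), but it shows how identity-agnostic detection works:

```python
from collections import deque

class SessionRateMonitor:
    """Flag sessions whose request rate exceeds a human-plausible
    ceiling, regardless of how the client identifies itself.
    Threshold is illustrative, not a recommended production value."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 5.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()

    def record(self, now: float) -> bool:
        """Record a request at time `now` (seconds); return True if
        the session should be flagged as automated."""
        self.timestamps.append(now)
        # Drop requests that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_requests
```

A session like the one documented above, 109 product pages in five seconds, trips this check long before the hundredth request, even if every request carries a valid agent identity.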
DataDome operates at both layers. We verify agent identities at the edge through authentication protocols and published infrastructure, then analyze behavior in real time to detect when verified agents turn hostile. That’s how you capture the revenue opportunity of agentic commerce without the security and fraud exposure that comes with unverifiable traffic.
The shift to AI-driven discovery is happening whether your business is ready or not. The question isn’t whether to allow AI agents—it’s whether you can verify which ones are legitimate before they access your systems.
Ready to test your defenses? Run DataDome’s Vulnerability Scan to see if a spoofed AI agent can access your website, or download the full Future of Search and Discovery Report for deeper insights on agentic commerce.
*** This is a Security Bloggers Network syndicated blog from DataDome authored by Jérôme Segura. Read the original post at: https://datadome.co/threat-research/ai-agent-identity-crisis/
