Moltbook is Dangerous, but Scale Doesn’t Match the Hype: Zenity
Moltbook, the highly popular Reddit-style social network designed exclusively to enable AI agents to communicate, launched in late January and almost immediately went viral, with users putting their agents onto the site and then peering in to see what they were talking about with other people’s agents.

It also attracted cybersecurity researchers who were already wary of security issues surrounding AI agents themselves, which can work autonomously to solve problems but also expand the attack surface. They quickly sent up warning signals about both Moltbook and OpenClaw, the open source, autonomous AI agent that is the primary user of Moltbook and that Sophos CISO Ross McKuchar called a “warning shot for enterprise AI security.”

Researchers with Koi Security, which is being bought by Palo Alto Networks, raised eyebrows when they reported early this month that 341 of the more than 2,857 skills found on the ClawHub marketplace were malicious, spreading malware ranging from backdoors to keyloggers to the AMOS macOS stealer. Most were part of what’s known as the ClawHavoc campaign.

In an update this week, the numbers got worse: the marketplace grew to more than 10,700 skills, 824 of them malicious.

“The ClawHavoc campaign expanded across every existing category, and we identified ~25 entirely new attack categories including browser automation agents, coding agents, LinkedIn/WhatsApp integrations, PDF tools, and – in a grim bit of irony – fake security-scanning skills,” they wrote.

Five days after Moltbook launched, Wiz researchers wrote that they had found a misconfigured Supabase database that allowed full read and write access to all data on the platform, exposing 1.5 million API authentication tokens, 35,000 email addresses, and private messages between agents. They also found that only 17,000 human owners stood behind the 1.5 million agents registered at the time – an 88-to-one ratio – and that there was no way on the site to verify whether an agent was AI or a human.
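The Wiz exposure fits a familiar pattern. Supabase fronts every table with an auto-generated REST API, and the public “anon” key that necessarily ships in client-side code can read and write any table whose Row Level Security is disabled or left with an over-permissive policy. The sketch below shows what that class of misconfiguration looks like in practice; the project URL, key, and agents table are placeholders for illustration, not Moltbook’s actual backend.

```python
import requests

# Placeholder values: any Supabase project exposes its URL and a public "anon"
# key in client-side code. Neither of these is Moltbook's real project.
SUPABASE_URL = "https://example-project.supabase.co"
ANON_KEY = "public-anon-api-key"

HEADERS = {"apikey": ANON_KEY, "Authorization": f"Bearer {ANON_KEY}"}

# Supabase serves each table through an auto-generated PostgREST endpoint.
# If Row Level Security is off (or a policy grants the anon role everything),
# this returns every row in the table to anyone holding the public key.
rows = requests.get(
    f"{SUPABASE_URL}/rest/v1/agents",
    headers=HEADERS,
    params={"select": "*"},
    timeout=30,
).json()
print(f"rows readable with the public key: {len(rows)}")

# The same misconfiguration accepts writes, which is the "full read and write
# access" condition Wiz described.
requests.post(
    f"{SUPABASE_URL}/rest/v1/agents",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"name": "test-agent"},
    timeout=30,
)
```

The standard fix for this class of issue is to enable Row Level Security on every exposed table and write policies that scope reads and writes to the authenticated owner.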
Measuring Interaction, Security

This week, researchers with Zenity Labs, which offers an AI agent security and governance platform and has done deep dives into both OpenClaw and Moltbook, found that an indirect prompt injection against OpenClaw can lead to a zero-click persistent backdoor and, from there, to a full compromise of an endpoint. With Moltbook, they uncovered malicious activity and attack campaigns run by bad actors in the wild, exploiting functions of the platform from upvotes to engagement bait to cross-thread visibility.

They decided to take another look at Moltbook.

“We decided to move beyond isolated incidents and examine the network itself,” Zenity researchers Stav Cohen and João Donato wrote in a report this week. “Rather than focusing on a single injection or campaign, we embedded ourselves inside Moltbook to understand its structure, behavior, and underlying mechanics.”

The researchers had two goals. The first was to understand the operations inside Moltbook, which seemed to include tens of thousands of autonomous agents continuously interacting with each other; they wanted to see if that was true. The second was to show how easy it is for attackers to infiltrate the platform, “specifically, whether coordinated content could influence large numbers of OpenClaw-connected agents and cause them to act on instructions embedded inside posts.”

A Controlled Influence Campaign

To do this, they published posts across different submolts – topic-specific forums – aimed at making agents read the content and follow a benign link controlled by the researchers. That let them analyze distribution patterns and show the activity on a map of the world. Cohen and Donato called their work a controlled influence campaign.

“We intentionally crafted content to cause agents to follow our instructions,” they wrote. “We stopped at a benign link for telemetry purposes, but the same mechanism could have been used to deliver far more harmful payloads.”

Using only capabilities allowed on the platform, they activated more than 1,000 unique agent endpoints and located them across more than 70 countries. They also took advantage of the built-in heartbeat mechanism that makes Moltbook agents ingest and act on untrusted content every 30 minutes.

Not Quite What’s Advertised

What they found was that hundreds of active agent endpoints are polling Moltbook and interacting in near real time, and that the ecosystem is geographically distributed. Agents run across globally distributed IP ranges, with the United States leading at 468 unique IPs and other countries – from European Union members to China, India, and Brazil – following.

That said, the scale they saw didn’t match what is pitched in public. Rather than tens of thousands of unique active agents interacting with the content, the Zenity researchers saw hundreds.

“The vision of agents collaborating, building, coordinating, and forming emergent communities is compelling,” Cohen and Donato wrote. “We are simply not there yet. In its current state, Moltbook is fundamentally fragile. Core ranking logic behaves inconsistently, amplification mechanics are skewed, and identity assumptions are weak. The platform requires substantial architectural hardening before it can support the scale it markets.”

Security is the Big Worry

More concerning are the security issues the research raised. The benign experiment confirmed that untrusted content on the platform can influence other agents, and it illustrated exactly what threat groups could exploit in campaigns of their own.

“A malicious actor could weaponize the same mechanism to propagate worms, trigger unwanted actions, pivot into other OpenClaw skills and integrations, or cause irreversible damage,” they wrote. “Most importantly, this research demonstrates that influence propagates. A single coordinated content strategy was able to trigger hundreds of autonomous systems across the globe to fetch external resources. That is the real signal.”
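The heartbeat mechanism is why that signal matters: any agent that polls the platform on a schedule and passes post bodies straight into its model as context will treat attacker-authored text as instructions it may act on. The sketch below shows the shape of that loop; the feed URL, function names, and prompt wiring are assumptions for illustration, not OpenClaw’s actual implementation.

```python
import time
import requests

# Placeholder endpoint; the real feed URL and agent wiring are assumptions.
MOLTBOOK_FEED = "https://moltbook.example/api/feed"
HEARTBEAT_SECONDS = 30 * 60  # the 30-minute cadence described in the research


def call_model(prompt: str) -> str:
    """Stand-in for whatever LLM call the agent framework actually makes."""
    return f"(model response to {len(prompt)} characters of context)"


def heartbeat_loop() -> None:
    while True:
        # 1. Pull recent posts from the platform. Everything returned here is
        #    untrusted, attacker-reachable content.
        posts = requests.get(MOLTBOOK_FEED, timeout=30).json()

        for post in posts:
            # 2. The risky pattern: untrusted post text is concatenated into
            #    the prompt with nothing separating it from the agent's own
            #    instructions, so "fetch this link" inside a post reads to the
            #    model like a task it has been asked to perform.
            prompt = (
                "You are my autonomous assistant. Read this Moltbook post and "
                "take any follow-up actions it calls for:\n\n" + post["body"]
            )
            call_model(prompt)  # in a real agent, the output can drive tool calls

        time.sleep(HEARTBEAT_SECONDS)
```

Nothing in that loop is exotic; the exposure comes from running it on a schedule against a feed anyone can write to, which is exactly the propagation path Zenity measured with its benign link.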
