Cloudflare provides an easier method to thwart AI bots

“The widespread use of generative AI has led to a surge in the need for data used for model training or inference, and despite some AI firms clearly labeling their web scraping bots, not all are being forthright,” Cloudflare staff wrote in a company blog post.


According to the blog post’s authors, “Allegedly, Google paid $60 million annually to obtain a license for user-generated content from Reddit, Scarlett Johansson claimed that OpenAI utilized her voice for their new virtual assistant without her permission, and most recently, Perplexity has faced accusations of posing as authentic visitors in order to extract content from websites. The significance of genuine content in large quantities has never been greater.”

Last year, Cloudflare launched a feature that lets any of its customers, regardless of plan, block specific categories of bots, including select AI crawlers. According to Cloudflare, these bots follow the instructions in websites’ robots.txt files and do not use unauthorized content to train models or collect data for retrieval-augmented generation (RAG) applications.
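robots.txt is advisory: it only restrains crawlers that choose to read and honor it. The minimal sketch below, using Python's standard-library `urllib.robotparser`, shows how a well-behaved crawler would evaluate a hypothetical robots.txt that disallows two publicly documented AI crawler user agents (OpenAI's `GPTBot` and Common Crawl's `CCBot`) while allowing everyone else. The file contents and the `is_allowed` helper are illustrative assumptions, not Cloudflare's implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption for illustration):
# it disallows two AI crawler user agents and allows every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so can_fetch() can be used without fetching anything over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

Nothing in this check is enforced on the server side: a scraper that ignores the file, or that poses as an ordinary visitor as Perplexity was accused of doing, is unaffected by it.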

About the Author

AndyC

Andy Curtis is an award-winning security consultant, researcher and public speaker. He has worked in the computer security industry since the early 1990s, for state and federal government as well as leading healthcare and banking providers across three continents. He has given talks about computer security for some of the world’s largest companies, worked with law enforcement agencies on investigations into hacking groups, and is a regular voice on TV and radio explaining IT security threats.
