The Bot Left a Fingerprint: Detecting and Attributing LLM-Generated Passwords


In February 2026, researchers at Irregular published a detailed post about LLM-generated passwords, showing that passwords generated by LLMs follow notable patterns and are generally highly predictable.
The root cause is fundamental: LLMs are optimized to predict probable outputs, which is the exact opposite of what secure password generation demands.
That observation raised a natural follow-on question: if LLMs leave statistical fingerprints in the passwords they generate, can those fingerprints be detected and attributed? Can we look at a password found in a leaked dataset and say which model generated it? More importantly, can we measure how widely those LLM passwords are used in the wild? That is what this research set out to answer.
Extending the perimeter
Irregular’s article pointed out that LLM-generated passwords are biased. Their analysis used the flagship models from OpenAI, Anthropic, and Google, and a sample of roughly 50 passwords. We decided to extend the scope of the analysis to 40 LLM models from 11 providers, covering both closed-source (OpenAI GPT, Anthropic Claude, etc.) and open-source (Qwen, DeepSeek, etc.) models. We also increased the password sample size to 200 per model to improve the statistical accuracy of the analysis.
Therefore, we generated a total dataset of 8,000 passwords.
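For reference, here is a minimal sketch of how such a dataset can be collected, assuming an OpenAI-compatible chat API. The model identifiers and the prompt are illustrative, not the exact setup used in this study.

# Minimal dataset-collection sketch, assuming an OpenAI-compatible chat API.
# Model identifiers and the prompt are illustrative, not the study's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODELS = ["gpt-5.2", "gpt-4.1-mini"]  # one entry per model under test
PROMPT = "Generate a single strong random password. Reply with the password only."

dataset = {}
for model in MODELS:
    samples = []
    for _ in range(200):  # 200 passwords per model, as in the study
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        samples.append(resp.choices[0].message.content.strip())
    dataset[model] = samples

Collecting 200 generated passwords per model (illustrative sketch).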
An initial analysis of this data confirmed Irregular’s original findings: we observe a clear bias in the generated passwords. The bias is inconsistent across models, with some producing very few distinct passwords while others produce no duplicates at all (a minimal way to compute this uniqueness ratio is sketched after the list):

Anthropic’s models show poor uniqueness: Claude Opus 4.6 is the worst, with only 35% unique passwords.
The open-source Qwen, Llama, and Gemma models show between 50 and 60% uniqueness.
The GPT-5 family generates only unique passwords.
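Measuring uniqueness is simple counting. A minimal sketch, assuming each model’s sample is a list of strings:

# Uniqueness ratio: share of distinct passwords in a model's sample.
def uniqueness(passwords: list[str]) -> float:
    return len(set(passwords)) / len(passwords)

# A 200-password sample containing only 70 distinct values yields 0.35,
# the 35% observed for Claude Opus 4.6.

Computing the uniqueness ratio of a password sample (sketch).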

The uniqueness of generated passwords does not guarantee their security. In practice, as the original article shows, generated passwords tend to all follow a similar pattern and use common substrings.
In fact, nearly all models repeat the same “upper, digit, symbol, lower” pattern, with some slight variations (a sketch of this positional profiling follows the list):

Anthropic models lock position 0 firmly: claude-opus always starts lowercase (100%), claude-haiku and claude-sonnet-4.6 always start uppercase (100%).
Llama models are 99–100% uppercase at position 0.
GPT-4.1-mini is 92% uppercase at position 0.
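This positional profiling boils down to counting character classes at each index. A minimal sketch:

# Character-class distribution at a given position across a sample.
from collections import Counter

def char_class(c: str) -> str:
    if c.isupper():
        return "upper"
    if c.islower():
        return "lower"
    if c.isdigit():
        return "digit"
    return "symbol"

def position_profile(passwords: list[str], pos: int = 0) -> dict[str, float]:
    counts = Counter(char_class(p[pos]) for p in passwords if len(p) > pos)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# e.g. position_profile(llama_sample) -> {"upper": 0.99, ...}

Profiling the character class at position 0 of each password (sketch).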

Likewise, all models exhibit a strong statistical deviation from a random password distribution. This is best illustrated by the most common substrings per model (in parentheses: the difference factor compared to a random distribution; a sketch of how such a factor can be estimated follows these lists):

gpt-5.2 generated the 7! bigram in 52% of passwords (x4.5k) and the vQ7!mZ substring in 6% of them (x41B)
Mistral-medium-3.1 generated the x7#pL9 substring in 65% of passwords (x448B)
Llama-3.3-70b-instruct generated the 8d bigram in all passwords, and the Gx#8dL substring in 96% of them, the worst score of all models.

Interestingly, the analysis of common substrings shows that some of them are shared across multiple providers:

The simple L2 bigram is found in the passwords of 10 out of the 11 providers, appearing in 27% of passwords on average (x114)

The longer #kL9 substring is found in the passwords of 4 providers (mistralai, deepseek, qwen, and openai) with an average probability of 13% (x954M)
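As an idea of how such difference factors can be computed, the sketch below compares a substring’s observed frequency with its expected frequency in uniformly random passwords. The 94-character alphabet and the union-bound estimate are our assumptions; the exact factors reported above may use a different baseline.

# Lift of a substring: observed frequency in a model's sample versus the
# expected frequency in uniformly random passwords of the same length.
ALPHABET_SIZE = 94  # printable ASCII without space (assumption)

def lift(passwords: list[str], sub: str, pwd_len: int = 16) -> float:
    observed = sum(sub in p for p in passwords) / len(passwords)
    positions = pwd_len - len(sub) + 1
    expected = positions * (1 / ALPHABET_SIZE) ** len(sub)  # union-bound estimate
    return observed / expected

Estimating how over-represented a substring is compared to random (sketch).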
Fighting robots with a rusty sword
The previous results suggested that LLM-generated passwords could be modeled using Markov chains.
A Markov chain is a mathematical model describing a sequence of events in which the probability of each event depends only on the previous state. It was first introduced by Russian mathematician Andrei Markov in 1906, a century before LLMs.
Since Markov’s original work, the model has found applications across a remarkable range of fields, including text generation. Used as a next-character or next-word prediction engine, Markov chains can be seen as the ancestors of LLMs.
For password prediction or recognition, a Markov chain can be as simple as:

One state for each character of the alphabet
Transitions weighted by the probability of observing each character after the current state’s character

A Markov chain trained with the passwords PASS, P@SS, PA$$, etc.
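Here is a minimal sketch of such a chain, trained on the toy sample above, with transition probabilities estimated by simple counting:

# First-order character-level Markov chain: one state per character,
# transitions weighted by P(next char | current char), learned by counting.
from collections import defaultdict

def train_chain(passwords: list[str]) -> dict[str, dict[str, float]]:
    counts = defaultdict(lambda: defaultdict(int))
    for pwd in passwords:
        for cur, nxt in zip(pwd, pwd[1:]):
            counts[cur][nxt] += 1
    chain = {}
    for cur, nxts in counts.items():
        total = sum(nxts.values())
        chain[cur] = {c: n / total for c, n in nxts.items()}
    return chain

chain = train_chain(["PASS", "P@SS", "PA$$"])
# chain["P"] == {"A": 2/3, "@": 1/3}: after "P", "A" is twice as likely as "@".

Training a character-level Markov chain by counting transitions (sketch).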
Without going too deep into the technical details, we used the sample of LLM-generated passwords to build several different Markov chains:

One chain per selected model.
One chain per model family or provider.
One chain that aggregates the whole LLM password dataset.

To verify this approach, and to check that the chains correctly capture the statistical bias of LLMs when generating passwords, we scored a second dataset of LLM-generated passwords and compared the results with a random baseline and with the scores of a dataset of generic passwords.
What we found is that:

The chains identify the right model in 55% of cases and the correct provider in 65% of cases.
The generic chain trained on the whole dataset was, on average, half as surprised when seeing an LLM-generated password as when seeing a random value or a generic password.

Because of the patterns shared between models and providers, we observed that predictions often bleed into similar models or providers. This does not reduce the chains’ ability to identify LLM-generated passwords in the general sense (a sketch of this attribution scoring follows).
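As a sketch of how this attribution can work, assuming per-model chains trained as in the earlier sketch: score the password against every chain and let the least-surprised one win. The likelihood floor for unseen transitions is our assumption.

import math

def log_likelihood(chain: dict, pwd: str, floor: float = 1e-6) -> float:
    # Sum of log transition probabilities along the password; transitions
    # never seen in training fall back to a small floor probability.
    return sum(
        math.log(chain.get(cur, {}).get(nxt, floor))
        for cur, nxt in zip(pwd, pwd[1:])
    )

def attribute(chains: dict[str, dict], pwd: str) -> str:
    # chains maps a model name to its trained chain; the chain with the
    # highest likelihood (least surprise) wins the attribution.
    return max(chains, key=lambda name: log_likelihood(chains[name], pwd))

Attributing a password to the least-surprised chain (sketch).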
Hunting bot passwords in the wild
Given the good performance of Markov chains in classifying passwords, we decided to classify a sample of passwords collected in the wild by GitGuardian’s public monitoring platform. We selected a dataset of 34M passwords observed on GitHub between November 2025 and March 2026. We then checked every one of those passwords against the previously built chains.
Markov chains provide statistical measures. Taking a conservative approach, we decided that a password would be considered LLM-generated only if (a sketch of this decision rule follows below):

A model-specific chain predicts it with >75% confidence.
A provider-specific chain predicts it with >75% confidence.
The general chain sees the password with a perplexity level below 100.

In addition, we decided to exclude xAI models from this study because they often generate non-random-looking passwords (e.g., P@ssw0rdS3cur3!2023, SecureKey789!). This results in the corresponding chain capturing weak human-generated passwords.
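Here is one possible reading of that rule in code, reusing log_likelihood from the attribution sketch. Treating the three criteria conjunctively matches the stated conservative intent, and the confidence metric (a likelihood normalized across candidate chains under a uniform prior) is our assumption, not necessarily the study’s exact measure.

import math

def perplexity(chain: dict, pwd: str) -> float:
    # Exponential of the average negative log transition probability.
    steps = max(len(pwd) - 1, 1)
    return math.exp(-log_likelihood(chain, pwd) / steps)

def best_confidence(chains: dict[str, dict], pwd: str) -> float:
    # Crude posterior: each chain's likelihood normalized over all candidates.
    scores = {name: math.exp(log_likelihood(c, pwd)) for name, c in chains.items()}
    total = sum(scores.values()) or 1.0
    return max(scores.values()) / total

def is_llm_generated(pwd, model_chains, provider_chains, general_chain) -> bool:
    return (
        best_confidence(model_chains, pwd) > 0.75
        and best_confidence(provider_chains, pwd) > 0.75
        and perplexity(general_chain, pwd) < 100
    )

One conservative reading of the classification rule (sketch).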
With this method, we classified 28,000 passwords as LLM-generated. The most predicted providers are, by far, Anthropic, Qwen, and Google; together they account for 63% of all occurrences.

Provider      Count
Anthropic      7951
Qwen           6643
Google         3184
OpenAI         2812
Amazon         2661
Mistral AI     1710
Meta Llama     1498
Cohere         1405
DeepSeek        182
Microsoft        91

The Anthropic passwords are also the most certain candidates, with an average confidence level of 92%.
Looking at the commit dates, we can see that LLM-generated passwords have been committed consistently at an average rate of 1,500 per week during the study timeframe.
The number of passwords committed per week averages 1,500.
The passwords are mostly contained in JSON files, but a significant proportion are hardcoded in source and configuration files. Notably, 1,800 .env files were found to contain at least one secret generated by an LLM, including application security keys, encryption keys, and passwords for third-party services.

# Company AWS Services Configuration
USE_AWS_SERVICES=true
AWS_REGION=us-east-1

# S3 Buckets
S3_BUCKET=company-docs-production-947514525
KNOWLEDGE_BASE_BUCKET=company-kb-production-947514525

# Lambda Functions
COMPANY_PROCESSOR_FUNCTION=company-processor-production

# Database
DATABASE_PASSWORD=Kx9mP2vQ8nR5tY7w
A typical LLM-generated password used to connect to a database.

"lx01" = {
  name = "lx01"
  image_id = ""
  image_name_regex = "ubuntu_22_04"
  cpu = 1
  memory_gb = 2
  private_ip_address = "{{ip_range}}.40"
  password = "x7QpL2n9V8F5"
}
A typical LLM-generated password used as the default password of Terraform-provisioned machines.

The most frequent file extensions correspond to configuration and source files.
It is also worth mentioning that, among the 8,700 commits containing a predicted Anthropic-generated password, 41 are marked as co-authored by Claude. While this number may seem low, the co-author banner is optional and only added when Claude is allowed to commit on its own. As the related data from the State of Secrets Sprawl report shows, co-authored commits are not representative of all the code written by Claude Code.
Even if the number is low, it validates our initial observation: AI agents can independently generate and hardcode passwords in code, and likely in other places.
What this means in practice
Compared with the size of the generic password dataset we classified, the number of passwords classified as LLM-generated is low. At this point, we cannot say the prevalence of those passwords is significant: the LLM-generated password issue is not widespread.
That said, we have still observed interesting behaviors worth noting.
Some people are using LLMs as password generators
We have observed LLM-generated passwords used in connection strings to web or database services. Unless the coding agent configured those services itself, this means a user purposely set that password after asking an AI to generate it.
Obviously, asking an AI agent to generate a password is bad practice. The password will have transited over the network, the LLM provider will know it, and it may end up logged in an agent log file on the endpoint. The password is leaked before it even reaches the developer’s machine.
On top of this already serious leak, we have seen that those passwords end up weak and mostly predictable.
AI Agents autonomously generate and hardcode passwords
We have observed LLM-generated passwords hardcoded in Terraform files and defined in commits authored by an AI agent. While not widespread compared to the total number of leaked passwords, this behavior exists. In some cases, having an AI agent generate and hardcode a secret in code can lead to severe issues: even if the code is not publicly leaked, the password’s predictability can enable efficient online enumeration.
This risk should be taken into account when designing your AI security policy.
It is also worth noting that some of the passwords in that category were generated using flagship models. The issue is therefore still relevant and not just tied to older or entry-level models.
Attacking and defending
Using Markov chains to classify LLM-generated passwords enabled us to identify them in the wild. However, Markov chains are primarily known for generating content rather than classifying it. For that reason, those chains could also be used to attack LLM-generated passwords with much higher efficiency than a brute-force approach. Common password-cracking tools already implement a Markov mode that can be used for that purpose.

$ calc_stat opus_46.passwords opus_46.john.stats

$ vim john.conf
[Markov:opus46]
Statsfile = opus_46.john.stats
MkvLvl = 400
MkvMaxLen = 20
MkvMinLen = 16

$ john --markov=opus46 --stdout
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
MKV start (stats=opus_46.john.stats, lvl=200 len=23 pwd=455419097)
k7$Lm9#Qx2vP!nW4rT&jR&j
k7$Lm9#Qx2vP!nW4rT&jR&8
k7$Lm9#Qx2vP!nW4rT&jR&4
[…]
John the Ripper generating Opus 4.6-looking passwords.

On the defender side, LLMs should obviously not be used as password generators. A vault, password manager, or similar tooling should generate all passwords.
The agent-generated password case is less obvious to solve and requires tight control over what AI agents produce. This can only be achieved by setting up appropriate guardrails inside your AI agent workflow. Luckily, GitGuardian’s ggshield tool can help with that: recently added features allow developers to scan AI agent hook events for secrets, and installing this capability is as simple as running a ggshield command.

$ ggshield install -t cursor -m global
$ ggshield install -t claude -m global
Setting up ggshield to scan Claude and Cursor hook events for secrets.

It’s a small step, but given what’s at stake, it’s one no team shipping AI-assisted code can afford to skip.

*** This is a Security Bloggers Network syndicated blog from GitGuardian Blog – Take Control of Your Secrets Security authored by Gaëtan Ferry. Read the original post at: https://blog.gitguardian.com/the-bot-fingerprint-detecting-llm-passwords/
