Exposed LLM Infrastructure: How Attackers Find and Exploit Misconfigured AI Deployments

Someone is scanning your LLM infrastructure right now. They are not waiting for you to finish your security review.

Between October 2025 and January 2026, GreyNoise’s honeypot infrastructure captured 91,403 attack sessions targeting exposed LLM endpoints. The sessions came from two distinct campaigns systematically mapping the expanding attack surface of misconfigured AI deployments.
Your team is moving fast on AI. LLM servers are going live, inference APIs are being connected, MCP endpoints are being spun up. Most of it is happening without the same security controls you would apply to any other piece of internet-facing infrastructure. The assumption is that these are internal tools.
The entry points attackers are using are basic misconfigurations your team made in a hurry to get AI into production: exposed ports, unauthenticated APIs, MCP servers with no access controls.
This blog breaks down exactly how it happens and what your infrastructure looks like from an attacker’s perspective before you realize it is exposed.
How Do Attackers Find Exposed LLM Servers?
Before any exploitation happens, attackers do reconnaissance. For self-hosted LLM infrastructure, that reconnaissance is trivially easy. Attackers use internet scanning tools to find endpoints. Once an endpoint appears in scan results, exploitation attempts begin within minutes.
The reason discovery is so easy comes down to defaults. Default ports such as port 11434 for Ollama make fingerprinting trivial. Most teams deploy inference servers using out-of-the-box configurations, which means the service announces itself to anyone scanning the internet.
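Scanners confirm an Ollama exposure by requesting the unauthenticated `/api/tags` endpoint on port 11434, which returns the installed models as JSON. A minimal sketch of the parsing side of that fingerprint check (the sample response body below is hypothetical):

```python
import json

def fingerprint_ollama(body: str) -> list[str]:
    """Return installed model names if the response looks like
    Ollama's /api/tags output, else an empty list."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return []
    models = data.get("models")
    if not isinstance(models, list):
        return []
    return [m["name"] for m in models if isinstance(m, dict) and "name" in m]

# Hypothetical response from an exposed server on port 11434.
sample = '{"models": [{"name": "llama3:8b"}, {"name": "mistral:7b"}]}'
print(fingerprint_ollama(sample))              # non-empty list means "exposed"
print(fingerprint_ollama("<html>404</html>"))  # not Ollama
```

A non-empty model list is all a scanner needs to add the host to a target list; no credentials are ever requested.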
A recent investigation, conducted over 293 days and covering 7.23 million observations, found 175,000 unique Ollama hosts publicly accessible across 130 countries.
These are not obscure research deployments. They are production AI servers belonging to real organizations, exposed because someone opened a port for convenience and never closed it, or because an AI project moved from a developer laptop to a cloud instance without a security review. For a deeper look at how attackers interact with exposed Ollama servers and what gets compromised, read our dedicated guide: Exposed Ollama Servers: LLM Infrastructure Security Risks.
What Happens When Attackers Access Your LLM Infrastructure?
Finding an exposed endpoint is the beginning of a decision tree. Depending on what the endpoint exposes, an attacker has several options available simultaneously, and in many documented cases, they pursue more than one.
Compute Abuse
Running high-volume inference requests or long-form generation tasks that consume GPU resources and drive up infrastructure costs without triggering any security alert. Attackers need nothing more than access to cause damage here. A single exposed endpoint can be used to run thousands of inference requests for generating content, processing data, or powering other attacks while your team sees nothing unusual in application logs.
Data Exfiltration
LLM context windows often contain sensitive organizational information: customer records, internal documents, API credentials, or source code references, depending on what the application feeds into the model. A single well-crafted prompt to an exposed endpoint can pull that context out. This is particularly dangerous in RAG-based deployments where the model is actively pulling from internal knowledge bases, databases, or document stores to answer queries. The attacker does not need to access those systems directly. They just need to ask the model the right question.
Lateral Movement
Connected tool-calling capabilities, MCP servers, and webhook integrations give an attacker a path from the LLM layer into broader internal infrastructure that was never meant to be reachable from outside. An AI agent with access to your internal tools is a powerful asset for your team. In an attacker’s hands, it is a pivot point. Once inside the LLM layer, the attacker can instruct the model to execute commands, query databases, read files, and interact with connected services, all through the same interfaces your developers built for legitimate use.
Monetization
Selling validated endpoint credentials through underground marketplaces to buyers who then use your compute at your expense. Operation Bizarre Bazaar documented exactly this: a three-stage criminal supply chain scanning for exposed endpoints, validating access, and reselling it through a commercial marketplace at 40 to 60 percent discounts. Your infrastructure becomes someone else’s revenue stream while you absorb the cost.
LLM Infrastructure Attacks in the Wild: Real Incidents from 2025 and 2026
| Incident | What Happened |
| --- | --- |
| Operation Bizarre Bazaar | Between December 2025 and January 2026, threat actor Hecker systematically scanned for exposed LLM and MCP endpoints, validated access, and resold it through silver.inc, a marketplace advertising access to 30+ LLM providers at 40–60% discounts via Telegram and Discord. |
| GreyNoise Campaign 2 | Starting December 28, 2025, two IPs with a combined history of 4 million sensor hits and 200+ CVE exploitations launched an 11-day probe of 73+ LLM endpoints generating 80,469 sessions, assessed as a professional threat actor building target lists for future exploitation. |
| Christmas Spike | A separate SSRF campaign spiked to 1,688 sessions in 48 hours over Christmas 2025, exploiting Ollama’s model pull functionality to force servers to connect to attacker-controlled infrastructure. |
| Ollama Unauthorized Access | NSFOCUS detected active exploitation of Ollama deployments exposed without authentication (CNVD-2025-04094), allowing attackers to steal model assets, feed false information, and abuse compute resources. |
| MCP Reconnaissance Surge | By late January 2026, 60% of all attack traffic had shifted from LLM API abuse to MCP endpoint reconnaissance, indicating attackers were mapping pathways into internal infrastructure. |

How Attackers Exploit Exposed LLM Infrastructure: Four Active Attack Vectors
The following four attack vectors are based on documented campaigns, active CVEs, and confirmed exploitation patterns observed between 2025 and 2026.
Attack Vector 1: Unauthenticated Inference APIs
The most common and most exploited misconfiguration is a self-hosted LLM inference API exposed to the internet without authentication.
Because Ollama ships without authentication or access controls by default, opening the service to the public network creates an immediate risk. An unauthenticated attacker can directly call its API to steal sensitive model assets, feed false information, tamper with system configuration, or abuse model reasoning resources.
Once an attacker has unauthenticated access, the attack surface opens significantly. They can identify which models are installed, submit arbitrary prompts, consume compute resources, probe the system for internal information, and use tool-calling capabilities to reach connected systems and APIs.
Most commonly exploited configurations:
| Configuration | Default Port | Risk |
| --- | --- | --- |
| Ollama without authentication | 11434 | Unauthenticated inference, model theft, compute abuse |
| OpenAI-compatible API exposed | 8000 | Full API access without credentials |
| MCP server without access controls | Various | Lateral movement into internal systems |
| Production chatbot without rate limiting | 80 / 443 | Abuse and data extraction at scale |
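Closing the unauthenticated-API gap can be as simple as rejecting requests without a valid key at the reverse proxy, before anything reaches the inference server. A minimal sketch of that check (the header format and key store are illustrative, not a specific product’s API):

```python
import hmac

# Illustrative key store; in practice, keys come from a secrets manager.
VALID_KEYS = {"team-a": "s3cr3t-key-a"}

def authorize(headers: dict) -> bool:
    """Allow the request only if the Authorization header carries a
    known API key. Constant-time comparison avoids timing leaks."""
    supplied = headers.get("Authorization", "").removeprefix("Bearer ").strip()
    return any(hmac.compare_digest(supplied, k) for k in VALID_KEYS.values())

print(authorize({"Authorization": "Bearer s3cr3t-key-a"}))  # → True
print(authorize({}))                                        # → False
```

Anything that fails this check should be dropped at the proxy; the inference API itself never sees the request.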

Attack Vector 2: SSRF Against Self-Hosted AI Server Infrastructure
The second major attack vector does not require direct access to the model. It uses your AI server itself as a tool to reach your internal infrastructure.
One active campaign exploited server-side request forgery vulnerabilities that force a server to connect to attacker-controlled external infrastructure. Attackers abused Ollama’s model pull functionality to inject malicious registry URLs, forcing the server to make outbound HTTP requests without the owner’s knowledge.
This is dangerous for two reasons. First, it allows attackers to confirm which internal services are reachable from your AI server. Second, it creates a channel for data exfiltration that bypasses traditional perimeter controls, because the requests appear to originate from a trusted internal system.
How this attack flows:
1. Attacker sends a crafted request to the Ollama API.
2. The request injects a malicious registry URL into a model pull request.
3. The Ollama server makes an outbound HTTP request to attacker infrastructure.
4. The attacker receives a callback confirming SSRF success.
5. Internal network topology is now mapped from the AI server outward.

As organizations increasingly deploy AI systems in production, these services are often exposed through APIs, webhooks, or proxy layers, creating new opportunities for attackers to probe for misconfigurations and abuse.
SSRF through self-hosted AI infrastructure is particularly difficult to detect because the outbound request looks like legitimate server activity. Behavioral monitoring that baselines normal outbound traffic from your AI servers and alerts on deviations is the control that catches this.
Attack Vector 3: Misconfigured Proxy Layers in Your Self-Hosted AI Stack
Many organizations run a proxy layer that connects their internal applications to their self-hosted LLM deployment or routes requests between internal AI services. If that proxy layer is misconfigured, attackers do not need to find your inference server directly; they can reach it through the proxy.
Starting December 28, 2025, two IP addresses generated 80,469 sessions over eleven days, probing more than 73 LLM model endpoints and hunting for misconfigured proxy servers that could provide unauthorized access to AI infrastructure. The attack tested both OpenAI-compatible API formats and Google Gemini formats across every major model family.
The reconnaissance was deliberately designed to avoid detection. Test queries stayed innocuous, using phrases designed to fingerprint which model actually responded without triggering security alerts.
The goal is to build a map of which proxy layers are accessible and then exploit them directly. By the time your team notices something is wrong, the access has often been active for weeks.
Rate limiting, request pattern analysis, and anomaly detection at the proxy layer are the controls that catch this before the damage is done.
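Rate limiting is what turns an 80,000-session probe into a handful of blocked requests. A minimal token-bucket sketch of the kind of control a proxy layer would enforce (the rate and burst values are illustrative):

```python
import time

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens for the time elapsed since the last request,
        # capped at the bucket's capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=5)     # 1 req/s, burst of 5
results = [bucket.allow() for _ in range(10)]  # rapid-fire probe
print(results.count(True))                     # only the burst gets through
```

A per-client bucket like this would have throttled the December 2025 campaign’s probing long before it reached tens of thousands of sessions against a single endpoint.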
Attack Vector 4: MCP Server Exploitation
Model Context Protocol servers represent the newest and least secured entry point in self-hosted LLM infrastructure. MCP servers connect your AI agents to external tools, APIs, file systems, and databases. They are designed to extend what an AI agent can do, and that is exactly what makes them dangerous when exposed without controls.
A single exposed MCP endpoint can form a bridge to your entire internal infrastructure. Exposed MCP servers become entry points for attackers to navigate file systems, query databases, and access cloud APIs. Beyond unauthorized access, MCP servers also carry direct vulnerability risk.
CVE-2025-15063 — Ollama MCP Server RCE
| Detail | Value |
| --- | --- |
| Vulnerability type | Command injection (CWE-78) |
| Authentication required | None |
| Attack vector | Network |
| Impact | Arbitrary code execution on host system |
| Affected component | execAsync method in Ollama MCP Server |

By late January 2026, 60% of all attack traffic documented by Pillar Security had shifted specifically to MCP endpoint reconnaissance, indicating attackers were mapping pathways into your internal infrastructure.
MCP is new enough that most security teams have not built detection or access controls around it yet.
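CWE-78 command injection arises when attacker-controlled input, such as a model name, is interpolated into a shell command string. A toy illustration of the vulnerable pattern and its fix; this is illustrative Python, not the actual Ollama MCP Server code:

```python
import re

# Accept only characters that plausibly appear in a model name.
MODEL_NAME = re.compile(r"^[\w.:/-]+$")

def build_command_vulnerable(model: str) -> str:
    # VULNERABLE: if this string reaches a shell, "llama3; rm -rf /"
    # executes both commands.
    return f"ollama run {model}"

def build_command_safe(model: str) -> list[str]:
    """Validate the input, then build an argument vector so the shell
    never parses it (e.g. pass to subprocess.run with shell=False)."""
    if not MODEL_NAME.match(model):
        raise ValueError(f"rejected model name: {model!r}")
    return ["ollama", "run", model]

print(build_command_safe("llama3:8b"))
try:
    build_command_safe("llama3; rm -rf /")
except ValueError as e:
    print(e)
```

The same principle, validate input and never hand untrusted strings to a shell, closes this class of bug regardless of which MCP server is involved.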
Why LLM Infrastructure Security Keeps Failing
The root cause across all four attack vectors is the same. It is a fundamentals problem: authentication skipped, ports left open, security teams informed too late. AI moved faster than the controls around it.
The organizations getting hit are teams that moved fast on AI, made reasonable shortcuts under time pressure, and never came back to close the gaps. The deployment profiles that carry the highest exposure risk right now:
| What Teams Do | What Should Happen |
| --- | --- |
| Bind Ollama to 0.0.0.0 for convenience | Bind to localhost; use a reverse proxy for remote access |
| Deploy AI prototypes without decommissioning | Audit and remove unused AI services from public exposure |
| Move AI from laptop to cloud without security review | Apply the same controls as any internet-facing service |
| Assume internal tools are not discoverable | Treat any cloud-hosted service as internet-facing by default |
| Skip authentication to reduce friction | Implement API key or OAuth 2.0 before any public exposure |
| Deploy MCP servers without access controls | Enforce allow and deny rules on every MCP endpoint |
If you recognize your environment in any of these, the threat is already in your attack surface.
How to Secure Your LLM Infrastructure Against Attacks
The controls are the same fundamentals that apply to any internet-facing infrastructure, applied to an environment where they are currently being skipped at scale.
Discover what you have exposed before attackers do. Run an external attack surface scan on your AI infrastructure specifically. If you are not doing this continuously, you have a blind spot that grows every time a new AI service goes live. AppTrana’s AI Server Discovery automates this as an ongoing capability, surfacing exposed Ollama servers, inference APIs, and MCP endpoints, including those behind reverse proxies, before they appear on a scanner’s results page.
Close unauthenticated access. No inference API, MCP server, or internal proxy layer should be reachable from the internet without authentication. Implement API key validation or OAuth 2.0 in front of every LLM endpoint. If a service cannot be authenticated, it should not be internet-facing.
Restrict outbound connections from AI servers. SSRF attacks depend on your AI server being able to make outbound requests to attacker-controlled infrastructure. Apply egress filtering and allow outbound connections only to approved addresses. Your self-hosted inference server has no reason to call arbitrary external URLs.
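The same allowlist logic applies in application code: before the server follows a caller-supplied registry URL (the SSRF vector described earlier), check its destination against approved hosts. A minimal sketch, with an illustrative allowlist; enforce this at the network firewall as well, not only in code:

```python
from urllib.parse import urlparse

# Illustrative egress allowlist of approved outbound destinations.
ALLOWED_HOSTS = {"registry.ollama.ai"}

def egress_allowed(url: str) -> bool:
    """Permit outbound requests only to approved HTTPS hosts."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

print(egress_allowed("https://registry.ollama.ai/v2/library/llama3"))  # → True
print(egress_allowed("http://169.254.169.254/latest/meta-data/"))      # → False
```

Note that the second example, a cloud metadata endpoint, is a classic SSRF target; a default-deny egress policy blocks it automatically.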
Monitor for behavioral anomalies. High token consumption from unknown source IPs, requests to models you have not loaded, cross-model probing patterns, and outbound connections following API activity are all signals worth alerting on. Behavioral baselines catch what signature-based controls miss, including the deliberate low-noise probing that characterized the December 2025 campaign.
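A behavioral baseline can be as simple as flagging token consumption that deviates several standard deviations from the recent norm for a client. A toy sketch; the three-sigma threshold and sample values are illustrative:

```python
import statistics

def is_anomalous(history: list[int], observed: int, sigmas: float = 3.0) -> bool:
    """Flag a reading more than `sigmas` standard deviations above the
    mean of recent per-client token counts."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid zero stdev on flat history
    return observed > mean + sigmas * stdev

baseline = [1200, 1100, 1300, 1250, 1150]  # tokens/hour for a typical client
print(is_anomalous(baseline, 1400))   # → False: normal variation
print(is_anomalous(baseline, 90000))  # → True: compute-abuse signature
```

Production systems would track baselines per source IP and per model, but the principle is the same: alert on deviation, not on signatures.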
Apply network segmentation. Self-hosted AI inference servers should not be directly reachable from the internet. Use VPCs, internal load balancers, and VPN-based access for remote inference needs rather than public exposure. Treat your LLM infrastructure the same way you treat your database layer.
Govern MCP endpoints explicitly. Every MCP server should have explicit allow and deny rules controlling what AI agents can do. An MCP endpoint with unrestricted access to file systems, databases, and cloud APIs is not a convenience feature; it is an attack surface.
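Allow and deny rules for an MCP endpoint reduce to a policy check before any tool call executes. A minimal default-deny sketch; the tool names and rule set are illustrative:

```python
# Illustrative default-deny policy: a tool call runs only if explicitly allowed.
POLICY = {
    "allow": {"read_docs", "query_orders_db"},
    "deny": {"delete_file", "shell_exec"},
}

def tool_call_permitted(tool: str) -> bool:
    """Deny rules win over allow rules; anything unlisted is denied."""
    if tool in POLICY["deny"]:
        return False
    return tool in POLICY["allow"]

print(tool_call_permitted("read_docs"))   # → True
print(tool_call_permitted("shell_exec"))  # → False
print(tool_call_permitted("new_tool"))    # → False (default deny)
```

The default-deny stance matters most: a new tool added to the server gains no agent access until someone consciously allows it.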
AppTrana AI Shield: AI Firewall Built for the LLM Layer
AppTrana AI Shield is a fully managed AI firewall built specifically for LLM-powered applications: chatbots, copilots, inference APIs, and agent frameworks. It sits inline between your users and your models, inspecting every prompt before it reaches the LLM and every response before it reaches the user.
Every prompt is checked against your policy before the model processes it. Every response is scanned for sensitive data, including PII, credentials, and internal identifiers, and blocked or redacted at the output layer in real time. Jailbreaks and prompt injection attempts are detected behaviorally, catching structural attack patterns regardless of how they are phrased, including novel variations that signature-based tools have never encountered.
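Output-layer redaction of the kind described above can be illustrated with simple pattern matching. Real detection is far broader, but a toy sketch shows the shape; the patterns below are illustrative, not AI Shield’s actual rules:

```python
import re

# Illustrative detectors; production systems combine many more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(response: str) -> str:
    """Replace matched sensitive spans before the response leaves the system."""
    for label, pattern in PATTERNS.items():
        response = pattern.sub(f"[REDACTED-{label}]", response)
    return response

print(redact("Contact ops@example.com, key sk-AbCdEfGhIjKlMnOpQr"))
```

The key property is placement: the filter sits between the model and the user, so even a successfully extracted secret never reaches the attacker.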
At the infrastructure layer, AI Shield stops automated LLM abuse before it reaches your inference compute. Prompt storms, systematic knowledge base scraping, and brute-force model extraction are handled by integrated bot detection at the edge. Coverage maps to the OWASP LLM Top 10 across every major AI risk category: prompt injection, insecure output handling, sensitive information disclosure, model denial of service, giving your compliance team a structured framework, not just firewall logs.
AI Shield is model-agnostic and requires no architecture changes. It protects public APIs, private cloud deployments, and on-premises inference servers through your existing AppTrana workflows. Indusface engineers design and tune your policies based on your specific use cases and risk profile. The 24×7 SOC monitors for anomalies, including token consumption spikes, cross-model probing, and policy violation patterns, and responds without waiting for your team to triage an alert.
See AppTrana AI Shield in Action →

The post Exposed LLM Infrastructure: How Attackers Find and Exploit Misconfigured AI Deployments appeared first on Indusface.

*** This is a Security Bloggers Network syndicated blog from Indusface authored by Aayush Vishnoi. Read the original post at: https://www.indusface.com/blog/exposed-llm-infrastructure-risks/
