Meet Vespasian. It Sees What Static Analysis Can’t.




Praetorian is excited to announce the release of Vespasian, a probabilistic API endpoint discovery, enumeration, and analysis tool. Vespasian watches real HTTP traffic from a headless browser or your existing proxy captures and turns it into API specifications (OpenAPI, GraphQL SDL, WSDL). We built it because pentesters spend the first days of every API engagement manually reconstructing documentation that should already exist.

You know the scenario. You are three days into an API penetration test. Documentation was promised during scoping, and it existed at some point, but the Confluence page was last updated eighteen months ago and describes endpoints that have since been replaced. The Swagger UI returns a 404. The mobile app calls endpoints that don’t appear in any documentation at all. Nobody dropped the ball; the API just evolved faster than the docs.

So you do what every pentester does: you open Burp Suite, click through the application for an hour, and start reading raw HTTP traffic. You spot JSON responses on /api/v2/ paths. GraphQL queries appear on a different subdomain. There’s a SOAP service that the frontend calls exactly once during login. You copy endpoint URLs into a spreadsheet. You guess at parameter names. You manually reconstruct the API over a couple of days.

This part of the project is informative, but it’s also a bottleneck. Vespasian reduces that bottleneck. It observes real HTTP traffic, either by crawling the target with a headless browser or by importing captures you’ve already made in Burp Suite, HAR, or mitmproxy, and generates API specifications automatically. REST endpoints become OpenAPI 3.0. GraphQL endpoints become SDL schemas. SOAP services become WSDL documents. You can try it yourself at github.com/praetorian-inc/vespasian.

Both Vespasian’s crawl and its analysis are inherently probabilistic. No headless browser can guarantee full coverage of API endpoints that are hidden or reachable only after an improbable sequence of requests. Likewise, parameter types can only be inferred from a finite set of observations. We built Vespasian to be thorough and careful, and we’ll keep improving it.

Why Traditional API Discovery Misses Modern APIs

The standard approach to API discovery during penetration tests is some combination of checking known paths (/swagger.json, /openapi.yaml, /.well-known/openapi), reading source code for endpoint definitions, and manually proxying traffic through Burp or similar tools.

Each approach has a blind spot.

Checking known paths only finds APIs that are explicitly documented and served at conventional locations. If the development team never published a spec (or published one and then let it drift) you get nothing, or worse, you get a spec that doesn’t match reality.

Static analysis and source code review can identify endpoint definitions in server-side code, but they cannot observe what actually happens at runtime. Modern single-page applications built with React, Angular, and Vue construct API requests dynamically in JavaScript. The frontend assembles URLs from configuration objects, interpolates path parameters at runtime, and conditionally calls different endpoints based on application state. A mobile app’s API calls exist only in compiled binaries. WebSocket connections are negotiated at runtime. None of this appears in source code in a form that static analysis can reliably extract.

Manual proxy capture works but doesn’t scale. You capture what you click on. If the application has 200 endpoints and you exercise 40 during your manual walkthrough, you’ve documented 40. The other 160 are untested, not because you couldn’t test them, but because you never knew they existed. And the output is raw HTTP traffic, not a structured specification you can feed into other tools.

Vespasian takes a different approach. It instruments a real browser, executes the application’s JavaScript, and intercepts every HTTP request the application makes at the network level. It doesn’t guess what the API looks like from source code. It watches what actually happens on the wire.

How It Works

Vespasian uses a two-stage pipeline that separates traffic capture from specification generation.

Stage 1: Capture. Crawl a target with a headless browser (powered by Katana with full JavaScript execution), or import traffic from Burp Suite XML, HAR files, or mitmproxy dumps. This produces a capture.json file: a JSON array of every observed HTTP request and response.

Stage 2: Generate. Classify captured requests by API type using confidence-based heuristics, deduplicate endpoints, probe them for additional metadata (OPTIONS discovery, GraphQL introspection, WSDL fetching), and output a specification.

# One command: crawl + classify + probe + generate
$ vespasian scan https://app.example.com -o api.yaml
[INFO] Starting headless browser crawl of https://app.example.com
[INFO] Captured 127 requests across 23 pages
[INFO] Classified 44 REST endpoints (confidence ≥ 0.50)
[INFO] Deduplicated to 31 unique paths
[INFO] Probing endpoints…
[INFO] Generated OpenAPI 3.0 specification → api.yaml

The scan command runs both stages together. The crawl, import, and generate commands run them independently:

# Capture traffic from a headless browser
$ vespasian crawl https://app.example.com -o capture.json
# Or import traffic you’ve already captured
$ vespasian import burp traffic.xml -o capture.json
$ vespasian import har recording.har -o capture.json
$ vespasian import mitmproxy flows -o capture.json
# Generate a spec from any capture
$ vespasian generate rest capture.json -o api.yaml
$ vespasian generate graphql capture.json -o schema.graphql
$ vespasian generate wsdl capture.json -o service.wsdl

Why two stages? Because capture and generation are different problems with different constraints.

You might capture traffic once during a limited engagement window and generate specs multiple times as your understanding evolves. You might import traffic that a colleague captured in Burp during manual testing and generate from it without re-scanning. You might capture from a mobile app via mitmproxy and generate offline on a plane. The intermediate capture.json file is inspectable, debuggable, and composable. It also isolates capture bugs from generation bugs.

Classifying Traffic by API Type

Raw HTTP traffic is a mix of API calls, static assets, page loads, analytics beacons, and third-party requests. Vespasian’s classification engine separates signal from noise using per-type heuristics that assign confidence scores.

REST classification uses five signals: response content-type (application/json, application/xml), URL path patterns (/api/, /v1/, /rest/), HTTP method (POST/PUT/PATCH/DELETE to non-page URLs), response structure (JSON objects and arrays, not HTML), and static asset exclusion (drops .js, .css, .png, .woff, /static/, /assets/). Each signal contributes to a cumulative confidence score.
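To make the idea of a cumulative score concrete, here is a minimal Python sketch of the five REST signals. It is not Vespasian's implementation: the function name, the weights, and the order of checks are all assumptions chosen for illustration, since the article names the signals but not their values.

```python
from urllib.parse import urlparse

# Hypothetical values: the signals come from the text, the weights do not.
STATIC_SUFFIXES = (".js", ".css", ".png", ".woff")
STATIC_PREFIXES = ("/static/", "/assets/")
API_PATH_HINTS = ("/api/", "/v1/", "/rest/")


def rest_confidence(method, url, content_type, body):
    """Return a cumulative score in [0, 1] for one captured exchange."""
    path = urlparse(url).path.lower()
    # Static asset exclusion: drop these before scoring anything else.
    if path.endswith(STATIC_SUFFIXES) or any(p in path for p in STATIC_PREFIXES):
        return 0.0
    score = 0.0
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("application/json", "application/xml"):
        score += 0.4  # response content-type
    if any(hint in path + "/" for hint in API_PATH_HINTS):
        score += 0.3  # URL path pattern
    if method.upper() in ("POST", "PUT", "PATCH", "DELETE"):
        score += 0.2  # state-changing HTTP method
    if body.lstrip().startswith(("{", "[")):
        score += 0.2  # response looks like a JSON object or array, not HTML
    return min(score, 1.0)
```

With these assumed weights, a JSON response on an /api/ path clears the 0.50 default on its own, while an HTML page load or a file under /static/ scores zero.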

GraphQL classification looks for the /graphql path (0.70 confidence), GraphQL query syntax in POST bodies (0.85), data/errors response keys (0.80), or a combination of these signals (0.95).
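The GraphQL confidence values above map naturally onto a small scoring function. The Python sketch below uses the four numbers from the text. Everything else is an assumption: the function name, the way a query body is recognized, and the rule that any two signals count as "a combination".

```python
import json
from urllib.parse import urlparse


def _parse_json(text):
    """Return parsed JSON, or None when the text is missing or not JSON."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        return None


def graphql_confidence(method, url, request_body, response_body):
    """Score one exchange using the confidence values quoted above."""
    signals = []
    if urlparse(url).path.rstrip("/").endswith("/graphql"):
        signals.append(0.70)  # /graphql path
    req = _parse_json(request_body)
    query = req.get("query") if isinstance(req, dict) else None
    if method.upper() == "POST" and isinstance(query, str):
        if query.lstrip().startswith(("query", "mutation", "subscription", "{")):
            signals.append(0.85)  # GraphQL query syntax in a POST body
    resp = _parse_json(response_body)
    if isinstance(resp, dict) and ("data" in resp or "errors" in resp):
        signals.append(0.80)  # data/errors response keys
    if len(signals) >= 2:
        return 0.95  # assumed meaning of "a combination of these signals"
    return signals[0] if signals else 0.0
```

A POST to /graphql carrying a query and returning a data key scores 0.95, while a bare GET to the same path scores 0.70.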

SOAP/WSDL classification detects SOAPAction headers, SOAP envelope XML in request bodies, ?wsdl URL parameters, and text/xml or application/soap+xml content types.

Requests that exceed the confidence threshold (default 0.50, configurable with --confidence) are classified. The --api-type auto default selects the type with the most classified requests. You can also specify the type explicitly with --api-type rest, --api-type graphql, or --api-type wsdl.
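The threshold and the auto selection can be sketched in a few lines of Python. The input shape (one dict of per-type scores per request) and the function name are hypothetical. The sketch treats a score equal to the threshold as passing, which matches the "confidence ≥ 0.50" log line shown earlier, and counts each request toward its best-scoring type only. Both choices are assumptions.

```python
from collections import Counter


def select_api_type(scored_requests, threshold=0.50):
    """Pick the API type with the most requests at or above the threshold.

    scored_requests is a list of dicts mapping an API type name to its
    confidence score, one dict per captured request.
    """
    counts = Counter()
    for scores in scored_requests:
        # Each request counts toward its best-scoring type only (assumption).
        api_type, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= threshold:
            counts[api_type] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Returning None when nothing clears the threshold leaves the decision to the caller, for example falling back to an explicit type.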

REST: Path Normalization and OpenAPI 3.0

The REST pipeline handles the most common case: JSON APIs behind path-based routing.

After classification, Vespasian deduplicates endpoints by detecting dynamic path segments and collapsing them into parameterized templates. /users/42 and /users/87 become /users/{id}. UUID segments are detected automatically. Known literal paths like /users/me and /users/self are preserved as-is. Context-aware parameter naming assigns meaningful names based on path context rather than generic {param1} placeholders.
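A simplified Python sketch of this kind of normalization follows. It detects dynamic segments purely by syntax (all digits, or UUID-shaped), so literal segments such as me and self pass through untouched. The naming rule (first parameter is id, later ones are derived from the preceding segment) is a stand-in for illustration, not Vespasian's actual context-aware naming.

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)


def normalize_path(path):
    """Collapse numeric and UUID segments into named path templates."""
    out, used = [], set()
    for seg in path.strip("/").split("/"):
        if seg.isdigit() or UUID_RE.match(seg):
            # Name later parameters after the preceding literal segment.
            prev = out[-1] if out and not out[-1].startswith("{") else "param"
            name = "id" if "id" not in used else prev.rstrip("s") + "Id"
            used.add(name)
            out.append("{" + name + "}")
        else:
            out.append(seg)  # literal segment, kept as-is
    return "/" + "/".join(out)


def dedupe(paths):
    """Return the sorted set of templates for a list of observed paths."""
    return sorted({normalize_path(p) for p in paths})
```

For example, dedupe(["/users/42", "/users/87", "/users/me"]) yields two entries, /users/me and /users/{id}, and /users/42/orders/7 becomes /users/{id}/orders/{orderId}.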

Vespasian probes each endpoint with an OPTIONS request to discover allowed HTTP methods and CORS configuration, and infers JSON schemas from observed response bodies. The result is a valid OpenAPI 3.0 document with paths, methods, parameters, and response schemas.

openapi: "3.0.0"
info:
  title: "Discovered API"
paths:
  /api/v2/users/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: integer
                  email:
                    type: string
                  role:
                    type: string
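Schema inference from an observed response body can be illustrated with a short recursive function. This Python sketch is an assumption about the general technique, not Vespasian's code: it looks at a single sample, uses the first array element as representative, and leaves null values untyped.

```python
def infer_schema(value):
    """Infer an OpenAPI-style schema fragment from one observed JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # First element stands in for the rest; empty arrays stay untyped.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
        }
    return {}  # JSON null tells us nothing about the type
```

Applied to a body such as {"id": 42, "email": "a@example.com", "role": "admin"}, it produces the same object schema as the response in the excerpt above. A real tool would merge the schemas inferred from many samples of the same endpoint.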

GraphQL: Tiered Introspection with WAF Bypass

Many production GraphQL servers disable introspection queries. Some Web Application Firewalls detect and block common introspection patterns. Vespasian handles this with a three-tier strategy:

Tier 1 sends a full introspection query with descriptions, deprecation info, and directives. This is the standard query that tools like GraphiQL send. If it works, you get the complete schema.

Tier 2 strips descriptions, deprecation, and directives from the query. Some WAFs pattern-match on the verbose query but pass the minimal version through.

Tier 3 sends the smallest possible introspection query. This is the last-resort payload designed to evade keyword-based blocking.

Fallback: If all introspection is disabled, Vespasian infers the schema from observed queries and mutations in the captured traffic, extracting operation names, argument types from variables, and return field types from response data. The inferred schema is partial but sufficient for security testing.

The output is a GraphQL SDL document.
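The tiered strategy amounts to a fallback loop over progressively smaller queries. In the Python sketch below, the three query strings are illustrative placeholders (the article does not give the real payloads), and send is a hypothetical callable that posts a query and returns the raw response body as text.

```python
import json

# Illustrative queries only; Vespasian's real payloads are not shown here.
TIERS = [
    ("full", "query { __schema { types { name description "
             "fields(includeDeprecated: true) { name description isDeprecated } } "
             "directives { name description } } }"),
    ("stripped", "query { __schema { types { name fields { name } } } }"),
    ("minimal", "{__schema{types{name}}}"),
]


def introspect(send):
    """Try each tier in order and return (tier_name, schema_dict).

    Returns (None, None) when every tier fails, so the caller can fall
    back to inferring the schema from captured traffic.
    """
    for name, query in TIERS:
        try:
            payload = json.loads(send(query))
        except (ValueError, OSError):
            continue  # blocked, non-JSON, or network error: try the next tier
        data = payload.get("data") if isinstance(payload, dict) else None
        if isinstance(data, dict) and data.get("__schema"):
            return name, data["__schema"]
    return None, None
```

Keeping the transport behind send makes the loop easy to exercise against a stub that mimics a keyword-matching WAF.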

SOAP/WSDL: Envelope Parsing and Document Fetching

SOAP services are less common than REST and GraphQL, but they still appear, especially in enterprise applications, financial services, and legacy systems. Vespasian detects SOAP traffic by its headers and envelope structure, attempts to fetch existing WSDL documents (by appending ?wsdl to discovered endpoints), and falls back to inferring service definitions from observed SOAP request and response bodies when no WSDL is available.

The result is a WSDL document with service definitions, port bindings, operations, and inferred message types.
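As a rough illustration of the fallback path, the Python sketch below pulls the operation name out of an observed SOAP request body and builds the ?wsdl URL to try. The helper names are hypothetical, and the first child of the Body element is assumed to be the operation, which holds for typical document/literal-wrapped and RPC-style requests.

```python
import xml.etree.ElementTree as ET

SOAP_ENVELOPE_NAMESPACES = (
    "http://schemas.xmlsoap.org/soap/envelope/",  # SOAP 1.1
    "http://www.w3.org/2003/05/soap-envelope",    # SOAP 1.2
)


def soap_operation(envelope_xml):
    """Return (namespace, operation) from a SOAP request body, or None."""
    try:
        root = ET.fromstring(envelope_xml)
    except ET.ParseError:
        return None
    for ns in SOAP_ENVELOPE_NAMESPACES:
        body = root.find("{%s}Body" % ns)
        if body is not None and len(body):
            tag = body[0].tag  # ElementTree writes qualified names as "{ns}local"
            if tag.startswith("{"):
                namespace, local = tag[1:].split("}", 1)
                return namespace, local
            return "", tag
    return None


def wsdl_candidate(endpoint_url):
    """Append ?wsdl to a discovered endpoint, dropping any existing query."""
    return endpoint_url.split("?", 1)[0] + "?wsdl"
```

Because captured traffic is untrusted input, a hardened parser such as defusedxml would be a safer choice than the standard library in real use.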

Using Existing Burp Traffic

This is the feature our pentesters reach for most often. During a manual assessment, you already have hours of captured traffic in Burp Suite. Rather than re-crawling the target, you export your Burp history as XML and feed it to Vespasian:

$ vespasian import burp traffic.xml -o capture.json
$ vespasian generate rest capture.json -o api.yaml

The importer handles both base64-encoded and plain-text Burp exports, preserves request and response bodies, and supports files up to 500 MB. The same workflow works for HAR files from browser DevTools and mitmproxy flow dumps from mobile testing.
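Handling both encodings comes down to checking the base64 attribute that Burp writes on each request and response element. The Python sketch below shows that step only. The function names are hypothetical, and it loads the whole document into memory, so it is not a model for how Vespasian copes with 500 MB exports.

```python
import base64
import xml.etree.ElementTree as ET


def _decode(elem):
    """Return the element's payload as bytes, decoding base64 when flagged."""
    if elem is None or elem.text is None:
        return b""
    if elem.get("base64") == "true":
        return base64.b64decode(elem.text)
    return elem.text.encode("utf-8")


def read_burp_items(xml_text):
    """Yield (url, request_bytes, response_bytes) from a Burp XML export."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield (
            item.findtext("url", default=""),
            _decode(item.find("request")),
            _decode(item.find("response")),
        )
```

For large files, a streaming parser such as ET.iterparse that clears each item after use would keep memory bounded.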

This is especially valuable for mobile application testing. No headless browser crawl can observe a mobile app’s API calls. The traffic lives in the proxy. Vespasian turns that proxy capture into a structured specification.

Assessment Workflow Integration

Vespasian is designed to fit into the offensive security workflows that Praetorian runs every day.

Proxy passthrough. Route the headless browser through Burp Suite with --proxy http://127.0.0.1:8080. You capture traffic in both tools simultaneously: Burp for manual testing, Vespasian for spec generation.

Authentication injection. Inject auth headers with -H "Authorization: Bearer <token>". Most real-world targets require authentication. Without it, the crawler sees unauthenticated pages and misses the API calls that matter.

Scope control. --scope same-origin (default) restricts the crawl to the same scheme, host, and port. --scope same-domain follows subdomains and different ports, which is useful for SPAs that call APIs on api.example.com from app.example.com.

Hadrian integration. Generate a spec with Vespasian, then pass it to Hadrian for automated authorization testing:

$ vespasian scan https://app.example.com -o api.yaml
$ hadrian test rest --api api.yaml --roles roles.yaml --auth auth.yaml --category all

This creates a complete discover-then-test pipeline: Vespasian maps the API surface, Hadrian tests every endpoint for BOLA, BFLA, and other authorization flaws. No manual spec creation required.
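The difference between the two scope modes described earlier in this section is easy to get wrong when deciding what a crawl will follow, so here is a small Python sketch of the two checks. It is an interpretation of the described behavior rather than Vespasian's code. In particular, the "last two labels" rule for the base domain is a crude assumption that breaks on suffixes like co.uk.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url):
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    return scheme, host, parts.port or DEFAULT_PORTS.get(scheme)


def in_scope(seed, candidate, scope="same-origin"):
    """Decide whether candidate may be crawled when starting from seed."""
    s_scheme, s_host, s_port = _origin(seed)
    c_scheme, c_host, c_port = _origin(candidate)
    if scope == "same-origin":
        return (s_scheme, s_host, s_port) == (c_scheme, c_host, c_port)
    if scope == "same-domain":
        # Crude base-domain guess: last two labels. Real code would need
        # the Public Suffix List to handle names like example.co.uk.
        base = ".".join(s_host.split(".")[-2:])
        return c_host == base or c_host.endswith("." + base)
    raise ValueError("unknown scope: " + scope)
```

Starting from https://app.example.com, a request to https://api.example.com/graphql is rejected under same-origin and accepted under same-domain, while a lookalike host such as evil-example.com is rejected under both.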

Vespasian joins our growing suite of open-source offensive security tools. In a typical web application assessment, the tools form a pipeline: Pius discovers external assets and subdomains. Nerva fingerprints services on discovered ports. Vespasian crawls discovered web applications to map their API surface. Hadrian tests those APIs for authorization vulnerabilities. Trajan reviews CI pipelines, and Titus reviews a wide array of assets for leaked secrets. For cloud-focused engagements, Aurelian maps the cloud environment and discovers API Gateways, then the APIs behind them get documented by Vespasian and tested by Hadrian. Each tool handles a distinct phase of security work. If you’re interested in using these tools to help secure your organization, you can learn more about our Praetorian Guard Platform at praetorian.com.

Getting Started

Vespasian is available now at github.com/praetorian-inc/vespasian. Install from source or grab a prebuilt binary from the releases page.

go install github.com/praetorian-inc/vespasian/cmd/vespasian@latest

The repository includes test targets for REST, GraphQL, and SOAP APIs so you can see Vespasian in action before pointing it at a real target.

If you find bugs, want to contribute, or have feature requests, open an issue. We’re actively developing Vespasian and want to hear how you’re using it.

Frequently Asked Questions

What is Vespasian?
Vespasian is an open-source API endpoint discovery and specification generation tool built by Praetorian. It captures HTTP traffic through headless browser crawling or imports it from Burp Suite, HAR files, and mitmproxy, then classifies requests by API type and generates specifications: OpenAPI 3.0 for REST, GraphQL SDL for GraphQL, and WSDL for SOAP services.
What types of APIs can Vespasian discover?
Vespasian discovers REST APIs (generating OpenAPI 3.0 specs), GraphQL APIs (generating SDL schemas via introspection or traffic inference), and SOAP/WSDL services (generating WSDL documents). It automatically detects the API type from captured traffic, or you can specify it explicitly with --api-type.
How is Vespasian different from running a web crawler?
Standard web crawlers follow HTML links and index pages. Vespasian intercepts all HTTP traffic from a headless browser with full JavaScript execution, including XHR/fetch API calls, WebSocket upgrades, and dynamically constructed requests that don’t appear in HTML. It then classifies those requests by API type and generates structured specifications, not just URL lists.
Does Vespasian find undocumented APIs?
It discovers any API endpoint that the application calls during the crawl or that appears in imported traffic. If the frontend calls /api/internal/debug at runtime, Vespasian will capture and document it, even if it doesn’t appear in any published API documentation.
Can I use Vespasian with traffic I’ve already captured in Burp Suite?
Yes. Export your Burp Suite HTTP history as XML, then run vespasian import burp traffic.xml -o capture.json followed by vespasian generate rest capture.json -o api.yaml. No re-crawling needed. The same workflow supports HAR files and mitmproxy dumps.
Does Vespasian handle GraphQL servers that disable introspection?
Yes. Vespasian uses a three-tier introspection strategy with progressively simpler queries designed to bypass WAF blocking. If all introspection is disabled, it falls back to inferring the schema from observed queries and mutations in the captured traffic.
How does Vespasian work with Hadrian for API security testing?
Vespasian generates API specifications, and Hadrian consumes them for automated authorization testing. Run vespasian scan to produce an OpenAPI spec, then pass it to hadrian test rest with role definitions and auth tokens. This creates a complete discover-then-test workflow for API security assessments.
Is Vespasian safe to run against production environments?
Vespasian’s crawl stage drives a browser and follows links, which is read-only. The probing stage sends OPTIONS requests, fetches ?wsdl documents, and runs GraphQL introspection queries — all read-only operations. However, always coordinate with the target owner and prefer staging environments during security assessments.
What output formats does Vespasian support?
OpenAPI 3.0 (YAML or JSON) for REST APIs, GraphQL SDL for GraphQL APIs, and WSDL XML for SOAP services. Each API type gets its native specification format.

*** This is a Security Bloggers Network syndicated blog from Offensive Security Blog: Latest Trends in Hacking | Praetorian authored by n8n-publisher. Read the original post at: https://www.praetorian.com/blog/vespasian-api-endpoint-discovery-tool/
