Kraken: How We Built a Pentester That Thinks Like an Attacker
Traditional scanners run checklists. Kraken runs attack chains.
In 28 iterations and zero human input, Kraken found an SSRF vulnerability in an Azure Function App, extracted storage account credentials from a configuration file the server should never have served, enumerated every blob container in the account, discovered an SSH private key in a dev container nobody remembered existed, logged into a virtual machine, activated its managed identity, found an Automation Account with Owner-level permissions, and wrote a PowerShell runbook that promoted itself to subscription Owner.
Level 4. Full tenant compromise. No human touched the keyboard.
This is not a hypothetical. It is what Kraken did, autonomously, against AzureGoat — the industry-standard intentionally vulnerable Azure environment. And the mechanism behind it is fundamentally different from anything a traditional scanner does.
Scanners Find Vulnerabilities. Attackers Chain Them.
Every security team knows the workflow: run Nessus or OpenVAS, get a PDF with hundreds of findings sorted by CVSS score, hand it to engineering, argue about priorities. Run Burp Suite against the web layer, get a list of reflected XSS and missing headers. These tools are useful. They are also fundamentally limited.
Traditional scanners execute a predetermined set of tests. Each plugin or rule tests for one thing. The output is a list of individual findings. An SSRF is reported as a medium-severity issue. A publicly listable storage container is a separate finding. An exposed SSH key is another. The scanner has no concept that these three findings, chained in the right order, constitute a full infrastructure compromise.
Real attackers don’t work from checklists. They reason. An SSRF is not a finding to report — it is a tool to steal credentials. A storage key is not an endpoint — it is a stepping stone to SSH keys. An SSH key leads to a VM. A VM has a managed identity. This is the OODA loop: Observe, Orient, Decide, Act — repeated until the objective is reached.
No commercial scanner chains a web vulnerability into cloud privilege escalation. Web scanners and cloud security posture tools occupy different market categories entirely. But the attacker doesn’t care about your tool categories.
| Capability | Traditional Scanners | Kraken |
|---|---|---|
| Decision making | Fixed rules and plugins | AI reasoning per iteration |
| Attack chaining | Reports individual findings | Chains web vulns into cloud escalation |
| Learning | None across scans | Cross-scan knowledge vault with AI synthesis |
| False positives | High (version fingerprinting) | Low (heuristic validation + report grounding) |
| Cloud expertise | Generic plugins | Cloud-specific playbooks (Azure, AWS) |
| Adaptability | Same tests regardless of results | Every result shapes the next action |
Architecture: Three Phases of Autonomous Pentesting
Kraken runs a three-phase pipeline against every target.
Phase 1: Cloud Detection
Before any scanning begins, Kraken fingerprints the target’s cloud provider. It inspects the hostname (.azurewebsites.net, .amazonaws.com, .run.app), HTTP response headers (x-azure-ref, x-ms-request-id), and page source (storage SDK URLs, identity provider references). The result — Azure, AWS, GCP, or generic — determines which specialized attack playbook and system prompt Claude receives.
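The fingerprinting logic can be sketched as a simple lookup over hostname suffixes and response headers. This is an illustrative sketch, not Kraken's actual code; the function name and the exact suffix/header tables are assumptions based on the signals described above.

```python
# Hypothetical cloud fingerprinting sketch. Suffixes and headers are the
# signals mentioned in the text; the mapping itself is illustrative.
HOST_SUFFIXES = {
    ".azurewebsites.net": "azure",
    ".amazonaws.com": "aws",
    ".run.app": "gcp",
}
HEADER_HINTS = {
    "x-azure-ref": "azure",      # Azure Front Door / App Service
    "x-ms-request-id": "azure",  # Azure storage and ARM responses
}

def detect_cloud(hostname, headers):
    """Return 'azure', 'aws', 'gcp', or 'generic' for a target."""
    host = hostname.lower()
    for suffix, provider in HOST_SUFFIXES.items():
        if host.endswith(suffix):
            return provider
    lowered = {k.lower() for k in headers}
    for header, provider in HEADER_HINTS.items():
        if header in lowered:
            return provider
    return "generic"
```

The returned label selects which playbook and system prompt the AI receives for the rest of the engagement.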
This matters because cloud infrastructure attacks follow fundamentally different paths. An Azure SSRF targets the Function App metadata and blob storage. An AWS SSRF targets the EC2 Instance Metadata Service. A generic web target gets tested for IDOR, SQLi, and application-layer vulns. The right playbook for the right infrastructure.
Phase 2: Reconnaissance
Four parallel probes run simultaneously: nmap port scanning on web ports, endpoint enumeration against 15+ standard paths (plus cloud-specific paths when a cloud provider is detected), page source scraping for JavaScript bundles and hardcoded secrets, and cloud-specific enumeration like SQL injection probing on Azure user-facing APIs.
The raw data from all probes is sent to Claude for summarization into a structured JSON: open ports, API endpoints, storage URLs, discovered users, and interesting files. This mirrors what a human pentester does in the first hour — compressed to seconds.
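Running independent probes concurrently is straightforward with a thread pool. A minimal sketch, assuming each probe is a self-contained callable (the probe names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_probes(probes):
    """Run independent recon probes concurrently and collect raw
    output per probe name. `probes` maps name -> zero-arg callable."""
    with ThreadPoolExecutor(max_workers=max(1, len(probes))) as pool:
        futures = {name: pool.submit(fn) for name, fn in probes.items()}
        # .result() re-raises any probe exception; real code would catch
        # and record failures instead of aborting the whole phase.
        return {name: f.result() for name, f in futures.items()}
```

The merged raw output is what gets handed to Claude for summarization into the structured JSON described above.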
Phase 3: The AI Attack Loop
This is the core of Kraken, and it is where everything changes.
The ReAct Loop: An AI That Reasons About What It Finds
Kraken uses a ReAct (Reasoning + Acting) loop. Claude AI acts as the attacker’s brain. Python functions are the hands.
On each iteration, Claude receives the full conversation history of every action taken and every result received. It outputs two things: reasoning text explaining what it found, what it means, and what to do next, and a tool call — the specific action to take. Python executes the tool, captures the result, and sends it back to Claude. Claude reasons again.
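The loop's skeleton is small. This sketch abstracts the model behind a `model_step` callable so the control flow is visible; function names and message shapes are illustrative, not Kraken's actual implementation.

```python
def react_loop(model_step, execute_tool, max_iters=50):
    """Minimal ReAct skeleton (illustrative).
    model_step(history) -> (reasoning_text, tool_call_or_None)
    execute_tool(tool_call) -> structured result dict."""
    history = []
    for _ in range(max_iters):
        reasoning, tool_call = model_step(history)
        history.append({"role": "assistant",
                        "reasoning": reasoning,
                        "tool": tool_call})
        if tool_call is None:  # the model decided it is done
            break
        result = execute_tool(tool_call)       # Python is the hands
        history.append({"role": "tool", "result": result})
    return history
```

Because the full history is replayed each turn, every observation the model makes is available to every later decision; that is what lets it chain findings rather than report them in isolation.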
Purpose-Built Offensive Tools
Claude has over 20 purpose-built tools at its disposal. These are not generic wrappers — each encodes offensive security knowledge:
- Cloud infrastructure: Azure CLI execution, AzureHound tenant graph enumeration, Automation Account runbook escalation, AWS S3 enumeration, IMDS credential extraction, IAM enumeration, role assumption, Lambda code retrieval, permission simulation
- Web exploitation: SSRF firing with blob URL following, HTTP probing, file download, SSH execution
- Vulnerability-specific: JWT decoding and forging, command injection with six bypass techniques (`;`, `&&`, `|`, backticks, `$()`, time-based blind), XXE injection via DOCTYPE/ENTITY, boolean-based blind SQLi with filter bypass, LFI probing across path traversal depths, file upload with JPEG magic byte bypass and null-byte injection, default credential testing against 15 common pairs
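The six command-injection bypass styles listed above can be sketched as payload templates. A hypothetical helper (the function name and exact payload shapes are illustrative):

```python
def injection_payloads(cmd):
    """Wrap a command in the six separator/bypass styles named above."""
    return [
        f"; {cmd}",      # statement separator
        f"&& {cmd}",     # conditional chaining
        f"| {cmd}",      # pipe into the command
        f"`{cmd}`",      # backtick command substitution
        f"$({cmd})",     # $() command substitution
        "; sleep 5",     # time-based blind: a fixed delay reveals
                         # execution even when output is suppressed
    ]
```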
Each tool returns structured results (success, data, error), truncated to 2,000 characters to prevent context bloat.
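The wrapper for that result shape is simple. A minimal sketch, assuming a `truncated` flag for the model's benefit (that flag and the field names are assumptions):

```python
def tool_result(success, data="", error="", limit=2000):
    """Wrap a tool's output in the (success, data, error) shape and
    truncate long payloads so one verbose response cannot flood the
    model's context window."""
    return {
        "success": success,
        "data": data[:limit],
        "error": error[:limit],
        "truncated": len(data) > limit,
    }
```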
Guardrails
The loop terminates on Level 4 achievement, 50 iterations, 5 consecutive failures, stuck-loop detection (same tool + arguments called 3 times), or a configurable per-scan cost limit. Destination guardrails block loopback and link-local addresses in production, preventing the tool from inadvertently attacking its own infrastructure.
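Two of these guardrails fit in a few lines each. A sketch of the stuck-loop check and the destination filter, assuming tool calls are recorded as hashable (tool, arguments) pairs (that representation is an assumption):

```python
import ipaddress

def is_stuck(call_log, repeat_limit=3):
    """True when the last `repeat_limit` tool calls are the identical
    (tool name, arguments) pair, i.e. the agent is spinning in place."""
    if len(call_log) < repeat_limit:
        return False
    return len(set(call_log[-repeat_limit:])) == 1

def is_blocked_destination(ip):
    """Production guardrail: refuse direct connections to loopback and
    link-local addresses so the tool cannot attack its own host.
    (SSRF payloads sent *via the target* are a separate path.)"""
    addr = ipaddress.ip_address(ip)
    return addr.is_loopback or addr.is_link_local
```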
Cloud-Native Attack Chains
This is Kraken’s sharpest differentiator. No other automated tool chains web-layer vulnerabilities into cloud infrastructure compromise.
Azure: Web App to Tenant Owner in Seven Steps
The Azure playbook, encoded in Claude’s system prompt and validated end-to-end on AzureGoat:
1. SSRF confirmation — fire an SSRF payload to read `/etc/passwd`, confirming the vector exists
2. Credential extraction — SSRF to read `local.settings.json`, extracting the storage account name and key
3. Storage enumeration — Azure CLI to list all blob containers in the storage account
4. SSH key retrieval — download a private key from a dev container
5. VM access — SSH into the VM, run `az login -i` to activate the managed identity
6. Privilege discovery — enumerate Automation Accounts, find one with Owner-level permissions
7. Privilege escalation — create and execute a PowerShell runbook that assigns Owner role to the VM’s managed identity
Each step unlocks the next. No single step is a “finding” in isolation. The chain is the finding.
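The credential-extraction step of the chain is a good example of how mechanical each link is once the file leaks. A Function App's `local.settings.json` carries an `AzureWebJobsStorage` connection string in its standard `Values` section; parsing out the account name and key is a few lines (this helper is an illustrative sketch, not Kraken's code):

```python
import json

def extract_storage_creds(local_settings_json):
    """Pull (account_name, account_key) out of a leaked
    local.settings.json via its AzureWebJobsStorage connection string.
    Returns None when the credentials are not present."""
    settings = json.loads(local_settings_json)
    conn = settings.get("Values", {}).get("AzureWebJobsStorage", "")
    # Connection strings are semicolon-separated Key=Value pairs;
    # split on the first '=' so base64 key padding survives intact.
    parts = dict(p.split("=", 1) for p in conn.split(";") if "=" in p)
    if "AccountName" in parts and "AccountKey" in parts:
        return parts["AccountName"], parts["AccountKey"]
    return None
```

The extracted pair feeds directly into the next link: Azure CLI container enumeration.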
AWS: Metadata to Administrator
The AWS playbook follows a parallel pattern:
1. S3 bucket enumeration — derive bucket names from the target hostname, test for public listing and ACL misconfigurations
2. IMDS exploitation — SSRF to the EC2 metadata service (`169.254.169.254`) to extract temporary IAM credentials
3. IAM enumeration — map the caller’s identity, attached policies, and accessible services
4. Permission simulation — test 22 high-value IAM actions (like `iam:CreatePolicyVersion`, `iam:AttachUserPolicy`, `iam:PassRole`) via the IAM policy simulator — stealthier than brute-force enumeration
5. Privilege escalation — exploit dangerous IAM permissions to achieve AdministratorAccess or EC2 shell
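The IMDS step returns a standard JSON credential document per role (`AccessKeyId`, `SecretAccessKey`, `Token` are the documented field names). A sketch of the parsing side, once the SSRF has fetched the body:

```python
import json

# Documented IMDS path; a role name is appended to fetch its credentials.
IMDS_CREDS = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

def parse_imds_credentials(body):
    """Parse the EC2 metadata service's per-role credential document
    into the three values needed to assume the role's permissions."""
    doc = json.loads(body)
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],  # temporary STS session token
    }
```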
Not Scripts — Knowledge
These playbooks are not hardcoded in Python. They are encoded as knowledge in Claude’s system prompt. Claude uses reasoning to decide when to follow the playbook and when to deviate. If the SSRF path is blocked, it adapts — tries Azure CLI directly, looks for exposed .env files, checks for default credentials. The system prompt says: “When HTTP is blocked, use Azure CLI tools.” The AI interprets this guidance in context.
Compound Learning: The Vault
Kraken’s second architectural layer is inspired by Andrej Karpathy’s concept of a “second brain” — a persistent, AI-curated knowledge base that compounds intelligence across every engagement.
After each scan, Kraken writes a raw record to the vault. Credentials are stripped. Techniques, tool sequences, and outcomes are preserved. Claude generates a 3-5 sentence synthesis of what worked, what failed, and why. The vault stores patterns, not secrets.
Every 10 scans, Claude rewrites the entire wiki from scratch — deduplicating patterns, removing contradictions, and weighting Level 3-4 results and frontier model scans more heavily. This is not appending logs. It is AI-curated knowledge synthesis.
Before each new scan, query_vault() retrieves relevant learnings filtered by cloud type and vulnerability tags, and injects them into Claude’s initial message. The agent starts each engagement knowing what worked last time against similar targets — and what didn’t.
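A minimal sketch of what that retrieval could look like, assuming vault records carry `cloud`, `tags`, and an achieved `level` field (the schema here is an assumption, not Kraken's actual vault format):

```python
def query_vault(records, cloud, tags, limit=5):
    """Retrieve prior learnings relevant to a new target: filter by
    cloud type, then rank by vulnerability-tag overlap, breaking ties
    in favor of scans that achieved a higher level."""
    matches = [r for r in records if r["cloud"] == cloud]
    matches.sort(
        key=lambda r: (len(tags & set(r["tags"])), r["level"]),
        reverse=True,
    )
    return matches[:limit]
```

The top matches are injected into Claude's initial message so the agent starts with relevant prior experience.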
The practical result: Kraken’s first scan against a new target class is good. Its tenth is meaningfully better. Its hundredth reflects accumulated intelligence from every prior engagement.
Cost-Aware by Design
AI-driven pentesting could be expensive. Kraken is engineered to keep costs practical.
The system prompt and tool declarations — identical on every iteration — are cached using Anthropic’s prompt cache API. On a 50-iteration scan, this eliminates 49 redundant re-reads, reducing input token cost by roughly 90%. Model routing sends cloud-specific targets (which need deeper reasoning) to Claude Opus and generic web scans to Claude Sonnet. A configurable per-scan cost cap automatically terminates scans that exceed budget.
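With Anthropic's Messages API, caching works by attaching `cache_control` markers to the static prefix: the system prompt block and the last tool declaration (a marker on the final tool caches everything up to that point). A sketch of building those request parameters, assuming this helper shape:

```python
def cached_request_params(system_prompt, tools):
    """Build Messages API kwargs with the static prefix (system prompt
    plus tool declarations) marked cacheable, so every iteration after
    the first re-reads it from cache at reduced input-token cost."""
    marked_tools = [dict(t) for t in tools]
    if marked_tools:
        # A marker on the last tool caches the whole tool list.
        marked_tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "tools": marked_tools,
    }
```

These kwargs are then merged into each `client.messages.create(...)` call alongside the growing conversation history.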
What This Means for Your Security Program
Kraken is not replacing human pentesters. It is extending what a security team can do — running continuous, adaptive assessments at a fraction of the time and cost of a manual engagement.
For organizations running on Azure or AWS, the cloud-native attack chains represent a capability that does not exist in any other automated tool. No scanner chains a web SSRF into cloud tenant compromise. Kraken does, because that is what a real attacker would do.
The compound learning vault means the system improves with use. Every scan contributes to the next. Techniques that work are reinforced. Dead ends are catalogued and avoided. The tool gets smarter the more you use it.
This is the difference between an AI that reasons and an AI that learns. Kraken does both.
Kraken is built by ThreatMate. Authorized security testing only.
