Ten phases · depth over breadth · built for the agent era

Roadmap

What audytx is building and why. Phases are ordered by leverage — each one deepens the moat rather than expanding the surface.

The thesis: every incumbent (Checkov, Trivy, Snyk, KICS) wins on breadth and rule count, produces real, recurring false positives, and flags without fixing. That leaves a precise, defensible lane: the zero-setup AWS + Terraform reviewer that is accurate (context-aware, low false-positive rate), reasons about IAM the way attackers do, and teaches you why. It runs in the cloud, so a solo dev on a Chromebook gets the same review as a funded startup.

The bet behind the sequencing: more and more of the world's Terraform is written by coding agents — and when the author is an agent, the reviewer has to be consumable by the agent too. Context-aware cross-resource reasoning matters more in that world, because no human on the team holds the context. That's why the agent surface ships first, and why the test corpus of the future is AI-generated Terraform.

Foundation — benchmark, fixes, IAM depth, guardrails

Shipped · v0.2–v0.3

The original four phases, complete: prove the accuracy moat with a measured benchmark, make every finding fixable and teachable, go deep on IAM the way attackers do, and guard against the credential / bill-shock incident.

What shipped

5-tool benchmark on 28 corpora — 100% recall on 31 IAM privilege-escalation paths (tied with Checkov; KICS 3%, Trivy 0%) at ~36× fewer false positives than Checkov on 21 clean production modules (33 vs 1,193 — the fewest of the five). See the full comparison.
One-click GitHub suggestions — single-line and multi-line fixes anchored to the exact offending lines, plus plain-English "why this matters" per finding
The IAM attack graph — privilege-escalation catalog gated on exploitability, trust-graph reasoning, multi-hop role chaining, and cross-resource attack paths ATTACK_PATH_001–008
Secrets + bill-shock fusion — hardcoded-credential detection and cost×security signals (GPU + admin IAM, hardcoded keys + expensive compute)
17 context-reasoning axes — the false-positive suppression layer, with every suppression surfaced with its rationale

The agent surface — MCP + autofix loop

Shipped · v0.4.0

When the code author is an agent, the reviewer must be consumable by the agent. audytx is now an MCP server — the same engine the GitHub App runs, callable before the PR exists. One line of config, no CI, no token:

claude mcp add --transport http audytx https://audytx.com/mcp
scan_terraform — findings with file:line evidence, severity, fix snippets, and the context-suppressed findings with their rationale
autofix_terraform — the server applies the precisely-anchored sound fixes, re-scans, loops; returns fixed files + what remains. Same soundness bar as GitHub one-click suggestions: never a corrupting edit
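The apply, re-scan, loop behaviour described for autofix_terraform can be pictured as a small fixed-point loop. The sketch below is illustrative only and is not audytx's implementation: `Finding`, `autofix`, `toy_scan`, and the two toy rules are hypothetical names invented for this example, and the real server's response shape is not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line: int                   # 1-based line the finding is anchored to
    replacement: Optional[str]  # None when no sound single-line fix is known


def autofix(source: str, scan: Callable[[str], list], max_rounds: int = 10):
    """Apply anchored fixes one at a time, re-scanning after each edit.

    Returns (fixed_source, remaining_findings). Only findings that carry a
    replacement are touched; everything else is reported back unchanged.
    """
    lines = source.splitlines()
    for _ in range(max_rounds):
        findings = scan("\n".join(lines))
        fixable = [f for f in findings if f.replacement is not None]
        if not fixable:
            return "\n".join(lines), findings
        # One fix per round, then re-scan, so line anchors never go stale.
        fix = fixable[0]
        lines[fix.line - 1] = fix.replacement
    # Round budget exhausted: report whatever the last scan still sees.
    return "\n".join(lines), scan("\n".join(lines))


def toy_scan(text: str) -> list:
    """Hypothetical scanner with two toy rules, for demonstration only."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if 'acl = "public-read"' in line:
            fixed_line = line.replace("public-read", "private")
            findings.append(Finding("PUBLIC_ACL", number, fixed_line))
        if "0.0.0.0/0" in line:
            findings.append(Finding("OPEN_CIDR", number, None))  # no safe fix
    return findings
```

Calling `autofix(src, toy_scan)` on a file containing both patterns rewrites the ACL line and returns the open-CIDR finding as "what remains", which mirrors the fixed-files-plus-leftovers contract described above.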

Benchmark v1.0 — ground truth

Shipped

Turn the internal benchmark into a publishable artifact that holds up to scrutiny: label ground truth on the IAM corpus against its documented privilege-escalation paths so every tool gets a true precision/recall score (not a raw finding count), re-pull every tool's results for all corpora from SARIF so the comparison is like-for-like, and publish the harness with the write-up.
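To make the difference between a raw finding count and a true score concrete, here is a minimal sketch of precision and recall computed against labeled ground truth. The function name `score` and the path identifiers are hypothetical; the published harness may structure this differently.

```python
def score(reported: set, ground_truth: set) -> dict:
    """Precision/recall of one tool against labeled escalation paths."""
    true_pos = len(reported & ground_truth)   # labeled paths the tool found
    false_pos = len(reported - ground_truth)  # findings with no labeled path
    false_neg = len(ground_truth - reported)  # labeled paths the tool missed
    precision = true_pos / (true_pos + false_pos) if reported else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return {
        "tp": true_pos,
        "fp": false_pos,
        "fn": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical labels: four documented paths, and a tool that reports three
# findings, one of which matches nothing in the ground truth.
truth = {"path-1", "path-2", "path-3", "path-4"}
tool_output = {"path-1", "path-2", "noise-1"}
result = score(tool_output, truth)
```

In this example the tool reports three findings, but only two are real: precision is 2/3 and recall is 1/2. A raw count of three findings would hide both numbers.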

Never silent, never blind

Shipped · v0.5

A check-run per scan so a failed or skipped scan is a visible status instead of silence, and real usage telemetry (installs → scans → outcomes). A reviewer you can't tell is working is a reviewer you stop trusting.

Findable everywhere

Deferred

GitHub Marketplace listing, the published benchmark, and the MCP endpoint announced where agent builders look. Zero-setup only matters if you can find the thing to not-set-up.

The AI-generated-Terraform corpus

Shipped

Generate Terraform from frontier models against realistic infra prompts, catalog the characteristic failure modes (plausible-but-overbroad IAM, hallucinated module arguments, missing companion resources), calibrate the engine against them — and publish "what AI-generated Terraform gets wrong." The test corpus of the market that's coming, re-run per model generation.

Parser ceiling — module calls + expressions

Shipped · v0.6–v0.7

Expand registry module calls (starting with the most-used terraform-aws-modules), resolve variable defaults / tfvars / locals, and handle count/for_each. Real repos — and AI-generated ones especially — compose modules heavily; this raises the ceiling of every reasoning axis and attack path on real-world code.
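A rough sketch of two of the steps named above: resolving a variable from its default and a tfvars override, then expanding count / for_each into per-instance addresses. The helper names `resolve_vars` and `expand_instances` are hypothetical and the real parser is not shown in this document; the address formats and the rule that tfvars values override defaults follow standard Terraform behaviour.

```python
from typing import Iterable, Optional


def resolve_vars(defaults: dict, tfvars: dict) -> dict:
    """Values from tfvars take precedence over variable defaults."""
    return {**defaults, **tfvars}


def expand_instances(
    address: str,
    count: Optional[int] = None,
    for_each: Optional[Iterable[str]] = None,
) -> list:
    """Expand one resource block into its per-instance addresses."""
    if count is not None:
        return [f"{address}[{index}]" for index in range(count)]
    if for_each is not None:
        return [f'{address}["{key}"]' for key in for_each]
    return [address]  # plain resource: exactly one instance


# Hypothetical input: `count = var.replicas`, default 1, overridden to 3.
variables = resolve_vars({"replicas": 1}, {"replicas": 3})
web = expand_instances("aws_instance.web", count=variables["replicas"])
logs = expand_instances("aws_s3_bucket.logs", for_each=["app", "audit"])
```

Once every instance has its own address, each reasoning axis can treat `aws_instance.web[0]` and `aws_instance.web[2]` as separate nodes rather than one unexpanded block.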

IAM v2 — the effective-permission engine

Shipped · v0.8–v0.10

Wildcard action expansion against a real service-action table, Condition / NotAction / permission-boundary math, identity×resource-policy intersection. Attack paths become graph reachability over effective permissions instead of curated patterns — found by search, not by hand-written rules.
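The shift from curated patterns to search can be sketched in a few lines. The example below is a deliberately simplified model and not the audytx engine: the five-entry action table, the `trust` map, and all function names are assumptions made for illustration, and Condition keys, resource scoping, and resource policies are left out entirely.

```python
from collections import deque
from fnmatch import fnmatchcase

# Tiny stand-in for a real service-action table (assumption for the example).
ACTION_TABLE = [
    "iam:PassRole",
    "iam:CreateAccessKey",
    "s3:GetObject",
    "s3:PutObject",
    "sts:AssumeRole",
]


def expand(patterns, table=ACTION_TABLE) -> set:
    """Expand IAM-style wildcards; action matching is case-insensitive."""
    return {
        action
        for action in table
        for pattern in patterns
        if fnmatchcase(action.lower(), pattern.lower())
    }


def statement_actions(stmt: dict, table=ACTION_TABLE) -> set:
    """NotAction selects every action in the table except the ones listed."""
    if "NotAction" in stmt:
        return set(table) - expand(stmt["NotAction"], table)
    return expand(stmt["Action"], table)


def effective_actions(statements, table=ACTION_TABLE, boundary=None) -> set:
    """Allows minus explicit denies, intersected with an optional boundary."""
    allowed, denied = set(), set()
    for stmt in statements:
        target = allowed if stmt["Effect"] == "Allow" else denied
        target |= statement_actions(stmt, table)
    result = allowed - denied
    if boundary is not None:
        result &= effective_actions(boundary, table)
    return result


def build_edges(policies: dict, trust: dict) -> dict:
    """Edge A -> B when A can call sts:AssumeRole and B's trust admits A."""
    return {
        principal: list(trust.get(principal, ()))
        for principal, stmts in policies.items()
        if "sts:AssumeRole" in effective_actions(stmts)
    }


def reachable(start: str, edges: dict) -> set:
    """Breadth-first search over the principal graph."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


policies = {
    "dev": [{"Effect": "Allow", "Action": ["sts:*"]}],
    "ci-role": [{"Effect": "Allow", "NotAction": ["s3:*"]}],
    "admin-role": [{"Effect": "Allow", "Action": ["*"]}],
}
trust = {"dev": ["ci-role"], "ci-role": ["admin-role"]}
path = reachable("dev", build_edges(policies, trust))
```

Here no rule mentions "dev can become admin". The chain falls out of the search because `ci-role` allows everything except S3, which includes `sts:AssumeRole`. Adding an explicit deny on `sts:AssumeRole` to `ci-role` removes the edge and the path with it.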

Suppression integrity

Shipped · v0.11

Every context-suppression is a potential false negative, so the suppressions get their own adversarial test suite: for each axis, real HCL where the suppression must not fire — published as a false-negative rate alongside the false-positive benchmark. Precision and recall of the reasoning itself.

Opt-in plan ingestion

Shipped · v0.12

Accept terraform plan JSON via an Actions artifact as a secondary, never-required input — it resolves what HCL heuristics can't (final computed values, count expansion). Mandatory plan ingestion stays off the table: it kills zero-setup.
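As an illustration of what the plan adds, the sketch below reads the `resource_changes` array from the JSON that `terraform show -json` produces and falls back to the HCL-only view when no plan is supplied. The function names, the shape of `hcl_view`, and the sample plan are assumptions for this example; how audytx actually consumes the artifact is not described here.

```python
import json
from typing import Optional


def planned_resources(plan_json: str) -> dict:
    """Map each resource address in the plan to its planned attributes."""
    plan = json.loads(plan_json)
    resources = {}
    for entry in plan.get("resource_changes", []):
        after = entry.get("change", {}).get("after")
        if after is None:  # pure deletes have no planned state
            continue
        resources[entry["address"]] = after
    return resources


def resolve(hcl_view: dict, plan_json: Optional[str] = None) -> dict:
    """The plan is optional: without it, the HCL-derived view is used as-is."""
    if plan_json is None:
        return dict(hcl_view)
    return planned_resources(plan_json)


# Hypothetical HCL-only view: count and instance_type are still expressions.
hcl_view = {"aws_instance.web": {"instance_type": "var.size"}}

# Hypothetical plan excerpt, trimmed to the fields this sketch reads.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "aws_instance.web[0]",
         "change": {"actions": ["create"],
                    "after": {"instance_type": "p4d.24xlarge"}}},
        {"address": "aws_instance.web[1]",
         "change": {"actions": ["create"],
                    "after": {"instance_type": "p4d.24xlarge"}}},
        {"address": "aws_s3_bucket.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
})

with_plan = resolve(hcl_view, SAMPLE_PLAN)
without_plan = resolve(hcl_view)
```

With the plan, the unresolved `var.size` becomes a concrete instance type on two expanded instances; without it, the scan still runs on the HCL view, which is what keeps the input optional.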

Cost×security depth → team governance

Horizon

Before/after cost delta fused with security context ("expensive AND cryptojacking-shaped"), then the org layer: merge gates on Critical findings, cross-PR baselines, compliance evidence export, multi-repo visibility. Explicitly demand-gated — individual devs and agents have to love it first.

Who this is for

Early-stage startups

3–10 people, no dedicated security or platform engineer. Needs PR-level guidance without a CI pipeline config or a week of setup.

Solo developers

Won't take on the overhead of maintaining local CLI scanners. The value prop is that the cloud runs it for you: install the GitHub App once, and every future PR is reviewed.

Students / resource-constrained devs

Can't run Checkov + tfsec + Infracost locally on every push. A Chromebook gets the same quality review as a funded team's workstation.

Coding agents

Writing more of the world's Terraform every month. One MCP endpoint gives the agent the full context-aware review — and the autofix loop — before the PR exists.

What's deliberately not on the roadmap

Multi-cloud (Azure, GCP). AWS-deep is the v0.x wedge. Every resource type and reasoning axis is AWS-specific by design; giving up IAM depth for breadth across clouds isn't a trade worth making.
CloudFormation, Pulumi, CDK input. Terraform-only today. The parser crate has a CloudFormation stub; it isn't wired to the PR path.
Unsupervised commits to your branches. The MCP autofix loop returns fixed files to the agent that asked for them, gated to precisely-anchored sound fixes — audytx never pushes commits to your repo. On the PR surface, fixes stay one-click suggestions you apply.
Mandatory terraform plan ingestion. It needs cloud credentials and CI wiring — exactly the setup burden audytx exists to remove. An opt-in plan-JSON artifact is Phase 9.
SAML SSO / enterprise onboarding. Won't build the paperwork until the product is sticky with individuals. Enterprise procurement can wait.

Follow the build

Install audytx and every new axis, rule, and phase lands on your next PR — no upgrade step.