Public benchmark · 5 tools · 28 corpora · reproducible

audytx vs Checkov, Trivy, KICS & Terrascan

Five scanners run against 28 real-world AWS Terraform corpora — 7 with known security findings (recall) and 21 production-grade community modules expected to be clean (precision). Every number below is measured, scored by a deterministic script, and reproducible from a public repo. Headline: audytx and Checkov are the only two tools that detect all 31 IAM privilege-escalation paths — and on the 21 clean modules audytx fires ~36× fewer false positives than Checkov. It also logs the fewest clean-module false positives of all five (33, just ahead of KICS's 34) — and KICS reaches that only by detecting 3% of the privesc paths to audytx's 100%.

Provenance. Re-run at the current audytx engine v0.14.8 (247 rules, 17 reasoning axes) on 2026-06-16, against Checkov 3.2.520 · Trivy 0.71.0 · KICS 2.1.20 · Terrascan 1.19.9 (all pinned). The audytx column is pulled live from GitHub Code Scanning per bench branch and accepted only if its SARIF driver.version equals the live engine — so no stale data can slip in. Scored with scripts/score.py (Python-3 stdlib, byte-deterministic). We re-run the full suite each milestone; reproduce any row yourself from the steps at the bottom.
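The provenance rule above is a version gate applied to each SARIF file. Below is a minimal Python sketch of such a gate. It assumes the standard SARIF 2.1.0 layout (`runs[].tool.driver.name` / `.version`) and that the engine reports its driver name as `audytx`. The function name `accept_sarif` is hypothetical and is not part of the audytx tooling.

```python
from typing import Any, Dict


def accept_sarif(sarif: Dict[str, Any], tool_name: str, expected_version: str) -> bool:
    """Hypothetical version gate: accept a SARIF log only if it contains at
    least one run from `tool_name` and every such run reports
    `expected_version` in runs[].tool.driver.version (SARIF 2.1.0 layout)."""
    runs = [
        run
        for run in sarif.get("runs", [])
        if run.get("tool", {}).get("driver", {}).get("name") == tool_name
    ]
    if not runs:
        return False  # nothing from this tool at all: reject rather than guess
    return all(run["tool"]["driver"].get("version") == expected_version for run in runs)


if __name__ == "__main__":
    # Assumption: the engine reports its SARIF driver name as "audytx".
    stale = {"runs": [{"tool": {"driver": {"name": "audytx", "version": "0.14.6"}}, "results": []}]}
    fresh = {"runs": [{"tool": {"driver": {"name": "audytx", "version": "0.14.8"}}, "results": []}]}
    print(accept_sarif(stale, "audytx", "0.14.8"))  # False: stale engine, rejected
    print(accept_sarif(fresh, "audytx", "0.14.8"))  # True
```

Rejecting a log that has no matching run errs on the side of excluding data, which fits the rule that no stale data can slip in.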

100%

recall on 31 documented IAM privilege-escalation paths

tied with Checkov · KICS 3% · Trivy 0%

36×

fewer false positives than Checkov on 21 clean modules

33 vs 1,193 · fewest of all five (KICS 34) · 14× fewer than Terrascan

2×

IAM-privesc precision vs Checkov at the same 100% recall

23% vs 12% · half the alert volume (135 vs 269)

Table 1 — IAM privilege-escalation: precision & recall

Corpus: BishopFox iam-vulnerable, 31 documented AWS IAM privilege-escalation paths with one Terraform file per path. A tool "detects" a path if it fires at least one HIGH/CRITICAL finding on the file implementing it. audytx is scored on HIGH, Checkov on all failed checks (see footnote ¹), and KICS and Trivy on HIGH+CRITICAL.

Tool	HIGH findings	TP	FP	FN	Precision	Recall
audytx	135	31	104	0	23%	100%
Checkov	269 ¹	31	238	0	12%	100%
KICS	9	1	8	30	11%	3%
Terrascan	DNF ²	—	—	—	—	—
Trivy	7	0	7	31	0%	0%

¹ Checkov run offline (no Bridgecrew API) emits no per-check severity, so this is total failed checks, not HIGH-only — generous for Checkov's recall, conservative for its precision.
² Terrascan exceeded the 5-minute timeout on this corpus (large module graph).
audytx and Checkov are the only two tools that detect all 31 paths. audytx does it with ~2× the precision and half the alert volume. Most of audytx's 104 "FP" here are legitimate secondary detections (e.g. AWS_OPS_038 firing on the same privesc file the primary rule already claimed) plus the corpus's own intentional FP-test fixtures — TP matching counts only one finding per path, which structurally undercounts audytx precision.
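The Precision and Recall columns follow directly from the TP/FP/FN counts. The short Python check below reproduces that arithmetic using only the published numbers. Terrascan is omitted because it did not finish. Checkov's 269 is total failed checks, per footnote ¹.

```python
# Table 1 counts as published: tool -> (TP, FP, FN).
# Terrascan is left out because it did not finish (DNF).
TABLE_1 = {
    "audytx": (31, 104, 0),
    "Checkov": (31, 238, 0),  # total failed checks, not HIGH-only (footnote 1)
    "KICS": (1, 8, 30),
    "Trivy": (0, 7, 31),
}


def precision(tp: int, fp: int) -> float:
    """Share of scored alerts that hit a ground-truth path: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of ground-truth paths detected: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    for tool, (tp, fp, fn) in TABLE_1.items():
        print(f"{tool:8} alerts={tp + fp:4}  precision={precision(tp, fp):.0%}  recall={recall(tp, fn):.0%}")
```

Running it prints 23% / 100% for audytx, 12% / 100% for Checkov, 11% / 3% for KICS and 0% / 0% for Trivy. These values match the table.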

Table 2 — False positives on 21 clean production modules

This is the wedge. Each corpus is a well-regarded, actively-maintained AWS community Terraform module with an expected high-severity count of 0. Lower is better — every HIGH finding here is noise a reviewer has to triage. Raw counts shown as measured (nothing subtracted).

Corpus (clean module)	audytx	Checkov	Trivy	KICS	Terrascan
cloudposse-s3-bucket	1	36	1	0	2
terraform-aws-alb	4	54	13	2	6
terraform-aws-apigateway-v2	0	20	7	1	5
terraform-aws-autoscaling	0	11	6	0	1
terraform-aws-cloudfront	0	24	8	0	6
terraform-aws-ecr	0	5	1	2	0
terraform-aws-ecs	0	86	16	2	16
terraform-aws-eks	6	88	38	1	1
terraform-aws-eventbridge	7	57	18	12	24
terraform-aws-iam	3	287	1	0	4
terraform-aws-kms	0	1	0	0	11
terraform-aws-lambda	6	112	23	7	9
terraform-aws-rds	2	124	7	1	10
terraform-aws-s3-bucket	3	129	18	4	348 ³
terraform-aws-secure-baseline	1	107	8	2	8
terraform-aws-security-group	0	10	2	0	0
terraform-aws-sns	0	4	0	0	0
terraform-aws-sqs	0	1	0	0	0
terraform-aws-step-functions	0	6	1	0	1
terraform-aws-vpc	0	25	3	0	1
trussworks-s3-private	0	6	4	0	1
Total	33	1,193	175	34	454

³ Terrascan fires 348 HIGH alerts on a single module (terraform-aws-s3-bucket) — a mass-rule blowup that alone accounts for 77% of its total.
audytx's 33 = ~36× fewer than Checkov (1,193), 14× fewer than Terrascan (454), and 5× fewer than Trivy (175) — and the fewest raw false positives of all five, just ahead of KICS (34), which reaches that only by detecting 3% of the privesc paths (Table 1) to audytx's 100%. Several of audytx's 33 trace to documented justified exceptions (real issues in the modules' own example code, tracked in clean-modules.yaml); the rest are new rules (EKS secrets-encryption, deprecated Lambda runtimes) correctly flagging issues in the modules' examples/ — see the honest column below.
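The Table 2 totals and the headline ratios can be re-derived from the per-module rows. The small Python check below uses only the numbers printed in the table.

```python
# Per-module HIGH counts from Table 2, in column order.
TOOLS = ("audytx", "Checkov", "Trivy", "KICS", "Terrascan")
TABLE_2 = {
    "cloudposse-s3-bucket": (1, 36, 1, 0, 2),
    "terraform-aws-alb": (4, 54, 13, 2, 6),
    "terraform-aws-apigateway-v2": (0, 20, 7, 1, 5),
    "terraform-aws-autoscaling": (0, 11, 6, 0, 1),
    "terraform-aws-cloudfront": (0, 24, 8, 0, 6),
    "terraform-aws-ecr": (0, 5, 1, 2, 0),
    "terraform-aws-ecs": (0, 86, 16, 2, 16),
    "terraform-aws-eks": (6, 88, 38, 1, 1),
    "terraform-aws-eventbridge": (7, 57, 18, 12, 24),
    "terraform-aws-iam": (3, 287, 1, 0, 4),
    "terraform-aws-kms": (0, 1, 0, 0, 11),
    "terraform-aws-lambda": (6, 112, 23, 7, 9),
    "terraform-aws-rds": (2, 124, 7, 1, 10),
    "terraform-aws-s3-bucket": (3, 129, 18, 4, 348),
    "terraform-aws-secure-baseline": (1, 107, 8, 2, 8),
    "terraform-aws-security-group": (0, 10, 2, 0, 0),
    "terraform-aws-sns": (0, 4, 0, 0, 0),
    "terraform-aws-sqs": (0, 1, 0, 0, 0),
    "terraform-aws-step-functions": (0, 6, 1, 0, 1),
    "terraform-aws-vpc": (0, 25, 3, 0, 1),
    "trussworks-s3-private": (0, 6, 4, 0, 1),
}

# Column sums: should reproduce the "Total" row.
totals = {tool: sum(row[i] for row in TABLE_2.values()) for i, tool in enumerate(TOOLS)}


def ratio_vs_audytx(tool: str) -> float:
    """How many times more clean-module findings `tool` produced than audytx."""
    return totals[tool] / totals["audytx"]


if __name__ == "__main__":
    print(totals)
    for tool in ("Checkov", "Terrascan", "Trivy"):
        print(f"{tool}: {ratio_vs_audytx(tool):.1f}× audytx")
    share = TABLE_2["terraform-aws-s3-bucket"][TOOLS.index("Terrascan")] / totals["Terrascan"]
    print(f"Terrascan share from terraform-aws-s3-bucket: {share:.0%}")
```

The unrounded ratios are 36.2× (Checkov), 13.8× (Terrascan) and 5.3× (Trivy). The single s3-bucket module accounts for 77% of Terrascan's total.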

Table 3 — Recall corpora (raw HIGH counts)

Six additional corpora with deliberately insecure configurations. We have no path-level ground truth here beyond iam-vulnerable, so these are raw HIGH counts, not scored precision/recall — more is not automatically better, since a chunk of any tool's count is noise. Shown for completeness.

Corpus	audytx	Checkov	Trivy	KICS	Terrascan
KaiMonkey	40	109	112	0	21
iam-role-chain	4	9	0	1	0
learn-terraform-provision-eks-cluster	2	3	3	0	0
sadcloud	47	201	26	53	58
terraform-aws-eks-blueprints	29	210	DNF ⁴	13	7
terragoat	52	466	93	70	35

⁴ Trivy timed out on eks-blueprints (5-min limit); Terrascan DNF on iam-vulnerable (Table 1).

How the precision gap happens — a worked example

The Table 2 result is not fewer rules — it's cross-resource reasoning. audytx pre-computes relationship graphs and suppresses findings that context proves benign, showing the rationale instead of dropping them silently. Here's the mechanism on one fixture (testbed #11), illustrative of why the clean-module counts diverge so far.

Serverless messaging — SQS DLQ chain, sync + polled-async Lambdas, TTL'd DynamoDB

Single-resource scanners flag each resource against a checklist. audytx reads how the resources connect first.

Single-resource scanner

aws_lambda_function.chirp_api
Lambda DLQ missing

aws_lambda_function.chirp_outbox_worker
Lambda DLQ missing

aws_dynamodb_table.chirp_request_log
point-in-time recovery not enabled

aws_sqs_queue.chirp_outbox_dlq
queue has no DLQ of its own

fires on every resource that fails a pattern · 4 noise findings

audytx — same resources, with context

aws_lambda_function.chirp_api
suppressed — sync via API Gateway; a Lambda DLQ only fires on async invokes, so it would never receive an event

aws_lambda_function.chirp_outbox_worker
suppressed — polled-async via SQS event-source mapping; failures handled by the queue's redrive_policy, not a function DLQ

aws_dynamodb_table.chirp_request_log
suppressed — TTL configured for ephemeral request logs; PITR is mismatched for data that self-expires

aws_sqs_queue.chirp_outbox_dlq
suppressed — this queue is the dead-letter queue; requiring it to have its own DLQ is infinite regress

each suppression carries its reasoning · 0 noise findings

Multiply this across DLQ identity, Lambda invocation graphs, encryption variants, data lifetime, network exposure, IAM trust/policy reachability, tag environment and IMDSv2 inheritance (17 reasoning axes in the live engine) and you get the 33-vs-1,193 gap in Table 2.
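The contrast in the worked example can be modelled as a two-pass check. A per-resource pass emits candidate findings. A context pass then looks at related resources before deciding whether each finding stands, and it keeps the rationale rather than dropping the finding.

The Python sketch below illustrates the idea and is not audytx's implementation. The following pieces are hypothetical simplifications:

- The resource dict.
- The rule IDs.
- The attribute keys (`dlq`, `ttl_enabled`, `pitr`, `redrive_target`), which stand in for the Terraform attributes `dead_letter_config`, `ttl`, `point_in_time_recovery` and `redrive_policy`.
- The `aws_apigatewayv2_integration.chirp`, `aws_lambda_event_source_mapping.outbox` and `aws_sqs_queue.chirp_outbox` resources, which are added so the relationships have something to attach to.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model of the testbed #11 resources. Attribute keys
# are stand-ins: dlq ~ dead_letter_config, ttl_enabled ~ ttl, pitr ~
# point_in_time_recovery, redrive_target ~ redrive_policy.deadLetterTargetArn.
RESOURCES: Dict[str, dict] = {
    "aws_apigatewayv2_integration.chirp": {
        "type": "aws_apigatewayv2_integration", "target": "aws_lambda_function.chirp_api"},
    "aws_lambda_event_source_mapping.outbox": {
        "type": "aws_lambda_event_source_mapping",
        "source": "aws_sqs_queue.chirp_outbox",
        "function": "aws_lambda_function.chirp_outbox_worker"},
    "aws_lambda_function.chirp_api": {"type": "aws_lambda_function", "dlq": None},
    "aws_lambda_function.chirp_outbox_worker": {"type": "aws_lambda_function", "dlq": None},
    "aws_dynamodb_table.chirp_request_log": {
        "type": "aws_dynamodb_table", "ttl_enabled": True, "pitr": False},
    "aws_sqs_queue.chirp_outbox": {
        "type": "aws_sqs_queue", "redrive_target": "aws_sqs_queue.chirp_outbox_dlq"},
    "aws_sqs_queue.chirp_outbox_dlq": {"type": "aws_sqs_queue", "redrive_target": None},
}


def naive_findings(res: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Pass 1: check each resource on its own, like a checklist scanner."""
    out = []
    for name, r in res.items():
        if r["type"] == "aws_lambda_function" and r.get("dlq") is None:
            out.append((name, "lambda-dlq-missing"))
        elif r["type"] == "aws_dynamodb_table" and not r.get("pitr"):
            out.append((name, "dynamodb-pitr-disabled"))
        elif r["type"] == "aws_sqs_queue" and r.get("redrive_target") is None:
            out.append((name, "sqs-no-dlq"))
    return out


def context_reason(name: str, rule: str, res: Dict[str, dict]) -> Optional[str]:
    """Pass 2: inspect related resources; return a suppression rationale or None."""
    others = res.values()
    if rule == "lambda-dlq-missing":
        if any(r["type"] == "aws_apigatewayv2_integration" and r.get("target") == name
               for r in others):
            return "sync via API Gateway; a function DLQ only receives failed async invokes"
        if any(r["type"] == "aws_lambda_event_source_mapping" and r.get("function") == name
               and res.get(r["source"], {}).get("redrive_target")
               for r in others):
            return "polled from SQS; the source queue's redrive policy handles failures"
    if rule == "dynamodb-pitr-disabled" and res[name].get("ttl_enabled"):
        return "TTL configured; PITR is mismatched for self-expiring data"
    if rule == "sqs-no-dlq" and any(r.get("redrive_target") == name for r in others):
        return "this queue is itself a dead-letter queue"
    return None


def review(res: Dict[str, dict]) -> List[Tuple[str, str, Optional[str]]]:
    """Keep every finding, but attach the rationale when context suppresses it."""
    return [(name, rule, context_reason(name, rule, res)) for name, rule in naive_findings(res)]


if __name__ == "__main__":
    for name, rule, reason in review(RESOURCES):
        print(f"{name}: {rule} -> {'suppressed: ' + reason if reason else 'LIVE'}")
```

Pass 1 yields the same four findings as the single-resource column, and pass 2 suppresses all four with a reason. If you set `ttl_enabled` to `False` on the table, the PITR finding becomes live again. A suppression holds only while the context that justifies it is present.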

Methodology

Everything needed to reproduce the run, exactly as it was performed.

Tool	Version	How it was run
audytx	0.14.8	Live GitHub App scan → Code Scanning SARIF (version-verified)
Checkov	3.2.520	pip install · `checkov -d <dir> --framework terraform -o json`
Trivy	0.71.0	`trivy config <dir> --severity HIGH,CRITICAL`
KICS	2.1.20	`kics scan -p <dir> -t Terraform`
Terrascan	1.19.9	`terrascan scan -i terraform -d <dir>`

Corpus: 28 AWS-Terraform repos/modules in the public audytx-testbed, each on a bench/<name> branch. 5-minute timeout per tool per corpus (timeouts = DNF). No suppression files for any tool. Scoring is scripts/score.py (Python-3 stdlib only) — given the same inputs it produces a byte-identical scorecard every run. TP matching: a finding counts once per ground-truth path if its file and resource/rule reference that path; extra findings on the same path count as FP (conservative for audytx).
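The TP-matching rule can be sketched in Python as follows. This is a hypothetical, simplified stand-in for scripts/score.py, not its actual code. It matches at file level, following the Table 1 definition, rather than on file plus resource/rule. It serializes the scorecard with sorted keys so identical inputs give byte-identical text. The example paths and files are made up.

```python
import json
from typing import Dict, Iterable, Tuple

SCORED_SEVERITIES: Tuple[str, ...] = ("HIGH", "CRITICAL")


def score(findings: Iterable[dict], ground_truth: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical scorer. ground_truth maps path id -> the file implementing it;
    each finding is a dict with 'file' and 'severity'. The first scored finding
    on a path is the TP; any further finding on it, or on a non-path file, is FP."""
    path_by_file = {file: path for path, file in ground_truth.items()}
    claimed = set()
    tp = fp = 0
    for f in findings:
        if f["severity"] not in SCORED_SEVERITIES:
            continue  # below the threshold: not counted at all
        path = path_by_file.get(f["file"])
        if path is not None and path not in claimed:
            claimed.add(path)
            tp += 1
        else:
            fp += 1
    return {"tp": tp, "fp": fp, "fn": len(ground_truth) - len(claimed)}


def scorecard_json(result: Dict[str, int]) -> str:
    # Sorted keys (json.dumps' default separators are fixed): same inputs, same bytes.
    return json.dumps(result, sort_keys=True)


EXAMPLE_GT = {"path-a": "a.tf", "path-b": "b.tf", "path-c": "c.tf"}
EXAMPLE_FINDINGS = [
    {"file": "a.tf", "severity": "HIGH"},
    {"file": "a.tf", "severity": "HIGH"},        # secondary hit on a claimed path: FP
    {"file": "b.tf", "severity": "CRITICAL"},
    {"file": "b.tf", "severity": "MEDIUM"},      # below threshold: ignored
    {"file": "fixture.tf", "severity": "HIGH"},  # not a ground-truth file: FP
]

if __name__ == "__main__":
    print(scorecard_json(score(EXAMPLE_FINDINGS, EXAMPLE_GT)))  # {"fn": 1, "fp": 2, "tp": 2}
```

The first matching finding on a path becomes the TP and every later one an FP. As a result, the counts do not depend on finding order; only which individual finding is labelled TP does.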

Where audytx is weaker — the honest column

This is a benchmark, not a sales sheet. The places audytx loses:

Checkov has more raw coverage. Many legitimate Checkov findings (X-Ray tracing, code signing, function-in-VPC, reserved concurrency, TLS-version pinning) are real concerns audytx does not yet flag. If you want breadth-first "tell me everything potentially wrong," Checkov has more rules. audytx's catalog is a curated set focused on patterns it understands deeply enough to reason about — depth over breadth, by design.
The IAM precision number is honest, not flattering. On the deliberately-vulnerable iam-vulnerable corpus, audytx fires 135 HIGH findings for 31 paths. Even accounting for legitimate secondary detections and the corpus's own FP-test fixtures, that is a lot of alerts — appropriate for a corpus that is wall-to-wall privesc, but it's not a "low volume" story there. The low-volume story is Table 2.
Clean-module false positives: 27 (v0.4.1) → 33 (v0.14.8). Investigating that rise surfaced a real bug — AWS_OPS_010 (public Lambda Function URL) had inverted match-logic and fired on phantom module-synthesized URLs; fixed in v0.14.8, which removed 15 of them. The remaining handful are new rules (EKS secrets-encryption, deprecated Lambda runtimes) correctly flagging real issues in the modules' own examples/ code — not bogus matches. Net, audytx is again the lowest-false-positive tool of the five (33), ~36× below Checkov.
The ground truth was authored by us. The iam-vulnerable path list follows directly from BishopFox's upstream docs and audytx rules were not tuned against it, but it is our scoring file. The unmatched-findings audit is published for independent checking.
Single run, AWS-only. Each tool was scanned once (Terrascan in particular shows timeout variance on large corpora), and the whole corpus is AWS Terraform — this says nothing about multi-cloud or CloudFormation, which audytx deliberately does not cover.

Reproduce it yourself

The corpus, the ground truth, and the scorer are public. You do not have to take our numbers on faith.

# 1. Clone the public benchmark corpus
git clone https://github.com/victorsinha/audytx-testbed
cd audytx-testbed

# 2. Run any competitor on a corpus (example: Checkov on a clean module)
checkov -d corpus/terraform-aws-rds --framework terraform -o json | jq '.summary'

# 3. audytx numbers come from the live Code Scanning SARIF on each bench branch
gh api "repos/victorsinha/audytx-testbed/code-scanning/analyses?ref=refs/heads/bench/terraform-aws-rds" \
  --jq '[.[] | select(.tool.name=="audytx")] | sort_by(.created_at) | last | .id'

# 4. Re-score everything deterministically
python3 scripts/score.py results ground-truth
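To script step 3 without jq, the Python sketch below performs the same selection. It keeps analyses whose `tool.name` is `audytx`, sorts them by `created_at`, and takes the last `id`. It differs from the jq command in two hedged ways:

- An optional `version` filter on `tool.version` is added; the jq command has no such filter.
- `fetch_analyses` shells out to `gh api` with the same endpoint as step 3. Like that command, it reads only the first page of results.

Both function names are hypothetical.

```python
import json
import subprocess
from typing import List, Optional

REPO = "victorsinha/audytx-testbed"


def fetch_analyses(bench: str) -> List[dict]:
    """Same endpoint as step 3 (first page only, like the jq command). Needs an
    authenticated `gh` CLI; `bench` is the corpus name, e.g. "terraform-aws-rds"."""
    endpoint = f"repos/{REPO}/code-scanning/analyses?ref=refs/heads/bench/{bench}"
    out = subprocess.run(["gh", "api", endpoint], capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


def latest_analysis_id(analyses: List[dict], tool: str = "audytx",
                       version: Optional[str] = None) -> Optional[int]:
    """Mirror of the jq filter: select by tool.name, sort_by(.created_at), last .id.
    The optional `version` filter on tool.version is an extra, not in the jq."""
    matching = [a for a in analyses
                if a.get("tool", {}).get("name") == tool
                and (version is None or a["tool"].get("version") == version)]
    if not matching:
        return None  # jq's `last` on an empty array yields null
    # sorted() is stable like jq's sort_by; ISO-8601 UTC strings sort chronologically.
    return sorted(matching, key=lambda a: a["created_at"])[-1]["id"]
```

For example, `latest_analysis_id(fetch_analyses("terraform-aws-rds"), version="0.14.8")` returns the newest audytx analysis on that bench branch, or `None` if no run matches the engine version.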

Full write-up — every table, footnote and caveat — lives in docs/benchmark-v1.md in the engine repo.

audytx vs Checkov

audytx vs Checkov: AWS Terraform security scanner comparison

The short version: both detect 100% of documented IAM privilege-escalation paths. audytx fires 36× fewer false positives on clean production modules (33 vs 1,193). Checkov has more raw rule coverage. For teams whose scanner is muted because of noise, audytx's precision matters more — for teams that want maximum breadth and tolerate triage work, Checkov delivers more rules.

Choose audytx when

Your team has turned off or started ignoring another scanner due to alert fatigue
You need 100% IAM privesc recall AND low noise (in this benchmark, audytx is the only tool that delivers both)
You want the reasoning behind each suppressed finding — not just a pass/fail
You use AI coding agents and need an MCP server for pre-PR checks
You want free PR comments without a Bridgecrew account or API key

Choose Checkov when

You need maximum rule breadth: X-Ray tracing, code signing, function-in-VPC, TLS version pinning, reserved concurrency — Checkov has these, audytx doesn't yet
You're already on the Bridgecrew/Prisma Cloud platform and want native integration
You run multi-cloud or CloudFormation (audytx is AWS + Terraform only by design)
You want a broad "tell me everything possibly wrong" sweep rather than a high-precision review

Key numbers: IAM privesc recall — audytx 100%, Checkov 100%. False positives on 21 clean modules — audytx 33, Checkov 1,193 (36×). IAM precision — audytx 23%, Checkov 12% (2× at the same recall). Alert volume on iam-vulnerable — audytx 135, Checkov 269 (half). Full data in Tables 1–2 above.

audytx vs Trivy

audytx vs Trivy: Terraform IaC scanner comparison

Trivy is a multi-purpose security scanner (containers, images, SBOMs, IaC). Its Terraform coverage focuses on common misconfigurations and has 0% recall on IAM privilege-escalation paths — Trivy fires 7 HIGH findings on iam-vulnerable, none of which are correct privilege-escalation detections. audytx has 5× fewer false positives on clean modules (33 vs 175) while detecting all 31 IAM privesc paths Trivy misses entirely.

Choose audytx when

IAM security is a priority: Trivy detected none of the 31 IAM privilege-escalation paths in this benchmark
You want cross-resource reasoning and context-aware suppression
You're Terraform-on-AWS focused and want depth over breadth
You need MCP server integration for AI coding agents

Choose Trivy when

You need a single tool covering containers, images, SBOMs, and IaC together
You scan multiple cloud providers or CloudFormation (Trivy supports both)
You want Kubernetes manifest and Helm chart scanning alongside Terraform
You need offline / airgapped scanning with self-contained binaries

Key numbers: IAM privesc recall — audytx 100%, Trivy 0%. False positives on 21 clean modules — audytx 33, Trivy 175 (5×). Trivy fires 7 HIGH findings on iam-vulnerable; all 7 are false positives (0 true positives).

audytx vs KICS

audytx vs KICS: Terraform security tool comparison

KICS (Keeping Infrastructure as Code Secure, by Checkmarx) scores close to audytx on clean-module false positives (34 vs 33), but it reaches that only by detecting 3% of IAM privilege-escalation paths (1 of 31) versus audytx's 100%. KICS trades recall for low alert volume, not for precision: on iam-vulnerable its precision is 11% against audytx's 23%. audytx achieves both the lowest false-positive count and full IAM attack-path coverage.

Choose audytx when

You need both low false positives AND comprehensive IAM privesc detection
You want each suppressed finding shown with its rationale — not just fewer alerts
You want a GitHub App (install in 60s, no CI step) instead of a CLI tool
MCP server support for AI coding agents matters

Choose KICS when

You need multi-cloud support: KICS covers Azure, GCP, Kubernetes, Docker, Ansible, CloudFormation
You're already on the Checkmarx platform for SAST and want a unified tool
IAM attack-path detection is not a priority and low alert volume is
You prefer a self-hosted CLI with no external calls

Key numbers: IAM privesc recall — audytx 100%, KICS 3% (1 of 31 paths). False positives on 21 clean modules — audytx 33, KICS 34 (essentially tied). KICS achieves low noise by missing nearly all IAM attack paths; audytx achieves both.

audytx vs Terrascan

audytx vs Terrascan: Terraform static analysis comparison

Terrascan (by Tenable) timed out on the iam-vulnerable corpus (5-minute limit exceeded) and produced 454 false positives on 21 clean modules — 14× more than audytx. A single module (terraform-aws-s3-bucket) triggered 348 HIGH alerts in a mass-rule blowup, accounting for 77% of Terrascan's total clean-module count.

Choose audytx when

Scan time reliability matters — Terrascan timed out on large module graphs
You need IAM privilege-escalation path detection (Terrascan did not finish the iam-vulnerable corpus)
14× lower false-positive volume is a meaningful team-productivity gain
You want a GitHub App with PR comments and SARIF upload, not a local CLI

Consider Terrascan when

You need broad multi-cloud coverage (Azure, GCP, Docker, Kubernetes) alongside Terraform
You want OPA-based custom policies with Rego
You're on the Tenable platform and want native integration
You need airgapped / fully self-hosted scanning

Key numbers: IAM privesc recall — audytx 100%, Terrascan DNF (timeout on iam-vulnerable corpus). False positives on 21 clean modules — audytx 33, Terrascan 454 (14×). Terrascan's 348 alerts on a single module suggest a rule-volume issue on larger module graphs.

See your own numbers

One click to install. Free on every repo, public or private — there's no plan to choose.
