Public benchmark · 5 tools · 28 corpora · reproducible

audytx vs Checkov, Trivy, KICS & Terrascan

Five scanners run against 28 real-world AWS Terraform corpora — 7 with known security findings (recall) and 21 production-grade community modules expected to be clean (precision). Every number below is measured, scored by a deterministic script, and reproducible from a public repo. Headline: audytx and Checkov are the only two tools that detect all 31 IAM privilege-escalation paths — and on the 21 clean modules audytx fires ~36× fewer false positives than Checkov. It also logs the fewest clean-module false positives of all five (33, just ahead of KICS's 34) — and KICS reaches that only by detecting 3% of the privesc paths to audytx's 100%.

Provenance. Re-run at the current audytx engine v0.14.8 (247 rules, 17 reasoning axes) on 2026-06-16, against Checkov 3.2.520 · Trivy 0.71.0 · KICS 2.1.20 · Terrascan 1.19.9 (all pinned). The audytx column is pulled live from GitHub Code Scanning per bench branch and accepted only if its SARIF driver.version equals the live engine — so no stale data can slip in. Scored with scripts/score.py (Python-3 stdlib, byte-deterministic). We re-run the full suite each milestone; reproduce any row yourself from the steps at the bottom.
100%
recall on 31 documented IAM privilege-escalation paths
tied with Checkov · KICS 3% · Trivy 0%
36×
fewer false positives than Checkov on 21 clean modules
33 vs 1,193 · fewest of all five (KICS 34) · 14× fewer than Terrascan
2×
IAM-privesc precision vs Checkov at the same 100% recall
23% vs 12% · half the alert volume (135 vs 269)

Table 1 — IAM privilege-escalation: precision & recall

Corpus: BishopFox iam-vulnerable — 31 documented AWS IAM privilege-escalation paths, one Terraform file per path. A tool "detects" a path if it fires at least one HIGH/CRITICAL finding on the file implementing it. audytx and Checkov are scored on HIGH; KICS and Trivy on HIGH+CRITICAL.

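| Tool | HIGH findings | TP | FP | FN | Precision | Recall |
|---|---|---|---|---|---|---|
| audytx | 135 | 31 | 104 | 0 | 23% | 100% |
| Checkov | 269 ¹ | 31 | 238 | 0 | 12% | 100% |
| KICS | 9 | 1 | 8 | 30 | 11% | 3% |
| Terrascan | DNF ² | - | - | - | - | - |
| Trivy | 7 | 0 | 7 | 31 | 0% | 0% |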

¹ Checkov run offline (no Bridgecrew API) emits no per-check severity, so this is total failed checks, not HIGH-only — generous for Checkov's recall, conservative for its precision.
² Terrascan exceeded the 5-minute timeout on this corpus (large module graph).

audytx and Checkov are the only two tools that detect all 31 paths. audytx does it with ~2× the precision and half the alert volume. Most of audytx's 104 "FP" here are legitimate secondary detections (e.g. AWS_OPS_038 firing on the same privesc file the primary rule already claimed) plus the corpus's own intentional FP-test fixtures. Because TP matching counts only one finding per path, every additional correct finding on that path is scored as an FP, which structurally undercounts audytx precision.
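The derived columns in Table 1 follow from two measured values per tool: the finding count and the number of paths detected. The sketch below recomputes them from the table's own figures. The `Row` class and `pct` helper are illustrative names, not part of scripts/score.py.

```python
from dataclasses import dataclass

TOTAL_PATHS = 31  # documented privilege-escalation paths in iam-vulnerable


@dataclass(frozen=True)
class Row:
    tool: str
    findings: int  # HIGH (or HIGH+CRITICAL) findings fired on the corpus
    tp: int        # paths detected, each counted at most once

    @property
    def fp(self) -> int:
        # Every finding beyond the one credited per path counts as a false positive.
        return self.findings - self.tp

    @property
    def fn(self) -> int:
        return TOTAL_PATHS - self.tp

    @property
    def precision(self) -> float:
        return self.tp / self.findings if self.findings else 0.0

    @property
    def recall(self) -> float:
        return self.tp / TOTAL_PATHS


def pct(value: float) -> int:
    """Round a ratio to a whole percentage, as the table does."""
    return round(value * 100)


# Measured values copied from Table 1 (Terrascan omitted: DNF).
ROWS = [Row("audytx", 135, 31), Row("Checkov", 269, 31), Row("KICS", 9, 1), Row("Trivy", 7, 0)]

for r in ROWS:
    print(f"{r.tool:8} FP={r.fp:3} FN={r.fn:2} "
          f"precision={pct(r.precision)}% recall={pct(r.recall)}%")
```

Running it reproduces the FP, FN, precision and recall columns, which makes it easy to see that the precision figures depend entirely on the one-finding-per-path matching rule.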

Table 2 — False positives on 21 clean production modules

This is the wedge. Each corpus is a well-regarded, actively-maintained AWS community Terraform module with an expected high-severity count of 0. Lower is better — every HIGH finding here is noise a reviewer has to triage. Raw counts shown as measured (nothing subtracted).

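| Corpus (clean module) | audytx | Checkov | Trivy | KICS | Terrascan |
|---|---|---|---|---|---|
| cloudposse-s3-bucket | 1 | 36 | 1 | 0 | 2 |
| terraform-aws-alb | 4 | 54 | 13 | 2 | 6 |
| terraform-aws-apigateway-v2 | 0 | 20 | 7 | 1 | 5 |
| terraform-aws-autoscaling | 0 | 11 | 6 | 0 | 1 |
| terraform-aws-cloudfront | 0 | 24 | 8 | 0 | 6 |
| terraform-aws-ecr | 0 | 5 | 1 | 2 | 0 |
| terraform-aws-ecs | 0 | 86 | 16 | 2 | 16 |
| terraform-aws-eks | 6 | 88 | 38 | 1 | 1 |
| terraform-aws-eventbridge | 7 | 57 | 18 | 12 | 24 |
| terraform-aws-iam | 3 | 287 | 1 | 0 | 4 |
| terraform-aws-kms | 0 | 1 | 0 | 0 | 11 |
| terraform-aws-lambda | 6 | 112 | 23 | 7 | 9 |
| terraform-aws-rds | 2 | 124 | 7 | 1 | 10 |
| terraform-aws-s3-bucket | 3 | 129 | 18 | 4 | 348 ³ |
| terraform-aws-secure-baseline | 1 | 107 | 8 | 2 | 8 |
| terraform-aws-security-group | 0 | 10 | 2 | 0 | 0 |
| terraform-aws-sns | 0 | 4 | 0 | 0 | 0 |
| terraform-aws-sqs | 0 | 1 | 0 | 0 | 0 |
| terraform-aws-step-functions | 0 | 6 | 1 | 0 | 1 |
| terraform-aws-vpc | 0 | 25 | 3 | 0 | 1 |
| trussworks-s3-private | 0 | 6 | 4 | 0 | 1 |
| Total | 33 | 1,193 | 175 | 34 | 454 |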

³ Terrascan fires 348 HIGH alerts on a single module (terraform-aws-s3-bucket) — a mass-rule blowup that alone accounts for 77% of its total.

audytx's 33 is ~36× fewer than Checkov (1,193), 14× fewer than Terrascan (454), and 5× fewer than Trivy (175). It is also the fewest raw false positives of all five, just ahead of KICS (34), and KICS gets there while detecting only 3% of the privesc paths (Table 1) to audytx's 100%. Several of audytx's 33 trace to documented justified exceptions (real issues in the modules' own example code, tracked in clean-modules.yaml); the rest come from new rules (EKS secrets-encryption, deprecated Lambda runtimes) correctly flagging issues in the modules' examples/ directories. See the honest column below.
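The multipliers quoted for Table 2 are plain ratios of the column totals. A small sketch that recomputes them from the published totals (the `times_fewer` helper is an illustrative name):

```python
# Column totals from Table 2 (HIGH findings on the 21 clean modules).
totals = {"audytx": 33, "Checkov": 1193, "Trivy": 175, "KICS": 34, "Terrascan": 454}


def times_fewer(tool: str, baseline: str = "audytx") -> float:
    """How many times more findings `tool` fires than `baseline`."""
    return totals[tool] / totals[baseline]


for tool in ("Checkov", "Terrascan", "Trivy", "KICS"):
    print(f"{tool:9} fires {times_fewer(tool):.1f}x audytx's count")

# Share of Terrascan's total that comes from terraform-aws-s3-bucket alone.
s3_share = 348 / totals["Terrascan"]
print(f"terraform-aws-s3-bucket share of Terrascan total: {s3_share:.0%}")
```

The ratios round to the 36×, 14× and 5× figures above, and the single-module share rounds to the 77% in footnote ³.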

Table 3 — Recall corpora (raw HIGH counts)

Six additional corpora with deliberately insecure configurations. We have no path-level ground truth here beyond iam-vulnerable, so these are raw HIGH counts, not scored precision/recall — more is not automatically better, since a chunk of any tool's count is noise. Shown for completeness.

Corpus audytx CheckovTrivyKICSTerrascan
KaiMonkey40109112021
iam-role-chain49010
learn-terraform-provision-eks-cluster23300
sadcloud47201265358
terraform-aws-eks-blueprints29210DNF ⁴137
terragoat52466937035

⁴ Trivy timed out on eks-blueprints (5-min limit); Terrascan DNF on iam-vulnerable (Table 1).

How the precision gap happens — a worked example

The Table 2 result is not fewer rules — it's cross-resource reasoning. audytx pre-computes relationship graphs and suppresses findings that context proves benign, showing the rationale instead of dropping them silently. Here's the mechanism on one fixture (testbed #11), illustrative of why the clean-module counts diverge so far.

Serverless messaging — SQS DLQ chain, sync + polled-async Lambdas, TTL'd DynamoDB

Single-resource scanners flag each resource against a checklist. audytx reads how the resources connect first.

Single-resource scanner

aws_lambda_function.chirp_api
  Lambda DLQ missing
aws_lambda_function.chirp_outbox_worker
  Lambda DLQ missing
aws_dynamodb_table.chirp_request_log
  point-in-time recovery not enabled
aws_sqs_queue.chirp_outbox_dlq
  queue has no DLQ of its own

Fires on every resource that fails a pattern · 4 noise findings

audytx — same resources, with context

aws_lambda_function.chirp_api
  suppressed: sync via API Gateway; a function DLQ only applies to async invokes, so it would never receive an event
aws_lambda_function.chirp_outbox_worker
  suppressed: polled-async via SQS event-source mapping; failures are handled by the queue's redrive_policy, not a function DLQ
aws_dynamodb_table.chirp_request_log
  suppressed: TTL configured for ephemeral request logs; PITR is mismatched for data that self-expires
aws_sqs_queue.chirp_outbox_dlq
  suppressed: this queue is the dead-letter queue; requiring it to have its own DLQ is infinite regress

Each suppression carries its reasoning · 0 noise findings

Multiply this across DLQ identity, Lambda invocation graphs, encryption variants, data lifetime, network exposure, IAM trust/policy reachability, tag environment and IMDSv2 inheritance (17 reasoning axes in the live engine) and you get the 33-vs-1,193 gap in Table 2.
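To make the mechanism concrete, here is a minimal sketch of context-aware suppression for the two DLQ rules in the fixture above. It is not audytx's implementation: the `Queue` and `Function` classes, the `"sync"`/`"async"`/`"poll"` mode labels, and the `chirp_outbox` and `api_gateway` source names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Queue:
    name: str
    redrive_target: Optional[str] = None  # DLQ named in its redrive_policy, if any


@dataclass
class Function:
    name: str
    # (source, mode) edges; mode is "sync", "async" or "poll" (hypothetical labels)
    invocations: list = field(default_factory=list)
    has_dlq: bool = False


def check_function_dlq(fn: Function, queues: dict) -> tuple:
    """Return (status, reason) for a 'Lambda DLQ missing' rule."""
    if fn.has_dlq:
        return ("pass", "function DLQ configured")
    modes = {mode for _, mode in fn.invocations}
    if not modes or "async" in modes:
        # Unknown or async invocation: a function DLQ could matter, keep the finding.
        return ("finding", "Lambda DLQ missing")
    for source, mode in fn.invocations:
        queue = queues.get(source)
        if mode == "poll" and (queue is None or queue.redrive_target is None):
            return ("finding", f"polled queue {source} has no redrive_policy")
    if "poll" in modes:
        return ("suppressed", "polled from SQS; failures go to the queue's redrive_policy")
    return ("suppressed", "sync-only; a function DLQ would never receive an event")


def check_queue_dlq(queue: Queue, queues: dict) -> tuple:
    """Return (status, reason) for a 'queue has no DLQ' rule."""
    if queue.redrive_target:
        return ("pass", "redrive_policy configured")
    if any(q.redrive_target == queue.name for q in queues.values()):
        return ("suppressed", "this queue is itself a dead-letter queue")
    return ("finding", "queue has no DLQ")


queues = {
    "chirp_outbox": Queue("chirp_outbox", redrive_target="chirp_outbox_dlq"),
    "chirp_outbox_dlq": Queue("chirp_outbox_dlq"),
}
functions = [
    Function("chirp_api", [("api_gateway", "sync")]),
    Function("chirp_outbox_worker", [("chirp_outbox", "poll")]),
]

results = {fn.name: check_function_dlq(fn, queues) for fn in functions}
results["chirp_outbox_dlq"] = check_queue_dlq(queues["chirp_outbox_dlq"], queues)
for name, (status, reason) in results.items():
    print(f"{name}: {status} ({reason})")
```

The point of the sketch is the order of operations: the relationship data (who invokes the function, which queue redrives to which) is consulted before a finding is emitted, and each suppression returns a reason rather than disappearing. The sketch defaults to keeping the finding whenever the context is missing or ambiguous.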

Methodology

Everything needed to reproduce the run, exactly as it was performed.

| Tool | Version | How it was run |
|---|---|---|
| audytx | 0.14.8 | Live GitHub App scan → Code Scanning SARIF (version-verified) |
| Checkov | 3.2.520 | pip install · checkov -d <dir> --framework terraform -o json |
| Trivy | 0.71.0 | trivy config <dir> --severity HIGH,CRITICAL |
| KICS | 2.1.20 | kics scan -p <dir> -t Terraform |
| Terrascan | 1.19.9 | terrascan scan -i terraform -d <dir> |

Corpus: 28 AWS-Terraform repos/modules in the public audytx-testbed, each on a bench/<name> branch. 5-minute timeout per tool per corpus (timeouts = DNF). No suppression files for any tool. Scoring is scripts/score.py (Python-3 stdlib only) — given the same inputs it produces a byte-identical scorecard every run. TP matching: a finding counts once per ground-truth path if its file and resource/rule reference that path; extra findings on the same path count as FP (conservative for audytx).
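The once-per-path matching rule can be sketched as follows. This is a simplified illustration, not scripts/score.py: it matches on file name only (the real rule also checks the resource/rule reference), and the path IDs, file names and rule IDs are hypothetical.

```python
import json


def score(findings, ground_truth):
    """Credit at most one finding per ground-truth path; everything else is FP.

    findings:     list of {"file": ..., "rule": ...} dicts
    ground_truth: {path_id: file implementing that path}
    """
    file_to_path = {file: path_id for path_id, file in ground_truth.items()}
    claimed = set()
    tp = fp = 0
    # Sort first so the result does not depend on the order tools emit findings.
    for finding in sorted(findings, key=lambda f: (f["file"], f["rule"])):
        path_id = file_to_path.get(finding["file"])
        if path_id is not None and path_id not in claimed:
            claimed.add(path_id)
            tp += 1
        else:
            fp += 1  # extra finding on a claimed path, or on an unrelated file
    return {"tp": tp, "fp": fp, "fn": len(ground_truth) - len(claimed)}


# Hypothetical inputs: two paths, two findings on the first, one stray finding.
ground_truth = {"path-01": "privesc1.tf", "path-02": "privesc2.tf"}
findings = [
    {"file": "privesc1.tf", "rule": "RULE_A"},
    {"file": "privesc1.tf", "rule": "RULE_B"},
    {"file": "unrelated.tf", "rule": "RULE_A"},
]

# sort_keys gives a stable key order, so identical inputs serialize identically.
scorecard = json.dumps(score(findings, ground_truth), sort_keys=True)
print(scorecard)  # {"fn": 1, "fp": 2, "tp": 1}
```

Note how the second finding on privesc1.tf lands in the FP column even though it points at a real vulnerable file. That is the conservative behaviour described above.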

Where audytx is weaker — the honest column

This is a benchmark, not a sales sheet. The places audytx loses:

  1. Checkov has more raw coverage. Many legitimate Checkov findings (X-Ray tracing, code signing, function-in-VPC, reserved concurrency, TLS-version pinning) are real concerns audytx does not yet flag. If you want breadth-first "tell me everything potentially wrong," Checkov has more rules. audytx's catalog is a curated set focused on patterns it understands deeply enough to reason about — depth over breadth, by design.
  2. The IAM precision number is honest, not flattering. On the deliberately-vulnerable iam-vulnerable corpus, audytx fires 135 HIGH findings for 31 paths. Even accounting for legitimate secondary detections and the corpus's own FP-test fixtures, that is a lot of alerts — appropriate for a corpus that is wall-to-wall privesc, but it's not a "low volume" story there. The low-volume story is Table 2.
  3. Clean-module false positives: 27 (v0.4.1) → 33 (v0.14.8). Investigating that rise surfaced a real bug — AWS_OPS_010 (public Lambda Function URL) had inverted match-logic and fired on phantom module-synthesized URLs; fixed in v0.14.8, which removed 15 of them. The remaining handful are new rules (EKS secrets-encryption, deprecated Lambda runtimes) correctly flagging real issues in the modules' own examples/ code — not bogus matches. Net, audytx is again the lowest-false-positive tool of the five (33), ~36× below Checkov.
  4. The ground truth was authored by us. The iam-vulnerable path list follows directly from BishopFox's upstream docs and audytx rules were not tuned against it, but it is our scoring file. The unmatched-findings audit is published for independent checking.
  5. Single run, AWS-only. Each tool was scanned once (Terrascan in particular shows timeout variance on large corpora), and the whole corpus is AWS Terraform — this says nothing about multi-cloud or CloudFormation, which audytx deliberately does not cover.

Reproduce it yourself

The corpus, the ground truth, and the scorer are public. You do not have to take our numbers on faith.

# 1. Clone the public benchmark corpus
git clone https://github.com/victorsinha/audytx-testbed
cd audytx-testbed

# 2. Run any competitor on a corpus (example: Checkov on a clean module)
checkov -d corpus/terraform-aws-rds --framework terraform -o json | jq '.summary'

# 3. audytx numbers come from the live Code Scanning SARIF on each bench branch
gh api "repos/victorsinha/audytx-testbed/code-scanning/analyses?ref=refs/heads/bench/terraform-aws-rds" \
  --jq '[.[] | select(.tool.name=="audytx")] | sort_by(.created_at) | last | .id'

# 4. Re-score everything deterministically
python3 scripts/score.py results ground-truth

Full write-up — every table, footnote and caveat — lives in docs/benchmark-v1.md in the engine repo.
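Once you have downloaded the SARIF for an analysis, the version gate described under Provenance is a simple equality check on the SARIF `tool.driver.version` field. A minimal sketch, assuming the driver name is `audytx` (as in the `gh api` filter above) and using hand-written SARIF fragments rather than real scan output:

```python
def driver_versions(sarif: dict, tool_name: str = "audytx") -> set:
    """Collect the driver.version of every run produced by `tool_name`."""
    return {
        run["tool"]["driver"].get("version")
        for run in sarif.get("runs", [])
        if run["tool"]["driver"].get("name") == tool_name
    }


def accept(sarif: dict, live_version: str) -> bool:
    """Accept the file only if all matching runs report exactly the live engine version."""
    return driver_versions(sarif) == {live_version}


def make_sarif(version: str) -> dict:
    # Hand-written stand-in for a downloaded SARIF 2.1.0 document.
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "audytx", "version": version}}, "results": []}],
    }


print(accept(make_sarif("0.14.8"), "0.14.8"))  # True
print(accept(make_sarif("0.14.6"), "0.14.8"))  # False: stale scan, rejected
```

A file with no audytx run at all is also rejected, since the set of versions is then empty.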

audytx vs Checkov

audytx vs Checkov: AWS Terraform security scanner comparison

The short version: both detect 100% of documented IAM privilege-escalation paths. audytx fires 36× fewer false positives on clean production modules (33 vs 1,193). Checkov has more raw rule coverage. For teams whose scanner is muted because of noise, audytx's precision matters more — for teams that want maximum breadth and tolerate triage work, Checkov delivers more rules.

Choose audytx when

  • Your team has turned off or started ignoring another scanner due to alert fatigue
  • You need 100% IAM privesc recall AND low noise (audytx is the only tool that delivers both)
  • You want the reasoning behind each suppressed finding — not just a pass/fail
  • You use AI coding agents and need an MCP server for pre-PR checks
  • You want free PR comments without a Bridgecrew account or API key

Choose Checkov when

  • You need maximum rule breadth: X-Ray tracing, code signing, function-in-VPC, TLS version pinning, reserved concurrency — Checkov has these, audytx doesn't yet
  • You're already on the Bridgecrew/Prisma Cloud platform and want native integration
  • You run multi-cloud or CloudFormation (audytx is AWS + Terraform only by design)
  • You want a broad "tell me everything possibly wrong" sweep rather than a high-precision review

Key numbers. IAM privesc recall: audytx 100%, Checkov 100%. False positives on 21 clean modules: audytx 33, Checkov 1,193 (36×). IAM precision: audytx 23%, Checkov 12% (2× at the same recall). Alert volume on iam-vulnerable: audytx 135, Checkov 269 (half). Full data in Tables 1–2 above.

audytx vs Trivy

audytx vs Trivy: Terraform IaC scanner comparison

Trivy is a multi-purpose security scanner (containers, images, SBOMs, IaC). Its Terraform coverage focuses on common misconfigurations and has 0% recall on IAM privilege-escalation paths — Trivy fires 7 HIGH findings on iam-vulnerable, none of which are correct privilege-escalation detections. audytx has 5× fewer false positives on clean modules (33 vs 175) while detecting all 31 IAM privesc paths Trivy misses entirely.

Choose audytx when

  • IAM security is a priority — Trivy has no IAM attack-path detection
  • You want cross-resource reasoning and context-aware suppression
  • You're Terraform-on-AWS focused and want depth over breadth
  • You need MCP server integration for AI coding agents

Choose Trivy when

  • You need a single tool covering containers, images, SBOMs, and IaC together
  • You scan multiple cloud providers or CloudFormation (Trivy supports both)
  • You want Kubernetes manifest and Helm chart scanning alongside Terraform
  • You need offline / airgapped scanning with self-contained binaries

Key numbers. IAM privesc recall: audytx 100%, Trivy 0%. False positives on 21 clean modules: audytx 33, Trivy 175 (5×). Trivy fires 7 HIGH findings on iam-vulnerable; all 7 are false positives (0 true positives).

audytx vs KICS

audytx vs KICS: Terraform security tool comparison

KICS (Keeping Infrastructure as Code Secure, by Checkmarx) scores close to audytx on clean-module false positives (34 vs 33) — but reaches that only by detecting 3% of IAM privilege-escalation paths (1 of 31) versus audytx's 100%. KICS trades recall for precision. audytx achieves both: the lowest false-positive count and full IAM attack-path coverage.

Choose audytx when

  • You need both low false positives AND comprehensive IAM privesc detection
  • You want each suppressed finding shown with its rationale — not just fewer alerts
  • You want a GitHub App (install in 60s, no CI step) instead of a CLI tool
  • MCP server support for AI coding agents matters

Choose KICS when

  • You need multi-cloud support: KICS covers Azure, GCP, Kubernetes, Docker, Ansible, CloudFormation
  • You're already on the Checkmarx platform for SAST and want a unified tool
  • IAM attack-path detection is not a priority and low alert volume is
  • You prefer a self-hosted CLI with no external calls

Key numbers. IAM privesc recall: audytx 100%, KICS 3% (1 of 31 paths). False positives on 21 clean modules: audytx 33, KICS 34 (essentially tied). KICS achieves low noise by missing nearly all IAM attack paths; audytx achieves both.

audytx vs Terrascan

audytx vs Terrascan: Terraform static analysis comparison

Terrascan (by Tenable) timed out on the iam-vulnerable corpus (5-minute limit exceeded) and produced 454 false positives on 21 clean modules — 14× more than audytx. A single module (terraform-aws-s3-bucket) triggered 348 HIGH alerts in a mass-rule blowup, accounting for 77% of Terrascan's total clean-module count.

Choose audytx when

  • Scan time reliability matters — Terrascan timed out on large module graphs
  • You need IAM privilege-escalation path detection (Terrascan DNF'd on this corpus)
  • 14× lower false-positive volume is a meaningful team-productivity gain
  • You want a GitHub App with PR comments and SARIF upload, not a local CLI

Consider Terrascan when

  • You need broad multi-cloud coverage (Azure, GCP, Docker, Kubernetes) alongside Terraform
  • You want OPA-based custom policies with Rego
  • You're on the Tenable platform and want native integration
  • You need airgapped / fully self-hosted scanning

Key numbers. IAM privesc recall: audytx 100%, Terrascan DNF (timeout on iam-vulnerable corpus). False positives on 21 clean modules: audytx 33, Terrascan 454 (14×). Terrascan's 348 alerts on a single module suggest a rule-volume issue on larger module graphs.

See your own numbers

One click to install. Free on every repo, public or private — there's no plan to choose.

Install audytx →