CLI + Docker · Evidence-ranked output · SARIF 2.1 · Sigstore-signed

Audit-first deep scan that finds the bugs your existing stack missed.

CVE Hunter is a static-analysis CLI. In blind runs against open-source codebases it has surfaced exploitable bugs that Semgrep and CodeQL did not flag and that frontier models, prompted directly, failed to find. Every finding is labelled with whichever of the five evidence ranks it carries, is graded by a held-out judge, and ships with a Sigstore-signed provenance triple covering prompt, model, and container, so a reviewer can tell what is real from what is a guess.

Tier-1: Java · .NET · Python · TS/JS
Tier-3 (memory safety): C · C++
Single-tenant Docker, in-perimeter
Held-out judge per finding
cveh scan · run.a7f2 · ● live

$ cveh scan ./target --ensemble portable
› container=sha256:4a9e… prompt-hash=sha256:7c1b…
› target=./target (py + js) allow-list: ok

[1/6] preprocess    tornhill · joern-cpg · silent-fix · invariants    done 14.2s
[2/6] hypothesis    6 streams → 382 tickets                           done 3.1s
[3/6] specialists   13 agents × fresh context, parallel               done 2m 47s
[4/6] ensemble      3 families vote · held-out judge                  done 41.8s
[5/6] evidence      ranks attached · tri-state reachability           done 6.2s
[6/6] report        sarif · json · md · sigstore sign                 done 2.4s

14 findings · 6 confirmed · 8 candidate · 0 suppressed
cost $3.81 (2.4m tokens · 412 tool calls)
provenance signed → provenance.sig
BLIND RUNS: OSS repos, multi-language
MISSED BY: SAST + LLM prompted directly
EVIDENCE: 5 labelled ranks
ENSEMBLE: 3 + 1 · vote + held-out judge
§ 01 · PIPELINE

Six stages. One Docker image.

Preprocessing is deterministic and contains no LLM calls. The hypothesis queue turns a repo into a bounded set of specific questions instead of an open-ended "find a bug." Specialists answer one question each, in fresh context. The ensemble votes; the judge grades. Evidence is labelled. The reporter emits.

01 no LLM

Preprocessor

  • Tornhill hotspots
  • Joern CPG + tags
  • Silent-fix miner
  • Invariant extract
02 6 streams

Hypothesis queue

  • ~400 ranked tickets
  • CWE + target + seed
  • priority = hotspot × base × gap
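
The priority formula above can be sketched in a few lines. This is only an illustration: the `Ticket` class, the `rank_tickets` helper, the 0 to 1 scale for each factor, and the reading of `base` as a per-CWE base rate and `gap` as a tooling-coverage gap are all assumptions, not cve-hunter's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """One hypothesis: a CWE class, a target location, and a seed hint (hypothetical shape)."""
    cwe: str
    target: str
    seed: str
    hotspot: float  # assumed: hotspot score of the target code, 0..1
    base: float     # assumed: base rate of this CWE class, 0..1
    gap: float      # assumed: how poorly existing tools cover this case, 0..1

    @property
    def priority(self) -> float:
        # priority = hotspot × base × gap, as stated in the pipeline card
        return self.hotspot * self.base * self.gap


def rank_tickets(tickets, limit=400):
    """Sort tickets by descending priority and keep a bounded queue."""
    return sorted(tickets, key=lambda t: t.priority, reverse=True)[:limit]


tickets = [
    Ticket("CWE-89", "src/db/query.py:run", "string-built SQL", 0.9, 0.5, 0.4),
    Ticket("CWE-79", "web/views.py:render", "unescaped template var", 0.3, 0.6, 0.2),
    Ticket("CWE-22", "src/files/io.py:open_path", "user path join", 0.8, 0.5, 0.9),
]

ranked = rank_tickets(tickets, limit=2)
for t in ranked:
    print(f"{t.priority:.3f}  {t.cwe}  {t.target}")
```

Because the factors multiply, a ticket scores high only when all three are high. A hot file with a well-covered bug class still drops down the queue.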
03 13 agents

Specialist swarm

  • 9 class agents
  • 4 framework agents
  • fresh context, one ticket
04 3 + 1

Ensemble + judge

  • 3 families vote
  • 4th family grades
  • collapse detector
05 ranks 1–5

Evidence

  • attach ranks 1–5
  • tri-state reachability
  • confirmed vs candidate
06 signed

Reporter

  • SARIF + JSON + MD
  • provenance triple
  • Sigstore signature
git repo → ranked tickets → structured votes → graded findings → labelled evidence → signed artefacts
§ 02 · EVIDENCE

Five evidence ranks. One gate.

A finding is labelled confirmed only if it carries rank 3 alongside one of rank 1, 2, or 4. Everything else is labelled candidate. Candidates are not suppressed; they are surfaced with the evidence they actually have, so a reviewer can decide.

1

Reachability path

A deterministic CPG or CodeQL taint path from attacker-controllable source to sink.

Deterministic: cpg-path.json · 14 nodes
2

Harvested crash

An existing OSS-Fuzz, syzbot, or upstream-CI artefact that already reproduces the bug.

Observed: oss-fuzz/crashes/a7f…
3

Ensemble + judge

Three model families vote. A fourth family grades. Collapse detector catches correlated rationales.

Consensus: opus-4 · gpt-5 · gemini-3 · judge:kimi
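
The page does not say how the collapse detector measures correlation between rationales. As a rough stand-in, the sketch below treats a vote as collapsed when every pair of rationales shares most of its tokens (Jaccard similarity). The function names, the whitespace tokenisation, and the 0.8 threshold are assumptions chosen for illustration only.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two rationales, from 0.0 to 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0  # two empty rationales carry no signal either way
    return len(ta & tb) / len(union)


def collapsed(rationales, threshold=0.8) -> bool:
    """True when every pair of rationales is near-identical (assumed criterion)."""
    pairs = list(combinations(rationales, 2))
    return bool(pairs) and all(jaccard(a, b) >= threshold for a, b in pairs)


echoed = "session record is read after session.delete() on the logout path"
independent = "user_id accessed on line 64 following the delete call"

print(collapsed([echoed, echoed, echoed]))       # all three voters echo one rationale
print(collapsed([echoed, echoed, independent]))  # one voter reasons differently
```

The idea is that three agreeing votes are weaker evidence when the voters are repeating one another than when they reach the same verdict by different routes.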
4

Structural invariant

A rule implied by tests, fixtures, or comments that this code provably violates.

Inferred invariant: auth_required on /admin/*
5

Validator agreement

A fine-tuned per-CWE classifier confirms the specialist's candidate finding.

Model: validator-cwe416 · 98.1%
CONFIRMED / CANDIDATE GATE
confirmed = rank(3) AND [ rank(1) OR rank(2) OR rank(4) ]
everything else → candidate · labelled · never silently suppressed
3 + 1 → confirmed
3 + 4 → confirmed
3 alone → candidate
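
The gate is small enough to state as a single predicate. The sketch below is one way to write it. The function name `label` and the choice to reject ranks outside 1 to 5 are assumptions; the rule itself is the one stated in this section.

```python
from typing import Iterable

VALID_RANKS = frozenset({1, 2, 3, 4, 5})
CORROBORATING = frozenset({1, 2, 4})  # ranks that can back up the ensemble verdict


def label(ranks: Iterable[int]) -> str:
    """Return 'confirmed' or 'candidate' for the set of evidence ranks a finding carries."""
    held = frozenset(ranks)
    unknown = held - VALID_RANKS
    if unknown:
        raise ValueError(f"unknown evidence ranks: {sorted(unknown)}")
    # confirmed = rank 3 AND at least one of rank 1, 2, or 4
    if 3 in held and held & CORROBORATING:
        return "confirmed"
    return "candidate"


for ranks in ({3, 1}, {3, 4}, {3}, {3, 5}, {1, 2, 4}, {1, 3, 5}):
    print(sorted(ranks), "->", label(ranks))
```

Note that rank 5 never promotes a finding on its own: ranks 3 and 5 together are still a candidate, because both are model judgements and neither is independent evidence.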
§ 03 · ANATOMY

What a finding looks like.

A finding records CWE, severity, the reachability path, the stack of evidence behind it, and a signed provenance triple covering prompt, model, and container. The schema is versioned; changes are additive.

Critical · CWE-416 · Use-After-Free
Evidence ranks: 1 · 3 · 5

Session token not invalidated on logout path.

src/auth/session.py:42–68 · validate_token()
1 reachability · cpg path, 14 nodes · cpg-path.json
3 ensemble · 4 / 4 agree, judge grade A · votes.json
5 validator · cwe-416, 98.1% confidence · v.onnx
Tags: reachable · confirmed · internet-facing · handles-pii
FIX GUIDANCE (NARRATIVE ONLY · UNTRUSTED INPUT)
__USER_INPUT__
PROVENANCE
prompt_hash    sha256:7c1b…ac48
generator      anthropic/claude-opus-4-7 · 2026-03-15
judge      openai/gpt-5 · 2026-03-10
container      sha256:4a9e…3f72 ● sigstore ✓
▪ CPG REACHABILITY · 14 NODES REACHABLE
SOURCE                   request.cookies["sid"]
middleware               auth.extract_token()
decode                   jwt.unverified_decode(t)
cache                    redis.get(session:)
validate                 session.validate_token(t)
SINK · use-after-free    session.delete() then session.user_id ← line 64
source → middleware → decode → cache → validate → sink
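
To make the shape of a finding concrete, here is a sketch of the sample above as a typed record serialised to JSON. The values come from the sample card, with truncated hashes left truncated. The class names, field names, and the `schema_version` key are hypothetical, since the real v1 schema is not reproduced on this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Evidence:
    rank: int       # 1..5, as defined in the evidence section
    kind: str
    summary: str
    artefact: str   # file that backs this piece of evidence


@dataclass
class Provenance:
    prompt_hash: str
    generator: str
    judge: str
    container: str


@dataclass
class Finding:
    schema_version: str  # hypothetical field name
    cwe: str
    severity: str
    title: str
    location: str
    evidence: List[Evidence]
    provenance: Provenance
    tags: List[str] = field(default_factory=list)


finding = Finding(
    schema_version="v1",
    cwe="CWE-416",
    severity="critical",
    title="Session token not invalidated on logout path.",
    location="src/auth/session.py:42-68",
    evidence=[
        Evidence(1, "reachability", "cpg path, 14 nodes", "cpg-path.json"),
        Evidence(3, "ensemble", "4 / 4 agree, judge grade A", "votes.json"),
        Evidence(5, "validator", "cwe-416, 98.1% confidence", "v.onnx"),
    ],
    provenance=Provenance(
        prompt_hash="sha256:7c1b…ac48",  # truncated in the source, kept as-is
        generator="anthropic/claude-opus-4-7",
        judge="openai/gpt-5",
        container="sha256:4a9e…3f72",    # truncated in the source, kept as-is
    ),
    tags=["reachable", "confirmed", "internet-facing", "handles-pii"],
)

# asdict() recurses into nested dataclasses, so the whole record serialises in one call
doc = json.loads(json.dumps(asdict(finding), ensure_ascii=False))
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Keeping evidence as a list of ranked entries, each with its own artefact, fits the additive-only schema promise: a new evidence kind is a new list entry and needs no new top-level field.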
§ 04 · NEIGHBOURS

Where existing tools fall short.

Rule-based SAST flags pattern matches, not exploitability. Agentic scanners produce per-finding confidence percentages with no underlying evidence taxonomy. In our blind runs, both classes missed bugs that turned out to be real and exploitable. CVE Hunter labels the evidence so a reviewer can tell the difference.

Tool | Primary output | Confidence signal | Limitation
SAST / Semgrep / Joern | Rule-matched findings | Rule precision stats | Rule coverage + FP noise; weak reasoning on novel patterns
CodeQL | Dataflow paths | Query-level precision | Query-authoring burden; offline from framework context
Agentic scanners | Narrative findings | Per-model confidence % | Hallucinations collapse under correlated rationales
Dependabot / Snyk | CVE advisories | CVSS | Source-level bugs invisible; advisory-first, not code-first
cve-hunter | Evidence-labelled findings | Rank 1–5 + confirmed gate | Tier-1 default; Tier-2 Go (§70); Tier-3 C/C++ (§76, memory safety, separate scorecard)
§ 05 · SPEC

Spec sheet.

What you get when you docker run.

LANGUAGES
Tier 1: Java · .NET · Python · TS/JS
Tier 3: C · C++ (memory-safety only, separate scorecard)
FRAMEWORKS
4 specialists: Spring · ASP.NET · Django · Node
OUTPUT FORMATS
3: SARIF 2.1 · JSON · Markdown
ENSEMBLE
Portable / Premier: 3 hosted + OSS · +Glasswing
OFFLINE MODE
Yes: self-host open-weights only
PROVENANCE
Triple-hash: prompt · model · container
EXIT CODES
0 / 1 / 2: stable contract
DEPLOYMENT
Single-tenant: Docker on your perimeter
§ 06 · RUN

Run it like any other build step.

One Docker invocation per repo. The target mount is read-only. The state mount holds policy, runs, cache, and models across invocations.

# scan a repo
docker run --rm \
  -v "$PWD:/target:ro" \
  -v "$PWD/.cve-hunter:/state" \
  -e ANTHROPIC_API_KEY \
  -e OPENAI_API_KEY \
  -e GOOGLE_API_KEY \
  ghcr.io/ciaran-finnegan/cve-hunter:latest \
  scan /target --out /state/runs

# verify a single finding
cveh verify run.a7f2 --finding "f_03b1"

# re-export to SARIF only
cveh export run.a7f2 --format sarif

# host readiness check
cveh doctor
CONTRACT
Invocation: one docker run per repo
Target mount: /target:ro · read-only
State mount: /state · policy + runs + cache
Exit 0: no findings ≥ threshold
Exit 1: findings ≥ threshold
Exit 2: scan failed / budget breach
Schema: v1 · additive changes only
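
A CI step can branch on the three exit codes in the contract. The wrapper below is a minimal sketch. `ScanOutcome`, `classify`, and `run_scan` are hypothetical names, and treating any code outside 0, 1, and 2 as a failed scan is an assumption, not part of the stated contract.

```python
import subprocess
from enum import Enum


class ScanOutcome(Enum):
    CLEAN = 0      # no findings at or above the threshold
    FINDINGS = 1   # findings at or above the threshold
    FAILED = 2     # scan failed or budget breached


def classify(returncode: int) -> ScanOutcome:
    """Map a process exit code onto the documented contract."""
    try:
        return ScanOutcome(returncode)
    except ValueError:
        # Assumption: anything outside the contract (e.g. killed by a signal)
        # is treated as a failed scan rather than a clean one.
        return ScanOutcome.FAILED


def run_scan(argv) -> ScanOutcome:
    """Run the scan command (for example the docker invocation above) and classify it."""
    completed = subprocess.run(argv)  # check=False by default: non-zero is expected
    return classify(completed.returncode)


for code in (0, 1, 2, 137):
    print(code, "->", classify(code).name)
```

Separating `FINDINGS` from `FAILED` lets a pipeline block a merge on real findings while alerting differently on a scan that never completed.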
▪ DEV PREVIEW · APR 2026

Fewer findings. Better evidence.

Design-partner cohort is open. Bring a repo under 500k LOC. We run a blind scan, walk you through the labelled findings, and you keep the report.

Built by
Ciaran

25+ years across security, software engineering, banking, and consulting. Credited with CVE-2025-30755 in Oracle OpenGrok. Speaks at Australian CISO and CNCF events on cloud-native and AI-assisted security.

Michiel

Cyber security principal at Mantel Group. Previously Principal Engineer at DigIO and security engineer in ING Group's firewall team. Certified Cloud Security Professional. Public technical contributor on Open Policy Agent, GKE security, and Kubernetes event-stream architecture.

Mike

Principal cyber security consultant at Mantel Group, Melbourne. Prior security and engineering roles at Servian, MODUS Security, Unico Computer Systems, and SecurePay.