CVE Hunter is a static-analysis CLI. In blind runs against open-source codebases it has surfaced exploitable bugs that Semgrep and CodeQL did not flag and that frontier models, prompted directly, failed to find. Every finding ships with its evidence labelled against five ranks, a grade from a held-out judge, and a Sigstore-signed provenance triple covering prompt, model, and container, so a reviewer can tell what is real from what is a guess.
Preprocessing is deterministic and contains no LLM calls. The hypothesis queue turns a repo into a bounded set of specific questions instead of an open-ended "find a bug." Specialists answer one question each, in fresh context. The ensemble votes; the judge grades. Evidence is labelled. The reporter emits.
A finding is labelled confirmed only if it carries rank 3 alongside one of rank 1, 2, or 4. Everything else is labelled candidate. Candidates are not suppressed; they are surfaced with the evidence they actually have, so a reviewer can decide.
Rank 1: A deterministic CPG or CodeQL taint path from attacker-controllable source to sink.
Rank 2: An existing OSS-Fuzz, syzbot, or upstream-CI artefact that already reproduces the bug.
Rank 3: Three model families vote. A fourth family grades. A collapse detector flags correlated rationales.
Rank 4: A rule implied by tests, fixtures, or comments that this code provably violates.
Rank 5: A fine-tuned per-CWE classifier confirms the specialist's candidate finding.
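The confirmed gate described above can be sketched as a small predicate over the set of ranks a finding carries. This is a minimal illustration, not CVE Hunter's code: the function name `label_finding` and the constants are hypothetical, and the sketch works on rank numbers alone.

```python
from typing import Iterable

# Hypothetical names; only the rule itself comes from the text:
# "confirmed" requires rank 3 alongside at least one of rank 1, 2, or 4.
ENSEMBLE_RANK = 3
CORROBORATING_RANKS = frozenset({1, 2, 4})
KNOWN_RANKS = frozenset({1, 2, 3, 4, 5})


def label_finding(ranks: Iterable[int]) -> str:
    """Return "confirmed" or "candidate" for the evidence ranks on a finding."""
    present = set(ranks)
    unknown = present - KNOWN_RANKS
    if unknown:
        raise ValueError(f"unknown evidence ranks: {sorted(unknown)}")
    # Rank 3 must be present together with rank 1, 2, or 4.
    if ENSEMBLE_RANK in present and present & CORROBORATING_RANKS:
        return "confirmed"
    # Everything else stays a candidate; it is still reported, not dropped.
    return "candidate"
```

Under this reading, `label_finding({1, 3})` yields `"confirmed"`, while `label_finding({3, 5})` and `label_finding({1, 2, 4})` both yield `"candidate"`, because rank 5 does not count towards the gate and rank 3 is mandatory.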
A finding records CWE, severity, the reachability path, the stack of evidence behind it, and a signed provenance triple covering prompt, model, and container. The schema is versioned; changes are additive.
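As a rough picture of such a record, the sketch below models a finding as a pair of Python dataclasses. Every field name, the version string, and the example values are assumptions made for illustration; the actual schema is not reproduced here, and the sketch does not perform any Sigstore signing.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Provenance:
    # The triple named in the text: prompt, model, container.
    # Field names and digest formats are hypothetical.
    prompt_digest: str
    model_id: str
    container_digest: str


@dataclass(frozen=True)
class Finding:
    schema_version: str                 # versioned schema; changes are additive
    cwe: str
    severity: str
    reachability_path: Tuple[str, ...]  # ordered steps from source to sink
    evidence_ranks: Tuple[int, ...]     # the stack of evidence, as rank numbers
    provenance: Provenance


def to_json(finding: Finding) -> str:
    """Serialise a finding with stable key order so output is diff-friendly."""
    return json.dumps(asdict(finding), sort_keys=True)


# Placeholder values only; none of these come from a real scan.
example = Finding(
    schema_version="1",
    cwe="CWE-89",
    severity="high",
    reachability_path=("handler.parse_query", "db.execute"),
    evidence_ranks=(1, 3),
    provenance=Provenance(
        prompt_digest="sha256:<prompt>",
        model_id="<model>",
        container_digest="sha256:<container>",
    ),
)
```

Keeping the record immutable and adding fields only with defaults is one way to honour an additive schema policy: older readers can ignore keys they do not recognise.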
Rule-based SAST flags pattern matches, not exploitability. Agentic scanners produce per-finding confidence percentages with no underlying evidence taxonomy. In our blind runs, both classes missed bugs that turned out to be real and exploitable. CVE Hunter labels the evidence so a reviewer can tell the difference.
| Tool | Primary output | Confidence signal | Limitation |
|---|---|---|---|
| SAST / Semgrep / Joern | Rule-matched findings | Rule precision stats | Rule coverage + FP noise; weak reasoning on novel patterns |
| CodeQL | Dataflow paths | Query-level precision | Query-authoring burden; offline from framework context |
| Agentic scanners | Narrative findings | Per-model confidence % | Hallucinations collapse under correlated rationales |
| Dependabot / Snyk | CVE advisories | CVSS | Source-level bugs invisible; advisory-first, not code-first |
| cve-hunter | Evidence-labelled findings | Rank 1–5 + confirmed gate | Tier-1 default; Tier-2 Go (§70); Tier-3 C/C++ (§76, memory safety, separate scorecard) |
What you get when you docker run.
One Docker invocation per repo. The target mount is read-only. The state mount holds policy, runs, cache, and models across invocations.
```shell
# scan a repo
docker run --rm \
  -v "$PWD:/target:ro" \
  -v "$PWD/.cve-hunter:/state" \
  -e ANTHROPIC_API_KEY \
  -e OPENAI_API_KEY \
  -e GOOGLE_API_KEY \
  ghcr.io/ciaran-finnegan/cve-hunter:latest \
  scan /target --out /state/runs

# verify a single finding
cveh verify run.a7f2 --finding "f_03b1"

# re-export to SARIF only
cveh export run.a7f2 --format sarif

# host readiness check
cveh doctor
```
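Once a run has been exported to SARIF, the results can be read with any JSON tooling. The sketch below assumes the export is a standard SARIF 2.1.0 document with `runs[].results[]` entries. The helper name `iter_sarif_results` and the sample document are hypothetical, and where the export file is written is not stated here.

```python
import json
from typing import Iterator, Tuple


def iter_sarif_results(sarif_text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (rule_id, level, message) for each result in a SARIF document."""
    doc = json.loads(sarif_text)
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "")
            # Treat a missing level as "warning", the usual SARIF fallback.
            level = result.get("level", "warning")
            message = result.get("message", {}).get("text", "")
            yield rule_id, level, message


# Hand-written sample for illustration; not real tool output.
SAMPLE = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-tool"}},
        "results": [
            {"ruleId": "CWE-89", "level": "error",
             "message": {"text": "tainted value reaches SQL sink"}},
            {"ruleId": "CWE-22", "message": {"text": "path built from input"}},
        ],
    }],
})

for rule_id, level, message in iter_sarif_results(SAMPLE):
    print(f"{level:8} {rule_id:8} {message}")
```

Reading only `ruleId`, `level`, and `message.text` keeps the helper tolerant of extra properties a tool may attach to each result.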
Design-partner cohort is open. Bring a repo under 500k LOC. We run a blind scan, walk you through the labelled findings, and you keep the report.
25+ years across security, software engineering, banking, and consulting. Credited with CVE-2025-30755 in Oracle OpenGrok. Speaks at Australian CISO and CNCF events on cloud-native and AI-assisted security.
Cyber security principal at Mantel Group. Previously Principal Engineer at DigIO and security engineer in ING Group's firewall team. Certified Cloud Security Professional. Public technical contributor on Open Policy Agent, GKE security, and Kubernetes event-stream architecture.
Principal cyber security consultant at Mantel Group, Melbourne. Prior security and engineering roles at Servian, MODUS Security, Unico Computer Systems, and SecurePay.