01 // METHODOLOGY

How we work.

An adversarial discovery engagement, validated by a four-model AI-Alliance, signed off by a single human auditor.

02 // ENGAGEMENT FLOW

From scope to report delivery in five business days.

The flow is deliberately short. Labs is not a months-long red-team program or a compliance readiness retainer. It is a focused external-surface assurance engagement that moves from authorization to evidence, remediation, retest, and signed closure.

Scope agreement (T-0)

30-min call. We agree on the surface, sign a mutual NDA, and put written authorization for OSINT-only discovery in place.

Adversarial discovery (T+1 to T+3)

The founder runs OSINT against public Docker registries, NPM packages, GitHub artifacts, CI/CD logs, and adjacent public sources.

AI-Alliance Challenge (T+3)

Claude, Gemini, Codex, and Mimo each re-evaluate every candidate finding independently against the raw evidence.

Remediation drafting (T+3 to T+4)

The founder writes the exact remediation steps, and the same alliance challenges whether each fix actually closes the exposure.

Report assembly (T+4)

The executive summary, evidence pack, business impact analysis, remediation runbook, and challenge log are bound into a single report.

Delivery + retest (T+5)

The founder reviews the report with you. Once fixes are in place, we retest within 14 days and issue a signed attestation.
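The T+n offsets can be turned into calendar dates. The sketch below makes three assumptions: T-0 falls on a business day, every offset counts Monday-to-Friday business days, and public holidays are ignored. The step names and offsets come from the flow above. The `STEPS` table and the `business_day_offset` and `schedule` helpers are hypothetical.

```python
from datetime import date, timedelta

# Step name -> (first, last) business-day offset from T-0, taken from the
# engagement flow. Assumption: public holidays are not modeled.
STEPS = [
    ("Scope agreement", 0, 0),
    ("Adversarial discovery", 1, 3),
    ("AI-Alliance Challenge", 3, 3),
    ("Remediation drafting", 3, 4),
    ("Report assembly", 4, 4),
    ("Delivery + retest", 5, 5),
]


def business_day_offset(start: date, n: int) -> date:
    """Return the date n business days after start, skipping weekends."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def schedule(t0: date) -> list[tuple[str, date, date]]:
    """Expand the step table into concrete (step, start, end) dates."""
    return [
        (name, business_day_offset(t0, first), business_day_offset(t0, last))
        for name, first, last in STEPS
    ]


if __name__ == "__main__":
    for name, first, last in schedule(date(2026, 3, 2)):  # a Monday
        span = first.isoformat() if first == last else f"{first} to {last}"
        print(f"{name:<24} {span}")
```

With a Monday start, delivery falls on the following Monday, because the weekend is skipped. The 14-day retest window runs separately, after fixes are in place.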

03 // SCOPE MODEL

The surface is written down before discovery starts.

A clean scope protects both sides. It tells your legal team what was authorized. It tells the auditor where to stop. It tells the report reader what the attestation does and does not cover.

Root domain

The canonical company domain and explicitly related domains named in the statement of work.

GitHub organization

Public repositories, releases, Actions artifacts that remain public, and organization-level metadata.

Docker namespace

Public image tags, manifests, layers, build metadata, and registry naming patterns.

NPM scope

Public packages, tarballs, package metadata, install scripts, and accidentally bundled configuration.

Adjacent public artifacts

Certificate Transparency (CT) logs, archived pages, public package mirrors, exposed docs, and historical public records.
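One way to make "where to stop" mechanical is to encode the written scope as data and check each candidate asset against it before investigating. This is a minimal sketch. The `Scope` fields mirror the surfaces listed above. The asset-kind strings, the matching rules (exact domain or subdomain; exact org, namespace, or npm scope) and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scope:
    """Surfaces named in the statement of work (illustrative, lowercase)."""
    domains: frozenset[str] = field(default_factory=frozenset)
    github_orgs: frozenset[str] = field(default_factory=frozenset)
    docker_namespaces: frozenset[str] = field(default_factory=frozenset)
    npm_scopes: frozenset[str] = field(default_factory=frozenset)


def domain_in_scope(host: str, domains: frozenset[str]) -> bool:
    """True if host equals a scoped domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def in_scope(kind: str, identifier: str, scope: Scope) -> bool:
    """Check one candidate asset; unrecognized kinds are out of scope."""
    owner = identifier.split("/", 1)[0].lower()
    if kind == "domain":
        return domain_in_scope(identifier, scope.domains)
    if kind == "github_repo":  # "org/repo"
        return owner in scope.github_orgs
    if kind == "docker_image":  # "namespace/image:tag"
        return owner in scope.docker_namespaces
    if kind == "npm_package":  # "@scope/name"
        return owner in scope.npm_scopes
    # Anything else becomes a scoping question, not a target.
    return False
```

Adjacent artifacts would be checked through the asset they point at. For a CT log entry, that means running the certificate's hostname through the domain check. Defaulting to `False` mirrors the no-opportunistic-expansion rule.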

04 // AI-ALLIANCE CHALLENGING

Four models. Independent context. Forced convergence.

Each frontier LLM has a different blind spot. We run them independently against the same evidence; the disagreements are the signal.

For each finding, we run a structured three-pass protocol: independent judgment, a steel-man pass, and, if the models still do not converge, a documented founder decision.

The same protocol applies to remediation. Each model proposes a fix; each fix is challenged by the others; the simplest convergent fix wins.

Convergence is not a majority vote. A 3-1 split on severity or scope is documented, not settled by count; a 4-0 false-positive verdict kills the finding.

[finding F-2026-12, candidate verdict: CRITICAL]
claude   verdict=CRITICAL conf=0.92  "AWS_ACCESS_KEY in /etc layer, prod-tag, scope=*"
gemini   verdict=CRITICAL conf=0.88  "concur; suggest verify via STS"
codex    verdict=HIGH     conf=0.80  "scope wildcard ambiguous"
mimo     verdict=CRITICAL conf=0.85  "concur; impact = full S3+IAM"

[steel-man pass]
codex defends HIGH: "no production telemetry visible"
claude rebuts:      "tag 'prod' + manifest digest matches CI"
verdict converges:  CRITICAL conf=0.91
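The convergence rule can be written as a small decision function. This sketch assumes four things: a round counts as converged only when all four verdicts agree, any split goes to the steel-man pass, a split that survives the steel-man pass goes to the founder, and a unanimous false-positive verdict kills the finding. The `Verdict` and `Outcome` types and the `FALSE_POSITIVE` label are hypothetical. Confidence scores are kept for the log but do not drive the decision.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    KILLED = "killed"              # 4-0 false-positive verdict
    CONVERGED = "converged"        # all four models agree on one verdict
    STEEL_MAN = "steel_man_pass"   # any split goes to a steel-man pass
    FOUNDER = "founder_decision"   # split persists after the steel-man pass


@dataclass(frozen=True)
class Verdict:
    model: str
    verdict: str        # e.g. "CRITICAL", "HIGH", "FALSE_POSITIVE"
    confidence: float   # recorded in the log, not used for the decision


def assess(verdicts: list[Verdict], after_steel_man: bool = False) -> Outcome:
    """Apply the convergence rule: unanimity, never a majority count."""
    if len(verdicts) != 4:
        raise ValueError("expected one verdict from each of the four models")
    labels = {v.verdict for v in verdicts}
    if len(labels) == 1:
        return Outcome.KILLED if labels == {"FALSE_POSITIVE"} else Outcome.CONVERGED
    # A split (3-1, 2-2, ...) is documented and challenged, not voted away.
    return Outcome.FOUNDER if after_steel_man else Outcome.STEEL_MAN


first_pass = [
    Verdict("claude", "CRITICAL", 0.92),
    Verdict("gemini", "CRITICAL", 0.88),
    Verdict("codex", "HIGH", 0.80),
    Verdict("mimo", "CRITICAL", 0.85),
]
```

For the log above, `assess(first_pass)` returns `Outcome.STEEL_MAN` because of the 3-1 split. Once all four models hold CRITICAL after the steel-man pass, the same function returns `Outcome.CONVERGED`.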

Claude

Strong at narrative coherence and impact synthesis. We challenge it for confirmation bias and over-complete explanations.

Gemini

Strong at broad pattern recognition. We challenge it for regex-driven false positives and over-reading weak signals.

Codex

Strong at implementation detail and remediation diffs. We challenge it for underrating business impact outside code paths.

Mimo

Strong as a stabilizing reviewer. We challenge it for averaging toward consensus when the outlier might be right.

05 // EVIDENCE LADDER

Every finding is rated on its evidence.

Only findings at level 4 or 5 appear in the executive summary. Lower-level signals are documented in the appendix as watch-list items and are not actioned.

Level 1: Suspicion

Pattern matched in a public artifact, not yet challenged.

Level 2: Static corroboration

Pattern, entropy, variable name, and structural context align.

Level 3: Context-validated

At least one frontier LLM has reviewed the artifact and confirmed the finding.

Level 4: Cross-validated

The AI-Alliance has converged on a verdict.

Level 5: Externally grounded

Independently verified against an external source of truth.
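Because the ladder is ordered, it maps naturally onto an integer enum, and the executive-summary rule becomes a threshold. The level numbers follow the order of the ladder, from Suspicion as 1 to Externally grounded as 5. The names `EvidenceLevel`, `EXEC_SUMMARY_MIN`, and `placement` are hypothetical.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Evidence ladder, ordered from weakest to strongest."""
    SUSPICION = 1             # pattern matched, not yet challenged
    STATIC_CORROBORATION = 2  # pattern, entropy, name, structure align
    CONTEXT_VALIDATED = 3     # at least one frontier LLM confirmed
    CROSS_VALIDATED = 4       # the AI-Alliance converged on a verdict
    EXTERNALLY_GROUNDED = 5   # verified against an external source of truth


EXEC_SUMMARY_MIN = EvidenceLevel.CROSS_VALIDATED


def placement(level: EvidenceLevel) -> str:
    """Route a finding by evidence level alone; impact is judged separately."""
    if level >= EXEC_SUMMARY_MIN:
        return "executive_summary"
    return "appendix_watch_list"
```

Evidence level is necessary but not sufficient. A level-4 finding can still be downgraded on impact grounds under the acceptance criteria.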

06 // DELIVERABLE

What you get.

01_Executive_Risk_Memo.pdf — 2 pages, board-ready

02_Findings_Detail.pdf — full finding-by-finding, with evidence pack

03_Business_Impact_Analysis.pdf — breach scenario, RGPD/NIS2/DORA exposure, monetary range

04_Remediation_Runbook.md — step-by-step, code-ready, version-controlled

05_AI_Alliance_Challenge_Log.json + .pdf — verbatim model-to-model challenges

06_Retest_Attestation_Template.pdf — signed on closure

A sample anonymized deliverable is available on request after the discovery call.

Mock report preview: cover memo (page 1), evidence pack (page 2), retest attestation (page 3).

Every executive claim must trace to evidence in the appendix.

Every evidence item must include path, timestamp, hash, and collection context where available.

Every remediation step must be specific enough for an engineer to ship without a second discovery meeting.

Every AI-Alliance disagreement that changes severity, scope, or remediation must remain visible in the log.

Every closure attestation must reference the retest date, scoped surface, and residual limits.
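The evidence-item rule can be enforced at collection time. This minimal sketch assumes SHA-256 as the hash and ISO 8601 UTC timestamps. The `EvidenceItem` record, its field names, and the `collect` and `missing_fields` helpers are hypothetical and do not represent the actual evidence-pack format.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EvidenceItem:
    path: str               # where the artifact was found, e.g. a layer path
    timestamp: str          # collection time, ISO 8601 in UTC
    sha256: str             # hash of the raw artifact bytes
    context: Optional[str]  # collection context, when available


def collect(path: str, raw: bytes, context: Optional[str] = None) -> EvidenceItem:
    """Build an evidence record; only the hash of the raw bytes is kept."""
    return EvidenceItem(
        path=path,
        timestamp=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        sha256=hashlib.sha256(raw).hexdigest(),
        context=context,
    )


def missing_fields(item: EvidenceItem) -> list[str]:
    """List required fields that are empty; context is optional."""
    return [name for name in ("path", "timestamp", "sha256") if not getattr(item, name)]
```

Storing only the hash lets a reader confirm that two references point at the same artifact without the artifact's contents entering the report.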

07 // HARD LIMITS

What we will not do.

We never probe, exploit, or test credentials against client infrastructure. OSINT-only.
We never store, share, or republish discovered secrets. Hashes and truncated previews only.
We never expand scope on our own. The engagement contract specifies the surface.
We never publish a finding with the client's identity attached without explicit written permission.
We never withhold a finding to upsell. Every confirmed finding is in the report.
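"Hashes and truncated previews only" can be applied as a redaction step before anything reaches the report or the challenge log. The sketch below assumes a SHA-256 fingerprint and a 4-character prefix preview, and shows no preview at all for very short values. These choices, and the `redact` function itself, are illustrative rather than the deliverable's exact convention.

```python
import hashlib


def redact(secret: str, keep: int = 4) -> dict[str, str]:
    """Replace a discovered secret with a fingerprint and a short preview.

    The fingerprint lets the client match the exposed value against their
    own inventory without the value being written into any deliverable.
    """
    fingerprint = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    # Values too short to truncate meaningfully get no preview at all.
    preview = secret[:keep] + "..." if len(secret) > 2 * keep else "***"
    return {"sha256": fingerprint, "preview": preview}
```

For example, `redact("AKIAIOSFODNN7EXAMPLE")` yields the preview `AKIA...` plus a 64-character hex fingerprint. The input is the placeholder key from AWS's own documentation, not a live credential.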

Credential handling

We do not test discovered credentials. We prove exposure by origin, context, structure, and safe corroboration.
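Proving exposure "by structure" without ever using the credential can look like the check below. It combines a format match, a Shannon-entropy estimate, and a credential-like variable name, and none of these steps contacts the issuer. The AKIA-prefixed, 20-character pattern is the one commonly used by secret scanners for AWS access key IDs. The entropy threshold, the variable-name pattern, and the `structural_signals` helper are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Common secret-scanner pattern for AWS access key IDs (assumption:
# "AKIA" prefix + 16 uppercase alphanumerics, 20 characters in total).
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")
# Illustrative: variable names that suggest a credential.
SUSPICIOUS_NAME = re.compile(r"(?i)(aws|secret|access)[_-]?(key|token|id)")


def shannon_entropy(s: str) -> float:
    """Bits per character of s, from its character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def structural_signals(line: str, min_entropy: float = 3.0) -> dict[str, bool]:
    """Static corroboration for a NAME=value line. Nothing here calls AWS."""
    name, _, value = line.partition("=")
    match = AWS_KEY_ID.search(value)
    high_entropy = match is not None and shannon_entropy(match.group(1)) >= min_entropy
    return {
        "pattern": match is not None,
        "entropy": high_entropy,
        "variable_name": bool(SUSPICIOUS_NAME.search(name)),
    }
```

These signals only reach static corroboration. Origin context, such as which image, layer, and tag the line came from, is what carries a finding further up the evidence ladder.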

No opportunistic expansion

If we find an adjacent asset that looks relevant but is not authorized, we document the scoping question instead of investigating it.

No public attribution

Client identity, sector, timing, and technical fingerprint stay confidential unless the client explicitly authorizes publication.

No scanner theater

We do not pad the report with low-confidence findings, dependency noise, or generic best-practice checklists.

08 // ACCEPTANCE CRITERIA

How findings move in or out of the report.

The report is not a dumping ground for every interesting signal. The founder uses explicit acceptance criteria so the final artifact stays short, defensible, and useful to engineering teams.

Finding accepted

The evidence reaches level 4 or 5, the business impact is defensible, and remediation can be stated precisely.

Finding rejected

The AI-Alliance converges on false positive, the evidence cannot be reproduced safely, or the impact depends on an unproven assumption.

Finding downgraded

The exposure exists, but scope, privilege, or reachable impact does not support executive-summary severity.

Watch-list item

The signal is useful for future monitoring but too weak to ask engineers to remediate under the current engagement.

Closure accepted

Retest shows the public exposure path is gone, and the attestation names the exact surface and date checked.

Closure deferred

If the client needs more time to remediate, the report records the remaining exposure and the retest window.
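Taken together, the finding criteria work like an ordered decision procedure. The sketch below shows one possible ordering: rejection checks first, then the evidence threshold, then severity. It assumes each boolean input is a judgment the founder has already recorded. The `Finding` fields and the `triage` function are hypothetical. The criteria do not say what happens when a remediation cannot be stated precisely, so the code routes that case to the watch list and labels it as an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    DOWNGRADED = "downgraded"
    WATCH_LIST = "watch_list"


@dataclass(frozen=True)
class Finding:
    evidence_level: int            # 1..5 on the evidence ladder
    alliance_false_positive: bool  # AI-Alliance converged on false positive
    safely_reproducible: bool      # evidence can be reproduced safely
    impact_defensible: bool        # impact rests on no unproven assumption
    remediation_precise: bool      # the fix can be stated precisely
    exec_severity_supported: bool  # scope/privilege/reach justify exec severity


def triage(f: Finding) -> Decision:
    """Apply the acceptance criteria in a fixed order."""
    if f.alliance_false_positive or not f.safely_reproducible or not f.impact_defensible:
        return Decision.REJECTED
    if f.evidence_level < 4:
        return Decision.WATCH_LIST  # useful signal, too weak to remediate now
    if not f.exec_severity_supported:
        return Decision.DOWNGRADED  # exposure exists, exec severity does not hold
    if not f.remediation_precise:
        return Decision.WATCH_LIST  # assumption: case not named by the criteria
    return Decision.ACCEPTED
```

Checking rejection first means a weak but disproven signal is rejected outright rather than parked on the watch list.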

Ready to talk scope?

Book a 30-min discovery call

Every engagement is signed by the founder.

BleedWatch Labs