METHODOLOGY // 2026-04-08

The evidence ladder.

What separates a hunch from a finding from an attestable signed conclusion. The Labs methodology for evidence weight, demystified.

Why a ladder

When Labs writes a finding for a Board-Ready engagement, the finding sits at a specific rung on what we call the evidence ladder. The rung determines what the deliverable claims, what the remediation runbook recommends, and what the attestation signs.

The discipline of the ladder is the answer to one question a buyer's CISO will always ask: "what do you actually know, and how do you know it?" A vendor that cannot place each finding on the ladder is selling reassurance, not evidence. Reassurance is a category that exists in the cybersecurity market. It is not what Labs sells.

The five rungs

Rung 1 — Observation. A factual statement about what is publicly readable. "This image tag is public and contains a .env.production recoverable at layer 4." No interpretation. No claim of impact. The artefact is included in the evidence pack so a third party can verify.

Rung 2 — Pattern recognition. A statement that the observation matches a class Labs has documented. "The credential format AKIA[A-Z0-9]{16} is consistent with an AWS access key ID." Pattern recognition is a probabilistic claim about format. It does not assert the credential is currently valid.

Rung 3 — Correlated inference. Two or more independent observations that point at the same impact hypothesis. "The credential on Rung 2 appears in a workflow file on a public repository under the same organisation, used by a deployment job that references an IAM role with iam:CreateAccessKey." This rung claims the surfaces are connected. It does not test the connection.

Rung 4 — Documented impact, no exploitation. A reasoned argument, based on declared infrastructure and read-only enumeration of public metadata, for what the impact would be if the connection on Rung 3 is real. "If the credential is active and the role declaration in the workflow is accurate, the principal can read S3, create derivative keys, and pivot to EC2 metadata across three accounts." No probe is performed. The argument is constructed from what is publicly declared, not from what is tested.

Rung 5 — Validated impact, controlled. An assertion produced by an explicitly authorised, scoped, and contractually bounded test performed by a partner with the right legal frame. Labs does NOT perform Rung 5 itself. We can include Rung 5 findings in a deliverable if a third party with the proper authorisation contributes them — but the boundary is preserved: the founder is the auditor of record for everything Rung 1 through Rung 4, and the Rung 5 contributor is named.
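
One way to make the ladder concrete is to model the rungs as an ordered enumeration. The sketch below is illustrative only: the names Rung, LABS_MAX_RUNG, looks_like_aws_access_key_id, and requires_partner are hypothetical and not part of any Labs tooling. The regular expression is the one quoted under Rung 2. A match supports a Rung 2 pattern claim and nothing more; no network call is made.

```python
import re
from enum import IntEnum


class Rung(IntEnum):
    """The five rungs, ordered by evidence weight (hypothetical names)."""
    OBSERVATION = 1
    PATTERN_RECOGNITION = 2
    CORRELATED_INFERENCE = 3
    DOCUMENTED_IMPACT = 4   # reasoned argument, no probe performed
    VALIDATED_IMPACT = 5    # contributed by an authorised third party


# Labs is auditor of record for Rung 1 through Rung 4 only.
LABS_MAX_RUNG = Rung.DOCUMENTED_IMPACT

# Pattern quoted in the Rung 2 example above.
AWS_ACCESS_KEY_ID = re.compile(r"AKIA[A-Z0-9]{16}")


def looks_like_aws_access_key_id(text: str) -> bool:
    """Return True if text contains a string shaped like an AWS access key ID.

    This is a format check only. It says nothing about whether the
    credential is currently valid.
    """
    return AWS_ACCESS_KEY_ID.search(text) is not None


def requires_partner(rung: Rung) -> bool:
    """True when the rung is above what Labs performs itself."""
    return rung > LABS_MAX_RUNG
```

Using an ordered enum means a comparison such as Rung.CORRELATED_INFERENCE < Rung.DOCUMENTED_IMPACT reads the same way the ladder does.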

Why we stop at Rung 4

The discipline of stopping at Rung 4 is the most-questioned aspect of the Labs methodology. The argument from prospects is "if you could just confirm the credential works, the finding would be definitive."

The argument is correct technically. It is wrong commercially and ethically.

Technically. An sts:GetCallerIdentity call would confirm the credential. The call is detectable, and it is logged in the client's account.

Commercially. Labs is not contracted for active testing. The CISO who hires Labs hires us specifically for the discipline of not testing. The buyer wants to know what an attacker can read without authorisation. They do not want us to add an entry to their audit trail. They want us to be invisible.

Ethically. A consulting firm that tests credentials it finds in public artefacts, even legitimately found credentials of its own client, has crossed the line between observation and intrusion. The legal frame for "the credential was public so testing it was fine" is contested in every jurisdiction Labs operates in. The reputational frame is uncontested: an audit firm that tests does not survive its first ethics audit. Labs intends to survive.

What the buyer gets at each rung

The rung determines the deliverable language.

A Rung 1 finding is documented as "Observed: <artefact> is publicly readable." The runbook says "Remove or rotate the artefact."
A Rung 2 finding is documented as "Pattern match: <artefact> contains a value consistent with an active credential of type <class>." The runbook says "Rotate the candidate credential and audit recent activity for the corresponding principal."
A Rung 3 finding is documented as "Correlated: <artefact-A> references the same credential class declared in <artefact-B>. The chain is consistent with an active deployment path." The runbook addresses the chain end-to-end.
A Rung 4 finding is documented as a Proof-of-Threat card with the reasoned impact statement, the public references that support it, and an explicit "no probe was performed" disclaimer.
A Rung 5 finding is documented with the partner's identity, the scope of authorisation, and the test result. Labs co-signs with the partner; we never sign alone.
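
The fixed wording for Rungs 1 to 3 lends itself to simple templating. The following is a minimal sketch, assuming the placeholders in the sentences above map to named fields; FINDING_TEMPLATES, render_finding, and the field names are hypothetical. Rungs 4 and 5 are described above as structured documents rather than single sentences, so the sketch deliberately refuses to render them.

```python
from string import Template

# Sentence templates copied from the deliverable language listed above.
# Field names (artefact, cred_class, ...) are illustrative assumptions.
FINDING_TEMPLATES = {
    1: Template("Observed: $artefact is publicly readable."),
    2: Template(
        "Pattern match: $artefact contains a value consistent with "
        "an active credential of type $cred_class."
    ),
    3: Template(
        "Correlated: $artefact_a references the same credential class "
        "declared in $artefact_b. The chain is consistent with an "
        "active deployment path."
    ),
}


def render_finding(rung: int, **fields: str) -> str:
    """Render the one-sentence finding statement for Rungs 1 to 3.

    Raises ValueError for rungs without a sentence template and
    KeyError if a required field is missing.
    """
    template = FINDING_TEMPLATES.get(rung)
    if template is None:
        raise ValueError(f"no sentence template for Rung {rung}")
    return template.substitute(**fields)


if __name__ == "__main__":
    print(render_finding(1, artefact="registry image tag"))
```

Failing loudly on a missing field keeps a half-filled statement from reaching a deliverable.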

How the AI-Alliance fits

The AI-Alliance Challenging protocol, described in detail in another article, runs at the boundary between rungs. The most common adjudication is "is this finding Rung 3 or Rung 4?" The four models argue. The founder decides. The decision is recorded in the Challenge Log.

Promoting a Rung 3 to Rung 4 increases the impact statement the deliverable can make. Demoting a Rung 4 to Rung 3 reduces it. Either is acceptable. What is not acceptable is to claim Rung 4 without the corresponding reasoned argument, or to publish Rung 5 without a co-signature.
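
The two unacceptable cases above can be expressed as a pre-publication check. This is a sketch under stated assumptions, not the Labs process: the Finding fields and the rung_claim_problems function are hypothetical names, and the check only tests that the supporting material is present, not that it is sound.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    """Hypothetical minimal record for one finding."""
    title: str
    rung: int
    reasoned_argument: Optional[str] = None  # expected at Rung 4
    partner_cosigner: Optional[str] = None   # expected at Rung 5


def rung_claim_problems(finding: Finding) -> List[str]:
    """Return the reasons a rung claim cannot be published as-is."""
    problems: List[str] = []
    if not 1 <= finding.rung <= 5:
        problems.append("rung must be between 1 and 5")
    if finding.rung == 4 and not finding.reasoned_argument:
        problems.append("Rung 4 claimed without a reasoned argument")
    if finding.rung == 5 and not finding.partner_cosigner:
        problems.append("Rung 5 published without a partner co-signature")
    return problems


if __name__ == "__main__":
    draft = Finding(title="Exposed deployment key", rung=4)
    for problem in rung_claim_problems(draft):
        print(problem)
```

Promotion and demotion both pass the check as long as the material required by the new rung is attached.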

What this looks like in the deliverable

A Board-Ready deliverable has six documents, one of which is the Findings Detail. Each finding in the Findings Detail begins with its rung label. The rung is not buried; it is the first piece of metadata after the title.

A buyer reading the deliverable can sort by rung, decide which findings need immediate remediation versus monitoring, and audit the rung assignment by reading the supporting evidence in the appendix. The rung is not an opinion. It is a defended classification.
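
As an illustration of how a reader might work with rung-first metadata, the sketch below groups findings by rung and renders a header with the rung directly after the title. The Finding record, the header layout, and the function names are assumptions made for this example; the actual Findings Detail format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Finding:
    """Hypothetical record: title, rung, and pointers into the appendix."""
    title: str
    rung: int
    evidence_refs: List[str] = field(default_factory=list)


def header(finding: Finding) -> str:
    """Title first, then the rung label as the first piece of metadata."""
    return f"{finding.title}\nRung {finding.rung}"


def group_by_rung(findings: List[Finding]) -> Dict[int, List[str]]:
    """Map each rung to its finding titles, highest rung first."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for finding in findings:
        groups[finding.rung].append(finding.title)
    return {rung: groups[rung] for rung in sorted(groups, reverse=True)}


if __name__ == "__main__":
    sample = [
        Finding("Public image tag", 1, ["appendix A.1"]),
        Finding("Deployment chain", 4, ["appendix A.2", "appendix A.3"]),
        Finding("Key-shaped string", 2, ["appendix A.4"]),
    ]
    for rung, titles in group_by_rung(sample).items():
        print(f"Rung {rung}: {', '.join(titles)}")
```

Keeping evidence_refs on the record is what lets a reader audit the rung assignment against the appendix.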

Why this discipline is rare

Most consulting reports in the market do not use a ladder. They blend Rung 2, Rung 3, and Rung 4 statements together in prose, with a severity score (CVSS) substituting for evidence weight. The CISO is left to reconstruct what is observation versus inference.

The reason this is common is that CVSS-style scoring is easier to deliver and easier to digest. The reason it is dangerous is that a CVSS-9 with weak evidence and a CVSS-7 with strong evidence look interchangeable on the cover page. The remediation prioritisation goes wrong, and the post-incident analysis is harder.

Labs does not deliver CVSS scores on the cover page. Labs delivers rungs. The CVSS is in the appendix for buyers who need it for downstream tooling. The cover page leads with evidence weight.
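
The difference between the two orderings is easy to see with two made-up findings. In the sketch below, the identifiers, scores, and rungs are invented for illustration, and treating "leads with evidence weight" as "sort by rung first, CVSS second" is an assumption rather than a documented Labs rule.

```python
# Two hypothetical findings: a high score on weak evidence,
# and a lower score on strong evidence.
findings = [
    {"id": "F-1", "cvss": 9.0, "rung": 2},
    {"id": "F-2", "cvss": 7.0, "rung": 4},
]

# Severity-only ordering: the weakly evidenced finding comes first.
by_cvss = sorted(findings, key=lambda f: f["cvss"], reverse=True)

# Evidence-first ordering: rung decides, CVSS only breaks ties.
by_evidence = sorted(
    findings, key=lambda f: (f["rung"], f["cvss"]), reverse=True
)

if __name__ == "__main__":
    print([f["id"] for f in by_cvss])
    print([f["id"] for f in by_evidence])
```

The CVSS value is still carried on each record, which matches keeping it in the appendix for downstream tooling.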

This is a deliberate publication choice. It is uncomfortable for buyers who expected a familiar format. It is the right choice for buyers who hired Labs specifically because the format does not exist elsewhere.

AUTHOR

BleedWatch Labs founder

Founder-led research from the same auditor of record who signs Labs engagements. Specific client references and prior research identifiers are shared under reciprocal NDA when relevant.

