METHODOLOGY // 2026-03-25
The redacted Challenge Log.
What a Labs deliverable publishes about its own reasoning — and what stays inside the engagement.
What a Challenge Log is
Every finding in a Labs Board-Ready deliverable carries a Challenge Log appendix. The Challenge Log is the written record of the AI-Alliance Challenging protocol applied to that finding: the four independent verdicts, the steel-manned counter-arguments, the founder's adjudication, and the remediation challenge round.
This is the artefact a CISO uses to defend the audit internally. It is also the artefact that most distinguishes a Labs deliverable from a conventional consulting report. The structure exists because buyers asked the same question repeatedly across early engagements: "How do I know your reasoning is sound, and not just your conclusion?"
What is in a published Challenge Log
A redacted Challenge Log entry, as it appears in a deliverable, contains the following sections.
Finding handle and rung. The finding has an internal handle (e.g. F-04) and an evidence-ladder rung label. The buyer can use both to cross-reference the entry against the main Findings Detail.
Original verdicts (four). Each verdict block contains:
- The model identifier (a stable but model-version-agnostic label, e.g. auditor-1 through auditor-4)
- The verdict: confirms, partial, or rejects
- The model's reasoning in its own words, preserved verbatim except for client-identifying redactions
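As an illustration only, a verdict block could be modelled as a small typed record. The class and field names below (Verdict, VerdictBlock, model_id, reasoning) are assumptions made for this sketch, not the format Labs uses; the only details taken from the text are the three verdict values and the auditor-N labels.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The three verdict values named in the list above."""
    CONFIRMS = "confirms"
    PARTIAL = "partial"
    REJECTS = "rejects"


@dataclass(frozen=True)
class VerdictBlock:
    """Hypothetical shape for one of the four verdict blocks."""
    model_id: str     # version-agnostic label, e.g. "auditor-1"
    verdict: Verdict
    reasoning: str    # the model's own words, client-identifying text redacted


def parse_verdict_block(model_id: str, verdict: str, reasoning: str) -> VerdictBlock:
    """Build a block; Verdict(...) raises ValueError for any other verdict string."""
    return VerdictBlock(model_id, Verdict(verdict), reasoning)


block = parse_verdict_block(
    "auditor-2", "partial", "The exposure depends on [DEPLOYMENT_PATTERN]."
)
print(block.verdict.value)  # prints "partial"
```

Restricting the verdict to an enumeration means a malformed value fails loudly at parse time instead of reaching the published log.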
Steel-manned counter-arguments (four). Each counter-argument block contains the best version of the opposing argument that the model was asked to produce. The point is to show that the disagreement was fair — that the dissenting view was given its strongest articulation, not its weakest.
Founder adjudication. The founder's written decision, stating which argument prevailed and the specific reason why. The founder signs it by name in the deliverable's signature block.
Remediation challenge round. The list of attacks the same four models produced against the proposed remediation runbook, and the iterations the runbook went through before the attacks became weak.
Retest verdict (added at retest time). After remediation is applied, the same four models read the closed surface and produce a retest verdict. If all four agree the surface is closed for the relevant class, the attestation is issued.
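The attestation rule in the retest step is a unanimity check, which can be sketched in a few lines. This is a minimal illustration under stated assumptions: each retest verdict is reduced to a boolean "closed" flag keyed by auditor label, and the function name attestation_issued is hypothetical.

```python
def attestation_issued(retest_verdicts: dict[str, bool]) -> bool:
    """Return True only if all four auditors reported and every one says 'closed'.

    Assumption: a retest verdict is simplified here to a boolean per auditor.
    """
    expected = {f"auditor-{i}" for i in range(1, 5)}
    return set(retest_verdicts) == expected and all(retest_verdicts.values())


unanimous = {"auditor-1": True, "auditor-2": True, "auditor-3": True, "auditor-4": True}
one_dissent = {**unanimous, "auditor-3": False}

print(attestation_issued(unanimous))    # prints True
print(attestation_issued(one_dissent))  # prints False
```

Checking the set of auditor labels as well as the flags means a missing verdict blocks the attestation in the same way a dissenting one does.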
What is NOT in a published Challenge Log
Three categories are deliberately excluded from the published artefact.
Client-identifying artefacts. The original verdicts and counter-arguments are redacted at the point of any reference that would identify the engagement: artefact URLs are abstracted to type-and-pattern, organisation-specific naming is replaced with placeholders, and any reference to the buyer's internal structure (team names, role hierarchies, deployment patterns) is normalised. The redaction preserves the technical content; it removes the attribution.
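A rough sketch of what such a redaction pass might look like follows. It is not the published Labs redaction doctrine: the token names ([ARTEFACT_URL], [TEAM_A], [ORG]), the placeholder mapping, and the collapse of every URL to a single token (rather than a type-and-pattern abstraction) are all simplifying assumptions.

```python
import re

# Matches http(s) URLs up to the next whitespace or closing bracket.
URL_RE = re.compile(r"https?://[^\s)\]]+")


def redact(text: str, placeholders: dict[str, str]) -> str:
    """Replace URLs and known client-specific names with normalised tokens."""
    # Step 1: abstract every URL to one generic token (a simplification).
    text = URL_RE.sub("[ARTEFACT_URL]", text)
    # Step 2: replace longer names first so "Acme Platform" wins over "Acme".
    for name in sorted(placeholders, key=len, reverse=True):
        token = placeholders[name]
        text = re.sub(re.escape(name), lambda _m, t=token: t, text, flags=re.IGNORECASE)
    return text


raw = "The token at https://git.example.com/acme/cfg.yml is owned by the Acme Platform team."
print(redact(raw, {"Acme": "[ORG]", "Acme Platform": "[TEAM_A]"}))
# prints: The token at [ARTEFACT_URL] is owned by the [TEAM_A] team.
```

The technical claim (a token in a config file, owned by a team) survives; the attribution does not.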
Model prompts. The exact prompts used to elicit verdicts, steel-manning, and remediation challenges are not in the deliverable. They are a methodology trade secret of Labs. The buyer can inspect their effect — the Challenge Log itself — but not the input that produced it. This is a deliberate publication choice; a competitor could replicate the protocol's shape from the published artefact, but would have to develop their own prompts.
Operational metrics. Token usage, model wall-clock time, and the specific model versions used are not in the deliverable. The buyer is buying the verdict, not the operational economics. If the buyer needs the economics for downstream tooling, Labs publishes a separate disclosure on /trust.
Why the redaction is principled, not strategic
A reasonable challenge from a prospect is: "The redaction is convenient. How do I know the unredacted version is consistent with the published one?"
Two answers:
1. Reciprocal-NDA inspection is available. A prospect under reciprocal NDA may inspect the unredacted Challenge Log of a prior engagement, with the prior client's consent. The consent process is explicit; the founder asks the prior client before inviting the inspection. Two prior clients to date have agreed to this; the contact path is available to qualified buyers during scoping.
2. The redaction rules are published. The redaction transformation — what is removed, what is kept, what is replaced with a normalised token — is documented as a doctrine on /trust. The buyer can audit the redaction rules in advance and judge whether they preserve the technical content they need.
Why a published log at all
The argument for publishing nothing is straightforward: do not give the next attacker the playbook. Most consulting firms in this space lean toward "publish less."
Labs leans the other way. The reasoning:
Trust requires legibility. A buyer cannot evaluate the discipline of an audit firm by reading the firm's marketing site. They evaluate it by reading the firm's working artefacts. If no working artefact is public, the buyer is buying brand, not method. Labs is not (yet) a brand the market trusts on faith. It must be a method the market can read.
The redaction is the discipline. The fact that the Challenge Log is publishable in a useful form, without leaking client identity, is itself a methodology proof. It shows the engagement separated finding from client-attribution at the right granularity. A firm that cannot publish a redacted Challenge Log without disclosing client details has a methodology problem, regardless of what it claims.
The asymmetry is real. Adversaries reading the Challenge Log learn what Labs looks for. Defenders reading it learn the same thing. The defender benefit is larger because there are more defenders than attackers and because the published patterns are already known to the serious attacker. The marginal information advantage to the adversary is small. The marginal information advantage to the buyer's CISO is large.
What a published sample looks like
A redacted Challenge Log sample is hosted on /proof. The sample is from a Discovery Brief engagement that the prior buyer agreed to publish in normalised form. Reading it should take ten minutes. After reading, the prospect should be able to answer:
- What was the finding's rung on the evidence ladder?
- Which of the four models initially disagreed?
- What argument changed the adjudication?
- What remediation iteration survived the challenge round?
- What did the retest confirm?
If the published sample does not let a careful reader answer those five questions, the publication has failed. The standard is that high.
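The five-question standard can be read as a completeness check over the published sample. The sketch below assumes a hypothetical dictionary representation of a sample; the key names are invented for illustration, and only the question wording comes from the list above.

```python
FIVE_QUESTIONS = {
    "rung": "What was the finding's rung on the evidence ladder?",
    "initial_dissenters": "Which of the four models initially disagreed?",
    "deciding_argument": "What argument changed the adjudication?",
    "surviving_iteration": "What remediation iteration survived the challenge round?",
    "retest_result": "What did the retest confirm?",
}


def unanswered(sample: dict) -> list[str]:
    """Return the questions a reader could not answer from the sample."""
    missing = []
    for key, question in FIVE_QUESTIONS.items():
        value = sample.get(key)
        # An empty list is a valid answer ("no model disagreed"); None or "" is not.
        if value is None or value == "":
            missing.append(question)
    return missing


sample = {
    "rung": "placeholder rung label",
    "initial_dissenters": ["auditor-3"],
    "deciding_argument": "",
    "surviving_iteration": 3,
    "retest_result": None,
}
for question in unanswered(sample):
    print(question)
```

In this example the sample fails the standard on two counts: the deciding argument is blank and the retest result is absent.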
A note on AI-detection
A prospect's technical reviewer occasionally asks whether the Challenge Log is generated by an LLM at write-time. The answer is no, although the log quotes LLM output at length.
The verdicts, steel-manned counter-arguments, and challenge attacks are produced by the four LLMs as part of the protocol. The adjudication is written by the founder. The synthesis into a Challenge Log appendix — the editorial layer that selects what to include, redacts client-identifying material, and produces the published format — is performed by the founder.
The published artefact is a human document about an AI-assisted process. The reviewer who looks for the AI signature on the published page will find the AI's reasoning quoted verbatim in the verdict blocks. That is the point. The reviewer who looks for the AI signature on the adjudication will not find one. That is also the point.
AUTHOR
BleedWatch Labs founder
Founder-led research from the same auditor of record who signs Labs engagements. Specific client references and prior research identifiers are shared under reciprocal NDA when relevant.