The cross-check protocol
At v0.4.0 the diagnostic-legibility agent adds Phase C, the
cross-check step that turns two individually-refined collections
into mutually-corrected ones. Where Phase B challenges each element
against its own evidence, Phase C uses each collection to challenge
the other.
This page covers the protocol design and why the agent ships in the shape it does at v0.4.0.
What Phase C does
After Phase B (per-element self-challenge) completes, the agent re-frames itself once more and runs Phase C:
1. Precondition check. If one collection is empty (the `(empty scope)` sentinel on one side; the other side populated), skip Phase C and set the model-level `cross_check_status` field to `skipped_asymmetric`. The populated collection is still individually refined; Phase C does not run for asymmetric inputs at v0.4.0.
2. A→D direction. The architectural collection is the subject; the domain collection is the challenger. Iterate architectural elements in YAML order. For each subject, apply the five cross-check questions with CC1 (boundary contradiction) weighted heavily.
3. D→A direction. The domain collection is the subject; the architectural collection is the challenger. Iterate domain elements in YAML order. For each subject, apply the five cross-check questions with CC5 (mutual description integrity) weighted heavily.
4. Subject-only audit trail. A `CC<N>` entry is written on the subject element only. Side-effects on sibling elements are named in the subject's prose body, not appended as duplicate CC entries on the side-effect element.
5. Emit-time ordering self-verification. Before serialising, the agent verifies that every element's `challenge_notes[]` has `Q<N>` entries ordered before `CC<N>` entries; re-orders in place if needed.
6. Set wrapper status. `cross_check_status: completed` if Phase C ran on both collections; `skipped_asymmetric` if step 1 triggered.
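The control flow above can be sketched in Python. This is a minimal illustration only, not the agent's implementation (the agent enforces the protocol through its prompt). The element shape, the `is_empty` helper, and the `cross_check` callback are hypothetical names introduced here. The page does not say what happens when both collections are empty, so the sketch raises rather than guess.

```python
from typing import Callable, Dict, List

EMPTY_SCOPE = "(empty scope)"  # element-name sentinel named on this page
Element = Dict[str, object]    # hypothetical element shape for this sketch


def is_empty(collection: List[Element]) -> bool:
    """Assumption: an empty collection carries only the (empty scope) sentinel."""
    return all(el.get("name") == EMPTY_SCOPE for el in collection)


def run_phase_c(
    architectural: List[Element],
    domain: List[Element],
    cross_check: Callable[[Element, List[Element], str], None],
) -> str:
    """Run one pass per direction and return the cross_check_status value."""
    arch_empty, dom_empty = is_empty(architectural), is_empty(domain)
    if arch_empty and dom_empty:
        # Not specified on this page; the sketch declines to guess.
        raise NotImplementedError("both collections empty")
    if arch_empty != dom_empty:
        # Precondition check: asymmetric input, Phase C does not run.
        return "skipped_asymmetric"
    for subject in architectural:  # A->D, CC1 weighted heavily
        cross_check(subject, domain, "CC1")
    for subject in domain:         # D->A, CC5 weighted heavily
        cross_check(subject, architectural, "CC5")
    return "completed"
```

The callback receives the subject, the challenger collection, and the heavily weighted question, which mirrors the subject/challenger split described in the list.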
The five cross-check questions
| # | Name | Catches | Heavy in |
|---|---|---|---|
| CC1 | Boundary contradiction | The subject's description assumes a boundary the other collection contradicts | A→D |
| CC2 | Evidence overlap | Two elements cite the same evidence file but describe contradictory things | — |
| CC3 | Cross-confounders | An element in the other collection looks similar by name but is semantically distinct | — |
| CC4 | Cross-confidence calibration | The subject's confidence is miscalibrated against the other collection's evidence base | — |
| CC5 | Mutual description integrity | The subject silently assumes something the other collection defines differently | D→A |
Each cross-check question targets a cross-collection failure mode — an error that single-collection Phase B self-challenge cannot catch because it requires reading both collections together.
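For a downstream consumer, the table can be carried as a small lookup. The sketch below is illustrative: `CROSS_CHECK_QUESTIONS`, `HEAVY_IN`, and `question_for` are hypothetical names, and the note format (a `CC<N>` tag at the start of the string) is inferred from the examples on this page rather than taken from the schema.

```python
import re
from typing import Optional

# Names copied from the table above.
CROSS_CHECK_QUESTIONS = {
    "CC1": "Boundary contradiction",
    "CC2": "Evidence overlap",
    "CC3": "Cross-confounders",
    "CC4": "Cross-confidence calibration",
    "CC5": "Mutual description integrity",
}

# Direction in which each question is weighted heavily (others are unweighted).
HEAVY_IN = {"CC1": "A→D", "CC5": "D→A"}

# Assumption: a cross-check note starts with its CC<N> tag.
_CC_PREFIX = re.compile(r"^(CC[1-5])\b")


def question_for(note: str) -> Optional[str]:
    """Return the question name for a CC<N>-prefixed note, else None."""
    match = _CC_PREFIX.match(note)
    return CROSS_CHECK_QUESTIONS[match.group(1)] if match else None
```

Anchoring the pattern at the start of the string keeps the `Cross-check applied; ...` clean-run sentinel from being mistaken for a `CC<N>` entry.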
Direction-specific failure modes
The dimension-flavoured weighting (CC1 in A→D, CC5 in D→A) targets two named cross-collection failure modes:
A→D direction (CC1 weighted): architectural-implicit assumption in domain description
A domain element whose description implicitly assumes architectural behaviours the architectural collection does not commit to.
Example. The domain element Credential is drafted as "validated
through the AuthenticationService's issuance pipeline." The
architectural collection's AuthenticationService element describes
the service's responsibilities and explicitly names credential
validation and session issuance — but does not name an "issuance
pipeline" as a structural element. Phase B challenging Credential
against Credential's own evidence (the codebase paths citing
Credential) would not surface this; only A→D cross-check, with the
architectural collection as challenger, can see that the domain
element's description implies architecture the architecture does
not commit to.
D→A direction (CC5 weighted): domain-concept smear in architectural element
An architectural element whose description silently conflates infrastructure with domain meaning that the domain collection explicitly defines.
Example. The architectural element SessionStore is drafted as
"stores user sessions." The domain collection's Session element
is explicit that a Session is the authenticated artefact
returned after successful credential validation — not the raw
storage record. The architectural element has smeared the storage
shape with the domain term. Phase B challenging SessionStore
against its own evidence (the storage code paths) would not surface
this; only D→A cross-check, with the domain collection as
challenger, can see that the architectural description silently
conflates the two layers.
Why the schema gained a wrapper field
The original draft of S3 recorded the asymmetric-input case (Phase C skipped because one collection was empty) by appending a per-element sentinel to every element of the populated collection. The diaboli surfaced two issues with this:
- The sibling sentinel (`Cross-check skipped; only one collection present`) differed from the clean-run sentinel (`Cross-check applied; no questions surfaced changes`) only in one verb; downstream prefix-matching consumers would conflate them.
- The fact recorded (cross-check could not run on this model) is a property of the whole model, not of any element. Recording it N times at element granularity repeats a single model-level fact on every element.
The post-diaboli spec adopts a granularity-routing discipline:
- Per-element facts (vary across elements) go through the `challenge_notes[]` string-prefix convention (`Q<N>`, `CC<N>`, the `(empty scope)` element-name sentinel, the `Challenge applied; ...` and `Cross-check applied; ...` per-element clean-run sentinels).
- Model-level facts (apply to the whole record) go in an additive wrapper field on `LegibilityModel`. Phase C's outcome is the first such field: `cross_check_status` with three legal values (`completed`, `skipped_asymmetric`, `not_run`).
The field is additive — v0.3.0 outputs without the field are valid
against v0.4.0 consumers, with field-absence semantically meaning
not_run. The discipline is recorded as a paired-promoted
cartographer story (Stories #1 + #4, follow-up issue
#347)
and is expected to govern future schema decisions across the
diagnostic-legibility plugin and its siblings.
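A consumer can honour the additive rule with a single default. The following is a sketch under stated assumptions: the model has already been parsed into a mapping, `cross_check_status` sits at its top level, and the helper name `read_cross_check_status` is hypothetical. Rejecting unknown values is a choice made for this sketch, not something the schema is documented to require.

```python
from typing import Any, Mapping

LEGAL_STATUSES = frozenset({"completed", "skipped_asymmetric", "not_run"})


def read_cross_check_status(model: Mapping[str, Any]) -> str:
    """Return the model-level status, treating an absent field as not_run."""
    # v0.3.0 outputs have no such field; absence means Phase C did not run.
    status = model.get("cross_check_status", "not_run")
    if status not in LEGAL_STATUSES:
        raise ValueError(f"unknown cross_check_status: {status!r}")
    return status
```

Because the default lives in one place, v0.3.0 and v0.4.0 outputs go through the same code path.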
Subject-only audit trail
When a cross-check critique on subject X surfaces a side-effect
revision on sibling Y, the CC<N> entry is written on X only.
The side-effect on Y is named in X's prose body (e.g. "CC1
(boundary contradiction): clarified that AuthenticationService
handles session issuance only; surfaced a corresponding tweak to
Credential's description in the domain collection.").
The rule preserves the single-writer invariant: every CC entry has
exactly one author and exactly one subject. The audit trail becomes
a graph rooted at subjects — to know why Y's description changed
in a cross-check pass, a consumer follows back-references from X's
CC prose body rather than reading Y's challenge_notes[] in
isolation. This costs the downstream /diagnose rendering layer
some work but keeps the contract coherent (cartographer Story #3).
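One way a rendering layer might follow those back-references is sketched below. It assumes, purely for illustration, that elements are dicts with `name` and `challenge_notes` keys and that a side-effect is discoverable because the sibling's name appears in the subject's CC prose. The page does not define a structured back-reference format, and `cross_check_backrefs` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a cross-check note starts with its CC<N> tag.
_CC_NOTE = re.compile(r"^CC\d+\b")


def cross_check_backrefs(elements: List[Dict], target: str) -> List[Tuple[str, str]]:
    """Find CC notes written on other subjects whose prose names `target`."""
    hits = []
    for element in elements:
        if element["name"] == target:
            continue  # only entries written on other subjects are back-references
        for note in element.get("challenge_notes", []):
            if _CC_NOTE.match(note) and target in note:
                hits.append((element["name"], note))
    return hits
```

Calling it with `target="Credential"` on a model holding the example note above would return the entry written on `AuthenticationService`, which is where the explanation for Credential's change lives.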
The mode-marker contract
Two modes ship at v0.4.0:
- `mode: full` (default if no `mode:` line is given): Phase A + Phase B + Phase C. This is the superset of v0.3.0 behaviour; v0.3.0 dispatchers get cross-check as a free upgrade.
- `mode: cross-check-only`: Phase C only, against a fenced YAML payload (a previously-emitted `LegibilityModel` in a fenced `yaml` block in the prompt body). The agent skips Phase A and Phase B and runs cross-check against the supplied YAML.
An earlier draft included mode: construct-only (Phase A + Phase B
only, the v0.3.0 behaviour exactly). The diaboli surfaced that no
named consumer existed for this mode — mode: full already preserves
v0.3.0 behaviour as a superset. The mode was dropped at adjudication
and can be re-added in the same PR as its first named consumer
(cartographer Story #2 / diaboli O2).
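The mode contract can be sketched as a small parser. This is an illustration only, since the agent reads the marker through its prompt rather than through code. It assumes the marker is a line of its own that starts with `mode:`, and `parse_mode` is a hypothetical name.

```python
LEGAL_MODES = frozenset({"full", "cross-check-only"})


def parse_mode(prompt: str) -> str:
    """Return the requested mode; default to 'full' when no mode: line exists."""
    for line in prompt.splitlines():
        stripped = line.strip()
        if stripped.startswith("mode:"):
            value = stripped[len("mode:"):].strip()
            if value not in LEGAL_MODES:
                # No silent fallback: an unrecognised mode is refused.
                raise ValueError(f"unrecognised mode: {value!r}")
            return value
    return "full"
```

Under this sketch `mode: construct-only` is refused like any other unrecognised value, which matches its removal from the v0.4.0 contract.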
Structured refusal contract
When an input violates a precondition, the agent emits a structured refusal line and no YAML block.
Programmatic dispatchers pattern-match on "no YAML block + presence
of diagnostic-legibility refusal:" to route to error handling.
There is no silent fallback anywhere in the agent's protocol. A
mistyped mode value, an unfenced YAML payload in cross-check-only,
an unsubstituted <DISPATCHER: ...> placeholder, or an unrevised
input element with empty challenge_notes[] all trigger refusal.
This is the third time the project has reached this design point —
choice-cartographer's structured cartograph_pending_count over
prose narration, this spec's refusal on unrecognised modes, this
spec's unified precondition table. Promoted to a cartographer story
(#6, follow-up issue
#348)
as a project-wide convention candidate.
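On the dispatcher side, the pattern match described in this section might look like the sketch below. It assumes the agent's YAML block is a fenced `yaml` code block and uses hypothetical names (`classify_response`, `YAML_FENCE`). Only the `diagnostic-legibility refusal:` marker comes from this page.

```python
REFUSAL_MARKER = "diagnostic-legibility refusal:"
# Assumption: the emitted model arrives in a fenced yaml code block.
YAML_FENCE = "`" * 3 + "yaml"


def classify_response(response: str) -> str:
    """Classify an agent response as 'model', 'refusal', or 'unexpected'."""
    has_yaml = YAML_FENCE in response
    has_refusal = REFUSAL_MARKER in response
    if has_refusal and not has_yaml:
        return "refusal"      # route to error handling
    if has_yaml and not has_refusal:
        return "model"        # hand the YAML block to the consumer
    return "unexpected"       # neither or both: do not guess
```

Returning a distinct `unexpected` value keeps the dispatcher from introducing a silent fallback of its own.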
Two-layer ordering enforcement
The contract that CC<N> entries follow Q<N> entries in every
element's challenge_notes[] is enforced at two layers:
- Emit-time self-verification. Step 6 of the cross-check algorithm: the agent verifies the ordering invariant on every element before serialising, and re-orders in place if needed.
- Fixture-based structural test. A test in `tdad_tests/tests/test_diagnostic_legibility_structural.py` loads a deliberately-interleaved input and asserts the re-ordering produces canonical ordering.
Both layers verify the same invariant; neither subsumes the other. The pairing is defence-in-depth applied to contracts (cartographer Story #7).
One honest qualification: the structural-layer test asserts the invariant against a canonical-ordering definition expressed in test code, not against the agent's own emit-time re-ordering behaviour. It confirms that canonical ordering is well-defined and that interleaved input maps to it — a necessary property — but it cannot, at this layer, invoke the agent (Layer 0/1 run offline, with no API key). Genuine verification of the agent's re-ordering behaviour belongs at Layer 3 (behavioural tests) if and when those are added for diagnostic-legibility. Until then, Layer 1 guards the definition and the agent prompt (Step 6) guards the behaviour.
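The ordering invariant is small enough to sketch. The code below is not the test in the repository and not the agent's behaviour. It is one possible canonical-ordering definition, which assumes notes are strings tagged with a leading `Q<N>` or `CC<N>` and leaves any other entry (such as a clean-run sentinel) in its original position. `canonical_order` is a hypothetical name, and it returns a new list instead of re-ordering in place.

```python
import re
from typing import List

_Q = re.compile(r"^Q\d+\b")
_CC = re.compile(r"^CC\d+\b")


def _is_tagged(note: str) -> bool:
    return bool(_Q.match(note) or _CC.match(note))


def canonical_order(notes: List[str]) -> List[str]:
    """Place Q<N> entries before CC<N> entries, keeping relative order."""
    # sorted() is stable, so entries within each group keep their order.
    tagged = sorted(
        (n for n in notes if _is_tagged(n)),
        key=lambda n: bool(_CC.match(n)),  # False (Q) sorts before True (CC)
    )
    refill = iter(tagged)
    # Untagged entries stay where they were; tagged slots are refilled in order.
    return [next(refill) if _is_tagged(n) else n for n in notes]
```

A deliberately interleaved input such as `["CC1: a", "Q1: b", "Q2: c", "CC5: d"]` maps to `["Q1: b", "Q2: c", "CC1: a", "CC5: d"]`.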
What this slice does not do
- Surface the models to a human. A `/diagnose` command is the deliverable of parent S4, #333.
- Validate `LegibilityModel` instances at runtime. The schema spec (v0.2.0) explicitly deferred this; the agent enforces the contract through its prompt, not through a separate validator.
- Iterate the cross-check loop until convergence. Single-pass per direction at v0.4.0; iteration can be added in a later slice if disposition data shows the single pass under-refines.
Further reading
- How to invoke the agent: task-oriented dispatch guide, including mode markers and the cross-check-only payload format.
- The challenge-refine protocol: Phase B (self-challenge), the prior context this slice builds on.
- `diagnostic-legibility/templates/legibility-element.md` (in the repository): the schema, now including the `cross_check_status` wrapper field.
- Sub-S3 design spec: the full design record this protocol descends from, including the diaboli + cartographer adjudication trail.