Draft · v0.2.0 · 2026-06-30
This document defines the validation rules that apply to IOA-ORM packages. Rules are organized by category and phase (publish-time vs runtime). Every rule uses normative language (MUST / SHOULD / MAY / MUST NOT) and includes a rule ID for tooling reference.
Validation is performed in two phases:
- Publish-time — when an AssessmentPackage is published from the authoring studio. A package that fails publish-time validation MUST NOT be published.
- Runtime — when a session is started from a published package. A package that fails runtime validation MUST NOT start a session.
Rules that apply to the top-level AssessmentPackage structure.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| PKG-001 | Every package MUST have exactly one initialNodeId. | Publish | Error |
| PKG-002 | initialNodeId MUST refer to an existing node in the package’s nodes array. | Publish | Error |
| PKG-003 | initialNodeId MUST NOT refer to an end node. | Publish | Error |
| PKG-004 | Published packages MUST include irVersion conforming to the version format defined in 09-versioning.md. | Publish | Error |
| PKG-005 | Package nodes array MUST contain at least one node. | Publish | Error |
| PKG-006 | All nodeId values within a package MUST be unique. | Publish | Error |
| PKG-007 | Package MUST include a metadata object with at least packageId, title, and createdAt. | Publish | Error |
| PKG-008 | metadata.packageId MUST be a valid UUID or ULID. | Publish | Error |
| PKG-009 | Package SHOULD include metadata.author and metadata.version. | Publish | Warning |
| PKG-010 | Package MUST NOT contain more than 200 nodes. (System limit for runtime performance.) | Publish | Error |
| PKG-011 | Package MUST NOT reference external resources (URLs, file paths) that are not part of the package bundle unless explicitly declared as external dependencies. | Publish | Error |
| PKG-012 | Package MUST include metadata.structureLevel from the enum closed, semi-structured, open. This declares where the assessment sits on Joughin’s (1998) closed–open structure continuum. The value MUST be consistent with the followUpPolicy settings across all question nodes (e.g., open structure requires maxFollowUps > 0 on most nodes). Packages that set structureLevel inconsistently with their follow-up policies MUST include metadata.structureJustification explaining the rationale. | Publish | Error |
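
As an illustration, a few of the structural rules above (PKG-001, PKG-002, PKG-005, PKG-006, PKG-010) could be checked along the following lines. This is a minimal sketch, assuming a package is a plain dict with `initialNodeId` and a `nodes` list whose entries carry `nodeId`. The function name `check_package_structure` and the `(ruleId, message)` tuple shape are hypothetical and not defined by this specification.

```python
from collections import Counter

MAX_NODES = 200  # PKG-010 system limit


def check_package_structure(package: dict) -> list[tuple[str, str]]:
    """Return (ruleId, message) pairs for violated structural rules."""
    errors = []
    nodes = package.get("nodes") or []
    node_ids = [n.get("nodeId") for n in nodes]

    # PKG-005: at least one node.
    if not nodes:
        errors.append(("PKG-005", "nodes array is empty"))
    # PKG-010: at most 200 nodes.
    if len(nodes) > MAX_NODES:
        errors.append(("PKG-010", f"{len(nodes)} nodes exceeds {MAX_NODES}"))
    # PKG-006: nodeId values are unique.
    for node_id, count in Counter(node_ids).items():
        if count > 1:
            errors.append(("PKG-006", f"duplicate nodeId: {node_id}"))
    # PKG-001 / PKG-002: initialNodeId is present and resolvable.
    initial = package.get("initialNodeId")
    if initial is None:
        errors.append(("PKG-001", "initialNodeId is missing"))
    elif initial not in node_ids:
        errors.append(("PKG-002", f"initialNodeId {initial!r} not in nodes"))
    return errors
```

An empty result means none of the rules covered by the sketch were violated; the remaining PKG rules would need their own checks.
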
Rules that apply to individual nodes within a package.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| NOD-001 | Every node MUST have a nodeId that is a non-empty string matching ^[a-zA-Z0-9_-]{1,128}$. | Publish | Error |
| NOD-002 | Every node MUST have a kind field from the allowed ExamRuntimeNodeKind enum (question, scenario, task, discussion, warmup, wrapup, branch, identity_check). See 02-schema.md §3. | Publish | Error |
| NOD-003 | Every node MUST have at least one entry in its transitions array (an always transition is valid as a fallback). | Publish | Error |
| NOD-005 | Every node MUST have a non-empty promptSeed string. See 02-schema.md §4. | Publish | Error |
| NOD-008 | Node promptSeed MUST NOT exceed 8,000 characters. (LLM context budget.) | Publish | Error |
| NOD-010 | timeBudgetMs MUST be a positive integer when specified. | Publish | Error |
| NOD-011 | timeBudgetMs SHOULD be between 30000 and 600000 for question nodes. | Publish | Warning |
| NOD-012 | Every node SHOULD have a candidateCommands policy with at least one allowed command. See 02-schema.md §13. | Publish | Warning |
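
The field-level node rules lend themselves to simple per-node checks. The sketch below covers NOD-001, NOD-005, NOD-008, and NOD-010 only, assuming a node is a plain dict; `check_node` is a hypothetical helper name, and the regular expression is copied from NOD-001.

```python
import re

NODE_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")  # from NOD-001
MAX_PROMPT_SEED_CHARS = 8000  # from NOD-008


def check_node(node: dict) -> list[str]:
    """Return the IDs of the node rules that this node violates."""
    violated = []
    node_id = node.get("nodeId")
    if not isinstance(node_id, str) or not NODE_ID_PATTERN.fullmatch(node_id):
        violated.append("NOD-001")
    prompt_seed = node.get("promptSeed")
    if not isinstance(prompt_seed, str) or not prompt_seed:
        violated.append("NOD-005")
    elif len(prompt_seed) > MAX_PROMPT_SEED_CHARS:
        violated.append("NOD-008")
    budget = node.get("timeBudgetMs")
    # NOD-010 applies only when the field is present; bool is excluded
    # because it is a subclass of int in Python.
    if budget is not None and (
        isinstance(budget, bool) or not isinstance(budget, int) or budget <= 0
    ):
        violated.append("NOD-010")
    return violated
```
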
| ID | Rule | Phase | Severity |
|---|---|---|---|
| NOD-Q001 | question nodes SHOULD have at least one evidenceTarget. | Publish | Warning |
| NOD-Q002 | Each evidenceTarget MUST have a unique id within the node. | Publish | Error |
| NOD-Q003 | Each evidenceTarget MUST have a non-empty label. | Publish | Error |
| NOD-Q004 | Each evidenceTarget SHOULD have a weight between 0 and 1. | Publish | Warning |
| NOD-Q005 | evidenceTarget weights within a node SHOULD sum to approximately 1.0 (tolerance ±0.05). | Publish | Warning |
| NOD-Q006 | question nodes SHOULD have a followUpPolicy object. See 02-schema.md §10. | Publish | Warning |
| NOD-Q007 | followUpPolicy.maxFollowUps MUST be >= 0. | Publish | Error |
| NOD-Q008 | followUpPolicy.maxFollowUps SHOULD be <= 10. | Publish | Warning |
| NOD-Q009 | followUpPolicy.maxFollowUpDurationSec MUST be > 0 when specified. | Publish | Error |
| NOD-Q010 | followUpPolicy.followUpStyle MUST be one of: probing, scaffolding, clarifying, redirecting, free when specified. See 02-schema.md §10. | Publish | Error |
| NOD-Q011 | question nodes SHOULD have candidateCommands including at least repeat, clarification, and pause. Packages missing any of these MUST include metadata.commandJustification explaining why that command is not applicable. (Grounded in Fenton 2025: accessibility and anxiety management require these minimum commands.) See 02-schema.md §13. | Publish | Warning |
| NOD-Q012 | When a package contains multiple question nodes, the followUpPolicy.followUpStyle SHOULD be consistent across all nodes. If styles differ across nodes, the package MUST include metadata.structureJustification explaining the pedagogical rationale for the variation. (Grounded in Joughin 1998: inconsistent structure threatens reliability.) | Publish | Warning |
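
The weight-sum check in NOD-Q005 involves floating-point arithmetic, so a naive comparison can misfire at the tolerance boundary. The following sketch shows one way to handle that, assuming each evidence target is a dict with an optional numeric `weight` and that targets without a weight are ignored. `weights_sum_ok` is a hypothetical name, and the small epsilon is an implementation choice rather than part of the rule.

```python
import math

WEIGHT_SUM_TOLERANCE = 0.05  # from NOD-Q005


def weights_sum_ok(evidence_targets: list[dict]) -> bool:
    """True when the specified weights sum to 1.0 within the tolerance."""
    total = math.fsum(t["weight"] for t in evidence_targets if "weight" in t)
    # A tiny epsilon keeps boundary sums such as 0.95 from failing
    # because of floating-point rounding.
    return abs(total - 1.0) <= WEIGHT_SUM_TOLERANCE + 1e-9
```

A validator would call this only for nodes where at least one weight is specified, and report a warning (not an error) when it returns `False`.
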
| ID | Rule | Phase | Severity |
|---|---|---|---|
| NOD-E001 | end nodes MUST have an endType field from the allowed enum (normal, timeout, terminated, technical_failure). | Publish | Error |
| NOD-E002 | end nodes MUST have a prompt.closing message. | Publish | Error |
| NOD-E003 | end nodes MUST NOT have evidenceTargets. | Publish | Error |
| NOD-E004 | end nodes MUST NOT have followUpPolicy. | Publish | Error |
| NOD-E005 | end nodes MUST NOT have timeBudget. | Publish | Error |
| NOD-E006 | Every package MUST have at least one reachable end node. | Publish | Error |
| NOD-E007 | Packages SHOULD include end nodes covering the endType values normal, timeout, terminated, and technical_failure. Packages missing any of these MUST include metadata.endNodeRationale explaining why that endType is not applicable (e.g., “technical_failure handled by platform-level recovery, not specification”). | Publish | Warning |
Rules that apply to transitions between nodes.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| TRN-001 | Every transition MUST have a targetNodeId that refers to an existing node in the package. | Publish | Error |
| TRN-002 | Every transition MUST have a condition object with a valid type. | Publish | Error |
| TRN-003 | condition.type MUST be one of: always, evidence_satisfied, turn_count_reached, time_elapsed, candidate_command, policy_escalation. See 02-schema.md §11 TransitionCondition. | Publish | Error |
| TRN-004 | evidence_satisfied conditions MUST specify targetIds as an array of valid evidence target IDs. | Publish | Error |
| TRN-005 | evidence_sufficient conditions MAY specify requiredEvidence as an array of evidence target IDs that MUST all exist on the source node. | Publish | Error (if present) |
| TRN-006 | A node MUST NOT have multiple always transitions. Use explicit conditions for disambiguation. | Publish | Error |
| TRN-007 | Transitions SHOULD NOT form cycles that would create infinite loops without a time_exhausted or follow_ups_exhausted escape. | Publish | Warning |
| TRN-008 | The graph MUST be traversable from initialNodeId to at least one end node following valid transitions. (Reachability check.) | Publish | Error |
| TRN-009 | Every node SHOULD be reachable from initialNodeId. (Orphan node check.) | Publish | Warning |
| TRN-010 | Transition conditions on the same source node MUST NOT have identical type and parameters (ambiguous routing). | Publish | Error |
| TRN-011 | condition objects MUST NOT reference evidence targets that do not exist on the source node. | Publish | Error |
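
The reachability check (TRN-008) and the orphan node check (TRN-009) are both graph traversals from `initialNodeId`. A minimal breadth-first sketch follows, assuming each node dict carries a `transitions` list with `targetNodeId` entries. How end nodes are identified is left to the caller here: the `end_node_ids` parameter and both function names are assumptions made for illustration.

```python
from collections import deque


def reachable_nodes(package: dict) -> set[str]:
    """Breadth-first traversal from initialNodeId over transitions."""
    by_id = {n["nodeId"]: n for n in package["nodes"]}
    start = package["initialNodeId"]
    seen = {start} if start in by_id else set()
    queue = deque(seen)
    while queue:
        node = by_id[queue.popleft()]
        for transition in node.get("transitions", []):
            target = transition["targetNodeId"]
            # Unknown targets are a TRN-001 error and are skipped here.
            if target in by_id and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def check_reachability(package: dict, end_node_ids: set[str]) -> dict:
    """Report TRN-008 (error) and TRN-009 (warning) findings."""
    seen = reachable_nodes(package)
    all_ids = {n["nodeId"] for n in package["nodes"]}
    return {
        "TRN-008": not (seen & end_node_ids),  # True means violated
        "TRN-009": sorted(all_ids - seen),     # orphan node IDs
    }
```

Because visited nodes are tracked in `seen`, cycles in the transition graph do not cause the traversal itself to loop.
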
Rules that apply to evidence targets and the evidence model.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| EVD-001 | evidenceTarget.id MUST be unique within its node. | Publish | Error |
| EVD-002 | evidenceTarget.id SHOULD be globally unique across the package for marking-pipeline compatibility. | Publish | Warning |
| EVD-003 | evidenceTarget.label MUST be a non-empty human-readable string. | Publish | Error |
| EVD-004 | evidenceTarget.weight MUST be between 0.0 and 1.0 inclusive when specified. | Publish | Error |
| EVD-005 | The total evidence weight per node SHOULD sum to 1.0 (tolerance ±0.05). | Publish | Warning |
| EVD-006 | evidenceTarget MAY include a rubricDescriptor with levels (excellent, satisfactory, partial, absent). Each level MUST have a label and description. | Publish | Info |
| EVD-007 | evidenceTarget SHOULD include markingCriteria linking to assessment learning outcomes. | Publish | Warning |
| EVD-008 | Evidence signals signalKind at runtime MUST be one of the allowed enum values: positive, partial, absent, misconception, flawed_reasoning, process_positive, process_negative, self_correction. See 02-schema.md §7. | Runtime | Error |
| EVD-009 | Evidence ledger signal writes MUST include signalId, nodeId, sessionId, signalKind, confidence, timestampMs, and turnIds. See 02-schema.md §7. | Runtime | Error |
| EVD-010 | Evidence signals MUST NOT be recorded from transcript segments where STT confidence < 0.5. Such segments MUST trigger the stt_low_confidence recovery handler (if present) or be flagged for human review. The evidence ledger MUST include an sttConfidenceSummary (min, max, mean) for each signal, derived from the underlying transcript segments. (Grounded in Akimov & Malin 2020: intra-rater reliability requires confidence in source data.) | Runtime | Error |
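
EVD-010 combines a gate (no signal from low-confidence speech) with a summary that must accompany each recorded signal. The sketch below assumes the STT confidences of a signal's source transcript segments are available as a list of floats, and it reads the rule as requiring every source segment to meet the 0.5 threshold. Both function names are hypothetical.

```python
from statistics import fmean

MIN_STT_CONFIDENCE = 0.5  # from EVD-010


def stt_confidence_summary(segment_confidences: list[float]) -> dict:
    """Build the min/max/mean summary for one signal's segments."""
    if not segment_confidences:
        raise ValueError("a signal needs at least one transcript segment")
    return {
        "min": min(segment_confidences),
        "max": max(segment_confidences),
        "mean": fmean(segment_confidences),
    }


def may_record_signal(segment_confidences: list[float]) -> bool:
    """Assumed reading of EVD-010: every segment has confidence >= 0.5."""
    return bool(segment_confidences) and all(
        c >= MIN_STT_CONFIDENCE for c in segment_confidences
    )
```

When `may_record_signal` returns `False`, the rule calls for the `stt_low_confidence` recovery handler or a human-review flag instead of a ledger write.
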
Rules that apply to action policies, recovery handlers, and behavioral constraints.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| POL-001 | forbiddenActions MUST NOT conflict with allowedActions on the same node. If an action appears in both, the node is invalid. | Publish | Error |
| POL-002 | candidateCommands.allowed command values MUST be from the CandidateCommandType enum: repeat, clarification, request_rephrase, pause, raise_hand, skip, volume_up, volume_down, language_switch, thinking_aloud. See 02-schema.md §13. | Publish | Error |
| POL-003 | candidateCommands.forbidden command values MUST be from the CandidateCommandType enum (same as POL-002). Each ForbiddenAction includes a reason and onViolation handler. See 02-schema.md §13. | Publish | Error |
| POL-004 | forbiddenActions on question nodes SHOULD include at least actions that would reveal the model answer or rubric scoring logic. | Publish | Warning |
| POL-005 | forbiddenActions from GlobalRuntimePolicies MUST be propagated into the Pipecat adapter’s task_messages. The adapter MUST NOT omit forbidden actions. | Publish | Error |
| POL-006 | evidenceSignal.description (when present in the specification) MUST contain behavioral observations (what the candidate said or did), NOT verbatim rubric level descriptors (e.g., MUST NOT contain “Excellent: …”, “Satisfactory: …”, “Grade A: …”). This prevents rubric leakage through the LLM’s spoken output. (Grounded in Akimov & Malin 2020: examiner bias mitigation; Fenton 2025: prompting neutrality.) | Publish | Error |
| POL-007 | The task_messages generated by the adapter MUST include a prompting consistency directive instructing the LLM to use the same questioning approach for all candidates and not to vary scaffolding level based on perceived candidate ability. (Grounded in Pearce & Chiavaroli 2020, via Fenton 2025: consistency is a guiding principle for prompting.) | Publish | Error |
| POL-008 | When anxietyDetected is true in a report_observation, the Runtime Controller’s response MUST be limited to calm_support or pause_timer recovery actions. The system MUST NOT provide assessment-relevant reassurance (e.g., “You’re doing great”, “That’s a good answer”) that could affect assessment validity. Neutral procedural support (e.g., “Take your time”, “Would you like me to repeat the question?”) is permitted. (Grounded in Fenton 2025, citing Pearce & Chiavaroli 2020: prompting must neither discourage nor reassure.) | Publish | Error |
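
Two of the policy rules above can be approximated with static checks: POL-001 is a set intersection, and POL-006 can be screened with a pattern for rubric-level prefixes. The sketch assumes `allowedActions` and `forbiddenActions` are lists of strings. The regular expression is a hypothetical heuristic built from the examples in POL-006 and the level names in EVD-006; it is not an exhaustive leak detector.

```python
import re

# Hypothetical pattern for rubric-level prefixes such as "Excellent: ..."
# or "Grade A: ..."; a real deployment would tune this list.
RUBRIC_PREFIX = re.compile(
    r"\b(excellent|satisfactory|partial|absent|grade\s+[a-f])\s*:",
    re.IGNORECASE,
)


def conflicting_actions(node: dict) -> list[str]:
    """POL-001: actions listed as both allowed and forbidden."""
    allowed = set(node.get("allowedActions", []))
    forbidden = set(node.get("forbiddenActions", []))
    return sorted(allowed & forbidden)


def leaks_rubric_level(description: str) -> bool:
    """POL-006: flag descriptions that embed a rubric level descriptor."""
    return RUBRIC_PREFIX.search(description) is not None
```
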
| ID | Rule | Phase | Severity |
|---|---|---|---|
| POL-F001 | maxFollowUps MUST be >= 0. | Publish | Error |
| POL-F002 | maxFollowUps of 0 means the examiner asks the opening question and MUST transition without follow-up. | Publish | Info |
| POL-F003 | maxFollowUpDurationSec MUST be > 0 when maxFollowUps > 0. | Publish | Error |
| POL-F004 | maxFollowUpDurationSec SHOULD be <= timeBudget.maxDurationSec when both are specified. | Publish | Warning |
| ID | Rule | Phase | Severity |
|---|---|---|---|
| POL-R001 | Recovery handler scenario values MUST be from the RecoveryScenario enum: silence, unclear_answer, off_topic, anxiety, interruption, network_issue, repetition_loop. See 02-schema.md §12. | Publish | Error |
| POL-R002 | Recovery handler escalation values MUST be from the RecoveryEscalation enum: retry, rephrase, skip_node, pause_session, terminate. See 02-schema.md §12. | Publish | Error |
| POL-R003 | silence recovery with maxAttempts exhausted MUST use skip_node, pause_session, or terminate escalation. Long silence should not loop indefinitely. | Publish | Error |
| POL-R004 | Recovery handlers MUST NOT modify the node’s evidenceTargets or transitions. Recovery is behavioral, not structural. | Publish | Error |
| POL-R005 | Recovery handlers SHOULD include stt_low_confidence (triggered when STT transcript confidence < 0.6) with action gentle_reprompt or technical_recovery. Packages without this handler MUST include metadata.sttHandlingJustification. (Grounded in Fenton 2025: practical considerations for online oral assessments include technology reliability.) | Publish | Warning |
Rules that apply to the event schema and emission.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| EVT-001 | Every emitted event MUST include event (type), sessionId, and timestamp. | Runtime | Error |
| EVT-002 | timestamp MUST be in ISO 8601 format with timezone. | Runtime | Error |
| EVT-003 | Events MUST be emitted in causal order per session. node_entered MUST precede evidence_update for the same nodeId. | Runtime | Error |
| EVT-004 | node_entered MUST include nodeId, nodeKind (from ExamRuntimeNodeKind), and timeBudgetMs. See 02-schema.md §14 NodeEventPayload. | Runtime | Error |
| EVT-005 | evidence_signal_emitted MUST include signalId, nodeId, and signalKind. See 02-schema.md §14 EvidenceEventPayload. | Runtime | Error |
| EVT-006 | node_exited MUST include nodeId and reason. See 02-schema.md §14 NodeEventPayload. | Runtime | Error |
| EVT-007 | candidate_command_received MUST include command matching a CandidateCommandType value. See 02-schema.md §14 CommandEventPayload. | Runtime | Error |
| EVT-008 | transcript_final events MUST include speaker, text, turnId, nodeId, and confidence. See 02-schema.md §15 TranscriptTurn. | Runtime | Error |
| EVT-009 | session_completed MUST include reason, totalTurns, and totalElapsedMs. See 02-schema.md §14 SessionLifecyclePayload. | Runtime | Error |
| EVT-010 | Events MUST be persisted to the Event Store before being considered committed. Data channel emission is best-effort; persistence is authoritative. | Runtime | Error |
| EVT-011 | Event schema version MUST be included as schemaVersion in the event envelope. (See 09-versioning.md.) | Runtime | Error |
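
The envelope-level rules (EVT-001, EVT-002, EVT-011) can be checked on every event before persistence. The following sketch assumes an event is a flat dict with the field names used in the table; `check_event_envelope` is a hypothetical helper. It relies on `datetime.fromisoformat`, which accepts only a subset of ISO 8601 on older Python versions, so a production validator may need a stricter parser.

```python
from datetime import datetime


def _has_timezone(raw: object) -> bool:
    """True if raw is an ISO 8601 string with an explicit UTC offset."""
    if not isinstance(raw, str):
        return False
    # Python < 3.11 does not parse a trailing "Z", so normalise it.
    text = raw[:-1] + "+00:00" if raw.endswith("Z") else raw
    try:
        return datetime.fromisoformat(text).tzinfo is not None
    except ValueError:
        return False


def check_event_envelope(event: dict) -> list[str]:
    """Return the IDs of the envelope rules that this event violates."""
    violated = []
    if any(field not in event for field in ("event", "sessionId", "timestamp")):
        violated.append("EVT-001")
    if "schemaVersion" not in event:
        violated.append("EVT-011")
    if "timestamp" in event and not _has_timezone(event["timestamp"]):
        violated.append("EVT-002")
    return violated
```
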
Rules that apply to the Pipecat adapter output (FlowConfig).
| ID | Rule | Phase | Severity |
|---|---|---|---|
| ADP-001 | Every specification node MUST produce exactly one NodeConfig in the adapter output. | Publish | Error |
| ADP-002 | NodeConfig.id MUST equal the specification nodeId. | Publish | Error |
| ADP-003 | The report_observation function MUST be registered on every non-end node. This is the single function the LLM calls to report all observations (evidence signals, candidate commands, answer quality, follow-up intent, and proposed speech). No other domain-level functions are permitted. | Publish | Error |
| ADP-004 | The report_observation schema MUST include a signals array with at least the fields: signalType (string), excerpt (string), and confidence (number 0.0–1.0). The signals array MAY be empty when no evidence is observed (e.g., candidate issued a command). | Publish | Error |
| ADP-005 | The report_observation schema MUST include a commandDetected field with enum values covering at least: repeat, clarify, rephrase, slow_down, pause, thinking_time, help, skip, revise, finish. See 04-agent-boundary.md §3.3. | Publish | Error |
| ADP-006 | forbiddenActions MUST appear in the generated task_messages developer message in a recognizable “Do NOT” block. The adapter MUST NOT omit forbidden actions from the LLM’s instructions. | Publish | Error |
| ADP-007 | allowedActions MUST appear in the generated task_messages developer message in a recognizable “You may” block. | Publish | Error |
| ADP-008 | metadata.maxFollowUps MUST be present and equal to the specification value. | Publish | Error |
| ADP-009 | metadata.timeBudgetSec MUST be present and equal to the specification value when timeBudget is specified. | Publish | Error |
| ADP-010 | metadata.evidenceTargets MUST list all evidence target IDs from the specification node. | Publish | Error |
| ADP-011 | metadata.irNodeId MUST be present and equal to the specification nodeId. | Publish | Error |
| ADP-012 | edges MUST reflect all valid transitions from the specification, with guard: "runtime_controller_approval". | Publish | Error |
| ADP-013 | transcript hooks MUST be configured to forward to the Runtime Controller. | Publish | Error |
| ADP-014 | dataChannel.topic MUST be specified. | Publish | Error |
| ADP-015 | The adapter MUST NOT produce NodeConfig entries for specification constructs that have no Pipecat equivalent without encoding them as metadata + tool-calls. Silent dropping is prohibited. | Publish | Error |
| ADP-016 | The adapter MUST include an outputValidationFilters configuration in the compiled FlowConfig specifying which output filters are active. At minimum, the following filters MUST be configured: persona_break, rubric_leak, topic_containment, length. | Publish | Error |
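
A one-to-one mapping between specification nodes and adapter output (ADP-001, ADP-002, ADP-011) can be verified after compilation. The sketch below assumes each compiled NodeConfig is available as a dict with an `id` and a `metadata` dict. The actual FlowConfig layout is not defined in this document, so the input shapes and the name `check_adapter_coverage` are assumptions.

```python
from collections import Counter


def check_adapter_coverage(
    spec_node_ids: list[str], node_configs: list[dict]
) -> list[tuple[str, str]]:
    """Return (ruleId, message) pairs for mapping violations."""
    findings = []
    produced = Counter(cfg.get("id") for cfg in node_configs)
    for node_id in spec_node_ids:
        # ADP-001 / ADP-002: exactly one NodeConfig whose id equals nodeId.
        if produced[node_id] != 1:
            findings.append(
                ("ADP-001", f"{node_id}: {produced[node_id]} NodeConfig entries")
            )
    for cfg in node_configs:
        # ADP-011: metadata.irNodeId mirrors the specification nodeId,
        # which by ADP-002 is also the NodeConfig id.
        if cfg.get("metadata", {}).get("irNodeId") != cfg.get("id"):
            findings.append(("ADP-011", f"{cfg.get('id')}: irNodeId mismatch"))
    return findings
```
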
Rules that ensure packages remain valid across specification version changes.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| CMP-001 | A published package’s irVersion is immutable. It MUST NOT be changed after publication. | Publish | Error |
| CMP-002 | A runtime MUST support the irVersion declared in the package. If unsupported, the session MUST NOT start. | Runtime | Error |
| CMP-003 | The runtime SHOULD support the current and the two most recent minor versions within the same major version. | Runtime | Warning |
| CMP-004 | Major version changes (e.g., 0.x → 1.0) MAY break backward compatibility. Packages MUST be re-published under the new version. | Publish | Info |
| CMP-005 | Minor version changes MUST be backward-compatible within the same major version. A package compiled for exam-runtime-ir/0.1 MUST be executable by a runtime supporting exam-runtime-ir/0.3. | Runtime | Error |
| CMP-006 | Deprecated fields MUST still be accepted by the runtime for at least one minor version after deprecation. | Runtime | Warning |
| CMP-007 | Deprecated fields MUST NOT be removed until a major version bump. | Publish | Error |
| CMP-008 | The runtime MUST log a deprecation warning when processing deprecated fields. | Runtime | Warning |
| CMP-009 | Packages compiled with an irVersion that has reached end-of-life MUST be rejected with a clear error message indicating the supported versions. | Runtime | Error |
| CMP-010 | Adapter version (adapterVersion) MUST be recorded alongside the compiled FlowConfig for debugging and audit purposes. | Publish | Error |
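
The compatibility window described by CMP-003 and CMP-005 can be expressed as a comparison of parsed version pairs. This sketch assumes `irVersion` strings have the form `exam-runtime-ir/MAJOR.MINOR`, as in the example in CMP-005; the authoritative format is defined in 09-versioning.md. The function names and the default `window` of two minor versions (the CMP-003 recommendation) are illustrative choices.

```python
import re

IR_VERSION = re.compile(r"^exam-runtime-ir/(\d+)\.(\d+)$")  # assumed format


def parse_ir_version(value: str) -> tuple[int, int]:
    """Split an irVersion string into (major, minor)."""
    match = IR_VERSION.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised irVersion: {value!r}")
    return int(match.group(1)), int(match.group(2))


def runtime_supports(package_ir: str, runtime_ir: str, window: int = 2) -> bool:
    """True if the package is within `window` minors behind the runtime."""
    pkg_major, pkg_minor = parse_ir_version(package_ir)
    rt_major, rt_minor = parse_ir_version(runtime_ir)
    return pkg_major == rt_major and 0 <= rt_minor - pkg_minor <= window
```

With the default window, a package compiled for `exam-runtime-ir/0.1` is accepted by a runtime at `exam-runtime-ir/0.3`, which matches the example in CMP-005.
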
Rules that ensure the assessment is fair across candidates. Grounded in the validity/reliability/fairness matrix of Akimov & Malin (2020) and the structure dimension of Joughin (1998).
| ID | Rule | Phase | Severity |
|---|---|---|---|
| FAIR-001 | When a package contains multiple question nodes intended for the same assessment session, the per-node sums of evidenceTarget weights SHOULD be approximately equal across those question nodes (tolerance ±0.15). Packages that deviate MUST include metadata.difficultyJustification explaining why the weight imbalance is pedagogically appropriate. (Grounded in Akimov & Malin 2020: inter-case reliability requires comparable assessment difficulty across candidates.) | Publish | Warning |
| FAIR-002 | timeBudget.maxDurationSec SHOULD be proportional to the cognitive demand of the question, not merely its word count. Packages where time budgets vary by more than 100% across question nodes MUST include metadata.timeBudgetJustification. Literature suggests 5–7 minutes for theoretical questions and 20–30 minutes total for a complete assessment (Akimov & Malin 2020; Fenton 2025, citing Sayre 2014). | Publish | Warning |
| FAIR-003 | When a package includes multiple question nodes that are candidates for the same assessment slot (randomized selection from a question pool), the package MUST include a difficultyCalibration object documenting that questions are of equivalent difficulty. This addresses inter-case reliability (Akimov & Malin 2020, Table 4). | Publish | Error |
| FAIR-004 | When metadata.expectedCandidateCount exceeds 50, and the package uses randomized question selection, the package MUST have at least expectedCandidateCount / 10 distinct question variants per assessment slot to mitigate question-sharing risk. (Grounded in Bayley et al. 2024: students shared ConVOE questions via group chat.) | Publish | Warning |
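
The numeric thresholds in FAIR-002 and FAIR-004 can be computed directly. The sketch below reads "vary by more than 100%" as the largest budget exceeding twice the smallest, and rounds the FAIR-004 quotient up to a whole number of variants. Both readings, and the function names, are assumptions.

```python
import math


def time_budgets_need_justification(max_duration_secs: list[int]) -> bool:
    """FAIR-002, reading "vary by more than 100%" as max > 2 * min."""
    if len(max_duration_secs) < 2:
        return False
    return max(max_duration_secs) > 2 * min(max_duration_secs)


def required_variants_per_slot(expected_candidate_count: int) -> int:
    """FAIR-004: minimum distinct variants per slot for randomized pools."""
    if expected_candidate_count <= 50:
        return 0  # the rule does not apply at or below 50 candidates
    return math.ceil(expected_candidate_count / 10)
```

For example, under these readings a package expecting 200 candidates with randomized selection needs at least 20 variants per slot.
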
Rules that define what the output validation pipeline (content/topic/action/length filters) MUST check. These filters intercept the LLM’s proposed spokenText before it reaches the candidate via TTS.
| ID | Rule | Phase | Severity |
|---|---|---|---|
| OUT-001 | Output validation MUST include a persona_break filter that checks whether spokenText contains phrases indicating the LLM has broken character (e.g., “As your examiner…”, “According to the rubric…”, “I’m an AI…”). Matches MUST be intercepted and replaced with a neutral alternative. | Runtime | Error |
| OUT-002 | Output validation MUST include a rubric_leak filter that checks spokenText against all evidenceSignal.description text and rubricDescriptor text in the current node. If spokenText contains verbatim or near-verbatim matches of rubric content, the output MUST be intercepted. (Grounded in Akimov & Malin 2020: examiner bias mitigation requires rubric confidentiality.) | Runtime | Error |
| OUT-003 | Output validation MUST include a topic_containment filter that ensures spokenText stays within the current node’s scenario domain. If the LLM’s proposed speech introduces topics unrelated to the current assessment node, the output MUST be intercepted or redirected. | Runtime | Error |
| OUT-004 | Output validation MUST include a length filter. spokenText for TTS MUST NOT exceed 500 characters per utterance to maintain conversational pacing. Longer utterances MUST be split or summarized. (Practical constraint: TTS quality degrades on long passages; conversational turn-taking requires brevity.) | Runtime | Error |
| OUT-005 | Output validation SHOULD include a leading_question filter that checks whether spokenText contains phrasing that suggests the correct answer (e.g., “Wouldn’t you say that…”, “Don’t you think…”, “Surely you’d agree…”). This filter operates as a guardrail supplement to forbiddenActions: ["reveal_answer"]. | Runtime | Warning |
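
The length filter in OUT-004 allows long utterances to be split. One possible splitting strategy is sketched here: break at sentence boundaries, pack sentences greedily up to the limit, and hard-wrap any single sentence that is itself too long. The function name `split_for_tts` and the sentence-boundary heuristic are assumptions; summarization, the other option OUT-004 permits, is not covered.

```python
import re

MAX_UTTERANCE_CHARS = 500  # from OUT-004


def split_for_tts(spoken_text: str, limit: int = MAX_UTTERANCE_CHARS) -> list[str]:
    """Split text into utterances of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", spoken_text.strip())
    utterances, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                utterances.append(current)
                current = ""
            utterances.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            utterances.append(current)
            current = sentence
    if current:
        utterances.append(current)
    return utterances
```
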
AssessmentPackage
│
▼
1. Schema validation (JSON Schema conformance)
│
▼
2. Package rules (PKG-*) — includes structure level (PKG-012)
│
▼
3. Node rules (NOD-*) — includes follow-up consistency (NOD-Q012)
│
▼
4. Transition rules (TRN-*)
│
▼
5. Evidence rules (EVD-*)
│
▼
6. Policy rules (POL-*) — includes rubric-leak, prompting, anxiety rules
│
▼
7. Fairness rules (FAIR-*) — difficulty, time, question pool equity
│
▼
8. Graph reachability (TRN-008, TRN-009)
│
▼
9. Adapter compilation + adapter rules (ADP-*) — includes output filters (ADP-016)
│
▼
10. Backward compatibility check (CMP-*)
│
▼
Publication gate: all errors resolved → PUBLISH
any errors remain → REJECT with report
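
The pipeline and its publication gate can be modelled as an ordered list of stage functions whose findings are pooled and then partitioned by severity. The sketch below assumes each stage takes the package and returns a list of finding dicts with at least `ruleId` and `severity`. It runs every stage before deciding, which is one possible reading of "REJECT with report"; a real pipeline might stop early when schema validation fails. All names are hypothetical.

```python
from typing import Callable

Finding = dict  # {"ruleId": str, "severity": "error" | "warning" | "info", ...}
Stage = Callable[[dict], list[Finding]]


def run_publish_pipeline(package: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Run every stage in order and apply the publication gate."""
    findings: list[Finding] = []
    for _name, stage in stages:
        findings.extend(stage(package))
    errors = [f for f in findings if f["severity"] == "error"]
    warnings = [f for f in findings if f["severity"] == "warning"]
    return {
        # Warnings never block publication; any error does.
        "result": "reject" if errors else "pass",
        "errors": errors,
        "warnings": warnings,
    }
```
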
Runtime validation occurs when a session is initiated from a published package:
- Verify irVersion is supported by the runtime (CMP-002).
- Verify package integrity (checksum match against published artifact).
- Verify all node references are resolvable (redundant safety check).
- Verify event protocol version is compatible.
- Initialize Runtime Controller with validated package.
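
The first three runtime checks above could be sketched as follows. The checksum scheme (SHA-256 over a canonical JSON encoding) is an assumption, since this document does not specify how the published artifact is hashed. The function names and the `supported_ir_versions` parameter are likewise hypothetical.

```python
import hashlib
import hmac
import json


def package_checksum(package: dict) -> str:
    """SHA-256 over a canonical JSON encoding (an assumed scheme)."""
    canonical = json.dumps(package, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def runtime_start_failures(
    package: dict, published_checksum: str, supported_ir_versions: set[str]
) -> list[str]:
    """Return the reasons a session must not start; empty means proceed."""
    failures = []
    if package.get("irVersion") not in supported_ir_versions:
        failures.append("CMP-002: unsupported irVersion")
    if not hmac.compare_digest(package_checksum(package), published_checksum):
        failures.append("integrity: checksum mismatch")
    # Redundant safety check: every transition target must resolve.
    node_ids = {n["nodeId"] for n in package.get("nodes", [])}
    for node in package.get("nodes", []):
        for transition in node.get("transitions", []):
            if transition["targetNodeId"] not in node_ids:
                failures.append(
                    f"unresolved node reference: {transition['targetNodeId']}"
                )
    return failures
```
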
{
  "packageId": "pkg-2026-0506-001",
  "irVersion": "exam-runtime-ir/0.1",
  "validatedAt": "2026-05-06T02:00:00Z",
  "result": "reject", // "pass" | "reject"
  "errors": [
    {
      "ruleId": "NOD-Q007",
      "severity": "error",
      "nodeId": "q-osce-station-1",
      "message": "followUpPolicy.maxFollowUps must be >= 0, got -1",
      "path": "nodes[q-osce-station-1].followUpPolicy.maxFollowUps"
    }
  ],
  "warnings": [
    {
      "ruleId": "NOD-Q001",
      "severity": "warning",
      "nodeId": "q-osce-station-3",
      "message": "Question node has no evidenceTargets. Consider adding at least one.",
      "path": "nodes[q-osce-station-3].evidenceTargets"
    }
  ],
  "summary": {
    "errors": 1,
    "warnings": 1,
    "nodesValidated": 8,
    "transitionsValidated": 12
  }
}
| Version | Date | Changes |
|---|---|---|
| v0.2.0 | 2026-06-30 | Added validation rules for new schema fields (BloomLevel, anxietyMitigation, CandidateBriefing, etc.). Updated terminology from ‘Exam Runtime IR’ to ‘IOA-ORM’. |
| v0.1.0 | 2026-05-06 | Initial release. |