Validation Rules

Draft · v0.2.0 · 2026-06-30

This document defines the validation rules that apply to IOA-ORM packages. Rules are organized by category and phase (publish-time vs runtime). Every rule uses normative language (MUST / SHOULD / MAY / MUST NOT) and includes a rule ID for tooling reference.

Validation is performed in two phases:

  • Publish-time — when an AssessmentPackage is published from the authoring studio. A package that fails publish-time validation MUST NOT be published.
  • Runtime — when a session is started from a published package. A package that fails runtime validation MUST NOT start a session.

Rules that apply to the top-level AssessmentPackage structure.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| PKG-001 | Every package MUST have exactly one initialNodeId. | Publish | Error |
| PKG-002 | initialNodeId MUST refer to an existing node in the package’s nodes array. | Publish | Error |
| PKG-003 | initialNodeId MUST NOT refer to an end node. | Publish | Error |
| PKG-004 | Published packages MUST include irVersion conforming to the version format defined in 09-versioning.md. | Publish | Error |
| PKG-005 | Package nodes array MUST contain at least one node. | Publish | Error |
| PKG-006 | All nodeId values within a package MUST be unique. | Publish | Error |
| PKG-007 | Package MUST include a metadata object with at least packageId, title, and createdAt. | Publish | Error |
| PKG-008 | metadata.packageId MUST be a valid UUID or ULID. | Publish | Error |
| PKG-009 | Package SHOULD include metadata.author and metadata.version. | Publish | Warning |
| PKG-010 | Package MUST NOT contain more than 200 nodes. (System limit for runtime performance.) | Publish | Error |
| PKG-011 | Package MUST NOT reference external resources (URLs, file paths) that are not part of the package bundle unless explicitly declared as external dependencies. | Publish | Error |
| PKG-012 | Package MUST include metadata.structureLevel from the enum closed, semi-structured, open. This declares where the assessment sits on Joughin’s (1998) closed–open structure continuum. The value MUST be consistent with the followUpPolicy settings across all question nodes (e.g., open structure requires maxFollowUps > 0 on most nodes). Packages that set structureLevel inconsistently with their follow-up policies MUST include metadata.structureJustification explaining the rationale. | Publish | Error |
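
A minimal sketch of how a few of the structural checks above (PKG-001, PKG-002, PKG-005, PKG-006, PKG-010) might be implemented. The dict-based package shape and the `check_package` helper are assumptions made for illustration; the authoritative shape is whatever 02-schema.md defines.

```python
MAX_NODES = 200  # PKG-010 system limit


def check_package(pkg: dict) -> list[str]:
    """Return the IDs of the PKG rules that the package violates (illustrative subset)."""
    failed = []
    nodes = pkg.get("nodes") or []
    node_ids = [n.get("nodeId") for n in nodes]

    initial = pkg.get("initialNodeId")
    if not initial:
        failed.append("PKG-001")      # no initialNodeId at all
    elif initial not in node_ids:
        failed.append("PKG-002")      # initialNodeId points at a missing node
    if not nodes:
        failed.append("PKG-005")      # empty nodes array
    if len(node_ids) != len(set(node_ids)):
        failed.append("PKG-006")      # duplicate nodeId values
    if len(nodes) > MAX_NODES:
        failed.append("PKG-010")      # over the node limit
    return failed
```

An empty result means none of these five rules fired; the remaining PKG rules would need their own checks.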

Rules that apply to individual nodes within a package.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| NOD-001 | Every node MUST have a nodeId that is a non-empty string matching ^[a-zA-Z0-9_-]{1,128}$. | Publish | Error |
| NOD-002 | Every node MUST have a kind field from the allowed ExamRuntimeNodeKind enum (question, scenario, task, discussion, warmup, wrapup, branch, identity_check). See 02-schema.md §3. | Publish | Error |
| NOD-003 | Every node MUST have at least one entry in its transitions array (an always transition is valid as a fallback). | Publish | Error |
| NOD-005 | Every node MUST have a non-empty promptSeed string. See 02-schema.md §4. | Publish | Error |
| NOD-008 | Node promptSeed MUST NOT exceed 8,000 characters. (LLM context budget.) | Publish | Error |
| NOD-010 | timeBudgetMs MUST be a positive integer when specified. | Publish | Error |
| NOD-011 | timeBudgetMs SHOULD be between 30000 and 600000 for question nodes. | Publish | Warning |
| NOD-012 | Every node SHOULD have a candidateCommands policy with at least one allowed command. See 02-schema.md §13. | Publish | Warning |
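
As an illustration, the nodeId pattern (NOD-001) and the time-budget rules (NOD-010, NOD-011) could be checked as sketched below. The `check_node_basics` helper and the `(ruleId, severity)` tuple format are hypothetical. `fullmatch` is used instead of `^…$` because, in Python, `$` also matches just before a trailing newline.

```python
import re

NODE_ID_PATTERN = re.compile(r"[a-zA-Z0-9_-]{1,128}")  # NOD-001, applied with fullmatch


def check_node_basics(node: dict) -> list[tuple[str, str]]:
    """Return (ruleId, severity) pairs for NOD-001, NOD-010 and NOD-011."""
    findings = []
    if not NODE_ID_PATTERN.fullmatch(str(node.get("nodeId", ""))):
        findings.append(("NOD-001", "error"))

    budget = node.get("timeBudgetMs")
    if budget is not None:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(budget, bool) or not isinstance(budget, int) or budget <= 0:
            findings.append(("NOD-010", "error"))
        elif node.get("kind") == "question" and not 30000 <= budget <= 600000:
            findings.append(("NOD-011", "warning"))
    return findings
```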

| ID | Rule | Phase | Severity |
|---|---|---|---|
| NOD-Q001 | question nodes SHOULD have at least one evidenceTarget. | Publish | Warning |
| NOD-Q002 | Each evidenceTarget MUST have a unique id within the node. | Publish | Error |
| NOD-Q003 | Each evidenceTarget MUST have a non-empty label. | Publish | Error |
| NOD-Q004 | Each evidenceTarget SHOULD have a weight between 0 and 1. | Publish | Warning |
| NOD-Q005 | evidenceTarget weights within a node SHOULD sum to approximately 1.0 (tolerance ±0.05). | Publish | Warning |
| NOD-Q006 | question nodes SHOULD have a followUpPolicy object. See 02-schema.md §10. | Publish | Warning |
| NOD-Q007 | followUpPolicy.maxFollowUps MUST be >= 0. | Publish | Error |
| NOD-Q008 | followUpPolicy.maxFollowUps SHOULD be <= 10. | Publish | Warning |
| NOD-Q009 | followUpPolicy.maxFollowUpDurationSec MUST be > 0 when specified. | Publish | Error |
| NOD-Q010 | followUpPolicy.followUpStyle MUST be one of: probing, scaffolding, clarifying, redirecting, free when specified. See 02-schema.md §10. | Publish | Error |
| NOD-Q011 | question nodes SHOULD have candidateCommands including at least repeat, clarification, and pause. Packages missing any of these MUST include metadata.commandJustification explaining why that command is not applicable. (Grounded in Fenton 2025: accessibility and anxiety management require these minimum commands.) See 02-schema.md §13. | Publish | Warning |
| NOD-Q012 | When a package contains multiple question nodes, the followUpPolicy.followUpStyle SHOULD be consistent across all nodes. If styles differ across nodes, the package MUST include metadata.structureJustification explaining the pedagogical rationale for the variation. (Grounded in Joughin 1998: inconsistent structure threatens reliability.) | Publish | Warning |
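
The weight rules (NOD-Q004, NOD-Q005) reduce to simple arithmetic. The sketch below is one possible reading: it assumes targets without a weight are ignored and that a node with no weights at all raises no sum warning. The helper name is hypothetical.

```python
def check_evidence_weights(targets: list[dict], tolerance: float = 0.05) -> list[str]:
    """Return the warning-level rule IDs raised by a node's evidenceTarget weights."""
    warnings = []
    weights = [t["weight"] for t in targets if "weight" in t]
    if any(not 0 <= w <= 1 for w in weights):
        warnings.append("NOD-Q004")
    # Tiny epsilon so float rounding exactly at the tolerance boundary does not warn.
    if weights and abs(sum(weights) - 1.0) > tolerance + 1e-9:
        warnings.append("NOD-Q005")
    return warnings
```

For example, weights of 0.5, 0.3 and 0.2 pass, while 0.5 and 0.3 alone sum to 0.8 and fall outside the ±0.05 tolerance.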

| ID | Rule | Phase | Severity |
|---|---|---|---|
| NOD-E001 | end nodes MUST have an endType field from the allowed enum (normal, timeout, terminated, technical_failure). | Publish | Error |
| NOD-E002 | end nodes MUST have a prompt.closing message. | Publish | Error |
| NOD-E003 | end nodes MUST NOT have evidenceTargets. | Publish | Error |
| NOD-E004 | end nodes MUST NOT have followUpPolicy. | Publish | Error |
| NOD-E005 | end nodes MUST NOT have timeBudget. | Publish | Error |
| NOD-E006 | Every package MUST have at least one reachable end node. | Publish | Error |
| NOD-E007 | Packages SHOULD include end nodes covering the endType values normal, timeout, terminated, and technical_failure. Packages missing any of these MUST include metadata.endNodeRationale explaining why that endType is not applicable (e.g., “technical_failure handled by platform-level recovery, not specification”). | Publish | Warning |

Rules that apply to transitions between nodes.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| TRN-001 | Every transition MUST have a targetNodeId that refers to an existing node in the package. | Publish | Error |
| TRN-002 | Every transition MUST have a condition object with a valid type. | Publish | Error |
| TRN-003 | condition.type MUST be one of: always, evidence_satisfied, turn_count_reached, time_elapsed, candidate_command, policy_escalation. See 02-schema.md §11 TransitionCondition. | Publish | Error |
| TRN-004 | evidence_satisfied conditions MUST specify targetIds as an array of valid evidence target IDs. | Publish | Error |
| TRN-005 | evidence_sufficient conditions MAY specify requiredEvidence as an array of evidence target IDs that MUST all exist on the source node. | Publish | Error (if present) |
| TRN-006 | A node MUST NOT have multiple always transitions. Use explicit conditions for disambiguation. | Publish | Error |
| TRN-007 | Transitions SHOULD NOT form cycles that would create infinite loops without a time_exhausted or follow_ups_exhausted escape. | Publish | Warning |
| TRN-008 | The graph MUST be traversable from initialNodeId to at least one end node following valid transitions. (Reachability check.) | Publish | Error |
| TRN-009 | No node SHOULD be unreachable from initialNodeId. (Orphan node check.) | Publish | Warning |
| TRN-010 | Transition conditions on the same source node MUST NOT have identical type and parameters (ambiguous routing). | Publish | Error |
| TRN-011 | condition objects MUST NOT reference evidence targets that do not exist on the source node. | Publish | Error |
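
The reachability check (TRN-008) and the orphan-node check (TRN-009) can both be derived from one breadth-first traversal. The sketch below assumes a dict-based package and takes the set of end node IDs as a parameter, since how end nodes are identified is defined by the schema rather than here. Both function names are hypothetical.

```python
from collections import deque


def reachable_nodes(pkg: dict) -> set[str]:
    """Breadth-first search over transitions, starting at initialNodeId."""
    targets = {
        n["nodeId"]: [t["targetNodeId"] for t in n.get("transitions", [])]
        for n in pkg["nodes"]
    }
    seen = {pkg["initialNodeId"]}
    queue = deque(seen)
    while queue:
        for nxt in targets.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_reachability(pkg: dict, end_node_ids: set[str]) -> dict:
    seen = reachable_nodes(pkg)
    all_ids = {n["nodeId"] for n in pkg["nodes"]}
    return {
        "TRN-008": bool(seen & end_node_ids),  # True means the rule passes
        "TRN-009": sorted(all_ids - seen),     # orphan node IDs (warning level)
    }
```

This traversal ignores transition conditions, so it shows only that a path exists in the graph, not that any session would actually take it.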

Rules that apply to evidence targets and the evidence model.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| EVD-001 | evidenceTarget.id MUST be unique within its node. | Publish | Error |
| EVD-002 | evidenceTarget.id SHOULD be globally unique across the package for marking-pipeline compatibility. | Publish | Warning |
| EVD-003 | evidenceTarget.label MUST be a non-empty human-readable string. | Publish | Error |
| EVD-004 | evidenceTarget.weight MUST be between 0.0 and 1.0 inclusive when specified. | Publish | Error |
| EVD-005 | The total evidence weight per node SHOULD sum to 1.0 (tolerance ±0.05). | Publish | Warning |
| EVD-006 | evidenceTarget MAY include a rubricDescriptor with levels (excellent, satisfactory, partial, absent). Each level MUST have a label and description. | Publish | Info |
| EVD-007 | evidenceTarget SHOULD include markingCriteria linking to assessment learning outcomes. | Publish | Warning |
| EVD-008 | Evidence signals signalKind at runtime MUST be one of the allowed enum values: positive, partial, absent, misconception, flawed_reasoning, process_positive, process_negative, self_correction. See 02-schema.md §7. | Runtime | Error |
| EVD-009 | Evidence ledger signal writes MUST include signalId, nodeId, sessionId, signalKind, confidence, timestampMs, and turnIds. See 02-schema.md §7. | Runtime | Error |
| EVD-010 | Evidence signals MUST NOT be recorded from transcript segments where STT confidence < 0.5. Such segments MUST trigger the stt_low_confidence recovery handler (if present) or be flagged for human review. The evidence ledger MUST include an sttConfidenceSummary (min, max, mean) for each signal, derived from the underlying transcript segments. (Grounded in Akimov & Malin 2020: intra-rater reliability requires confidence in source data.) | Runtime | Error |
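
A sketch of the EVD-010 gate and the sttConfidenceSummary it requires. It assumes a signal is blocked as soon as any one of its underlying transcript segments falls below the threshold, which is one reading of the rule. The function names are hypothetical, and both functions expect a non-empty list of per-segment confidences.

```python
from statistics import mean

STT_MIN_CONFIDENCE = 0.5  # EVD-010 threshold


def may_record_signal(segment_confidences: list[float]) -> bool:
    """True when every underlying segment meets the STT confidence threshold."""
    return all(c >= STT_MIN_CONFIDENCE for c in segment_confidences)


def stt_confidence_summary(segment_confidences: list[float]) -> dict:
    """Build the (min, max, mean) summary stored alongside each signal."""
    return {
        "min": min(segment_confidences),
        "max": max(segment_confidences),
        "mean": mean(segment_confidences),
    }
```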

Rules that apply to action policies, recovery handlers, and behavioral constraints.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| POL-001 | forbiddenActions MUST NOT conflict with allowedActions on the same node. If an action appears in both, the node is invalid. | Publish | Error |
| POL-002 | candidateCommands.allowed command values MUST be from the CandidateCommandType enum: repeat, clarification, request_rephrase, pause, raise_hand, skip, volume_up, volume_down, language_switch, thinking_aloud. See 02-schema.md §13. | Publish | Error |
| POL-003 | candidateCommands.forbidden command values MUST be from the CandidateCommandType enum (same as POL-002). Each ForbiddenAction includes a reason and onViolation handler. See 02-schema.md §13. | Publish | Error |
| POL-004 | forbiddenActions on question nodes SHOULD include at least actions that would reveal the model answer or rubric scoring logic. | Publish | Warning |
| POL-005 | forbiddenActions from GlobalRuntimePolicies MUST be propagated into the Pipecat adapter’s task_messages. The adapter MUST NOT omit forbidden actions. | Publish | Error |
| POL-006 | evidenceSignal.description (when present in the specification) MUST contain behavioral observations (what the candidate said or did), NOT verbatim rubric level descriptors (e.g., MUST NOT contain “Excellent: …”, “Satisfactory: …”, “Grade A: …”). This prevents rubric leakage through the LLM’s spoken output. (Grounded in Akimov & Malin 2020: examiner bias mitigation; Fenton 2025: prompting neutrality.) | Publish | Error |
| POL-007 | The task_messages generated by the adapter MUST include a prompting consistency directive instructing the LLM to use the same questioning approach for all candidates and not to vary scaffolding level based on perceived candidate ability. (Grounded in Pearce & Chiavaroli 2020, via Fenton 2025: consistency is a guiding principle for prompting.) | Publish | Error |
| POL-008 | When anxietyDetected is true in a report_observation, the Runtime Controller’s response MUST be limited to calm_support or pause_timer recovery actions. The system MUST NOT provide assessment-relevant reassurance (e.g., “You’re doing great”, “That’s a good answer”) that could affect assessment validity. Neutral procedural support (e.g., “Take your time”, “Would you like me to repeat the question?”) is permitted. (Grounded in Fenton 2025, citing Pearce & Chiavaroli 2020: prompting must neither discourage nor reassure.) | Publish | Error |
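
Two of the policy checks lend themselves to a short illustration: the allowed/forbidden conflict (POL-001) and a heuristic for rubric-style prefixes in evidenceSignal.description (POL-006). The sketch assumes both action lists are plain strings. The regular expression is a hypothetical heuristic built from the examples in POL-006 and the level names in EVD-006, not a definition of the rule.

```python
import re

# Hypothetical heuristic: descriptions that open with a rubric level label.
RUBRIC_PREFIX = re.compile(
    r"\s*(excellent|satisfactory|partial|absent|grade\s+[a-f])\s*:",
    re.IGNORECASE,
)


def conflicting_actions(node: dict) -> list[str]:
    """POL-001: actions listed as both allowed and forbidden on the same node."""
    allowed = set(node.get("allowedActions", []))
    forbidden = set(node.get("forbiddenActions", []))
    return sorted(allowed & forbidden)


def looks_like_rubric_descriptor(description: str) -> bool:
    """POL-006 heuristic: flag text such as 'Excellent: ...' or 'Grade A: ...'."""
    return RUBRIC_PREFIX.match(description) is not None
```

A prefix match like this catches only the most obvious leakage, so paraphrased rubric text would need a fuzzier comparison.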

| ID | Rule | Phase | Severity |
|---|---|---|---|
| POL-F001 | maxFollowUps MUST be >= 0. | Publish | Error |
| POL-F002 | maxFollowUps of 0 means the examiner asks the opening question and MUST transition without follow-up. | Publish | Info |
| POL-F003 | maxFollowUpDurationSec MUST be > 0 when maxFollowUps > 0. | Publish | Error |
| POL-F004 | maxFollowUpDurationSec SHOULD be <= timeBudget.maxDurationSec when both are specified. | Publish | Warning |

| ID | Rule | Phase | Severity |
|---|---|---|---|
| POL-R001 | Recovery handler scenario values MUST be from the RecoveryScenario enum: silence, unclear_answer, off_topic, anxiety, interruption, network_issue, repetition_loop. See 02-schema.md §12. | Publish | Error |
| POL-R002 | Recovery handler escalation values MUST be from the RecoveryEscalation enum: retry, rephrase, skip_node, pause_session, terminate. See 02-schema.md §12. | Publish | Error |
| POL-R003 | silence recovery with maxAttempts exhausted MUST use skip_node, pause_session, or terminate escalation. Long silence should not loop indefinitely. | Publish | Error |
| POL-R004 | Recovery handlers MUST NOT modify the node’s evidenceTargets or transitions. Recovery is behavioral, not structural. | Publish | Error |
| POL-R005 | Recovery handlers SHOULD include stt_low_confidence (triggered when STT transcript confidence < 0.6) with action gentle_reprompt or technical_recovery. Packages without this handler MUST include metadata.sttHandlingJustification. (Grounded in Fenton 2025: practical considerations for online oral assessments include technology reliability.) | Publish | Warning |

Rules that apply to the event schema and emission.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| EVT-001 | Every emitted event MUST include event (type), sessionId, and timestamp. | Runtime | Error |
| EVT-002 | timestamp MUST be in ISO 8601 format with timezone. | Runtime | Error |
| EVT-003 | Events MUST be emitted in causal order per session. node_entered MUST precede evidence_update for the same nodeId. | Runtime | Error |
| EVT-004 | node_entered MUST include nodeId, nodeKind (from ExamRuntimeNodeKind), and timeBudgetMs. See 02-schema.md §14 NodeEventPayload. | Runtime | Error |
| EVT-005 | evidence_signal_emitted MUST include signalId, nodeId, and signalKind. See 02-schema.md §14 EvidenceEventPayload. | Runtime | Error |
| EVT-006 | node_exited MUST include nodeId and reason. See 02-schema.md §14 NodeEventPayload. | Runtime | Error |
| EVT-007 | candidate_command_received MUST include command matching a CandidateCommandType value. See 02-schema.md §14 CommandEventPayload. | Runtime | Error |
| EVT-008 | transcript_final events MUST include speaker, text, turnId, nodeId, and confidence. See 02-schema.md §15 TranscriptTurn. | Runtime | Error |
| EVT-009 | session_completed MUST include reason, totalTurns, and totalElapsedMs. See 02-schema.md §14 SessionLifecyclePayload. | Runtime | Error |
| EVT-010 | Events MUST be persisted to the Event Store before being considered committed. Data channel emission is best-effort; persistence is authoritative. | Runtime | Error |
| EVT-011 | Event schema version MUST be included as schemaVersion in the event envelope. (See 09-versioning.md.) | Runtime | Error |
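
The envelope rules (EVT-001, EVT-002, EVT-011) could be checked roughly as follows. This is a sketch with a hypothetical helper name. It relies on `datetime.fromisoformat`, which accepts only a subset of ISO 8601 before Python 3.11, so a trailing `Z` is rewritten to `+00:00` first.

```python
from datetime import datetime


def has_timezone(ts: str) -> bool:
    """True when ts parses as an ISO 8601 datetime that carries a UTC offset."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(ts).tzinfo is not None
    except ValueError:
        return False


def check_event_envelope(evt: dict) -> list[str]:
    """Return the IDs of the envelope rules the event violates."""
    failed = []
    if not all(evt.get(k) for k in ("event", "sessionId", "timestamp")):
        failed.append("EVT-001")
    ts = evt.get("timestamp")
    if isinstance(ts, str) and not has_timezone(ts):
        failed.append("EVT-002")
    if not evt.get("schemaVersion"):
        failed.append("EVT-011")
    return failed
```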

Rules that apply to the Pipecat adapter output (FlowConfig).

| ID | Rule | Phase | Severity |
|---|---|---|---|
| ADP-001 | Every specification node MUST produce exactly one NodeConfig in the adapter output. | Publish | Error |
| ADP-002 | NodeConfig.id MUST equal the specification nodeId. | Publish | Error |
| ADP-003 | The report_observation function MUST be registered on every non-end node. This is the single function the LLM calls to report all observations (evidence signals, candidate commands, answer quality, follow-up intent, and proposed speech). No other domain-level functions are permitted. | Publish | Error |
| ADP-004 | The report_observation schema MUST include a signals array with at least the fields: signalType (string), excerpt (string), and confidence (number 0.0–1.0). The signals array MAY be empty when no evidence is observed (e.g., candidate issued a command). | Publish | Error |
| ADP-005 | The report_observation schema MUST include a commandDetected field with enum values covering at least: repeat, clarify, rephrase, slow_down, pause, thinking_time, help, skip, revise, finish. See 04-agent-boundary.md §3.3. | Publish | Error |
| ADP-006 | forbiddenActions MUST appear in the generated task_messages developer message in a recognizable “Do NOT” block. The adapter MUST NOT omit forbidden actions from the LLM’s instructions. | Publish | Error |
| ADP-007 | allowedActions MUST appear in the generated task_messages developer message in a recognizable “You may” block. | Publish | Error |
| ADP-008 | metadata.maxFollowUps MUST be present and equal to the specification value. | Publish | Error |
| ADP-009 | metadata.timeBudgetSec MUST be present and equal to the specification value when timeBudget is specified. | Publish | Error |
| ADP-010 | metadata.evidenceTargets MUST list all evidence target IDs from the specification node. | Publish | Error |
| ADP-011 | metadata.irNodeId MUST be present and equal to the specification nodeId. | Publish | Error |
| ADP-012 | edges MUST reflect all valid transitions from the specification, with guard: "runtime_controller_approval". | Publish | Error |
| ADP-013 | transcript hooks MUST be configured to forward to the Runtime Controller. | Publish | Error |
| ADP-014 | dataChannel.topic MUST be specified. | Publish | Error |
| ADP-015 | The adapter MUST NOT produce NodeConfig entries for specification constructs that have no Pipecat equivalent without encoding them as metadata + tool-calls. Silent dropping is prohibited. | Publish | Error |
| ADP-016 | The adapter MUST include an outputValidationFilters configuration in the compiled FlowConfig specifying which output filters are active. At minimum, the following filters MUST be configured: persona_break, rubric_leak, topic_containment, length. | Publish | Error |
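
The one-to-one mapping required by ADP-001 and ADP-002 can be verified by comparing ID lists. The sketch below works on plain lists of IDs, so it assumes nothing about the real FlowConfig structure. `check_node_mapping` is a hypothetical helper.

```python
from collections import Counter


def check_node_mapping(spec_node_ids: list[str], node_config_ids: list[str]) -> dict:
    """Compare specification nodeIds with the NodeConfig ids the adapter produced."""
    counts = Counter(node_config_ids)
    return {
        "missing": sorted(i for i in spec_node_ids if counts[i] == 0),
        "duplicated": sorted(i for i in spec_node_ids if counts[i] > 1),
        "unexpected": sorted(set(node_config_ids) - set(spec_node_ids)),
    }
```

All three lists are empty when every specification node produced exactly one NodeConfig with a matching id. A non-empty `missing` list can also indicate silent dropping, which ADP-015 prohibits.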

Rules that ensure packages remain valid across specification version changes.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| CMP-001 | A published package’s irVersion is immutable. It MUST NOT be changed after publication. | Publish | Error |
| CMP-002 | A runtime MUST support the irVersion declared in the package. If unsupported, the session MUST NOT start. | Runtime | Error |
| CMP-003 | The runtime SHOULD support the current and the two most recent minor versions within the same major version. | Runtime | Warning |
| CMP-004 | Major version changes (e.g., 0.x → 1.0) MAY break backward compatibility. Packages MUST be re-published under the new version. | Publish | Info |
| CMP-005 | Minor version changes MUST be backward-compatible within the same major version. A package compiled for exam-runtime-ir/0.1 MUST be executable by a runtime supporting exam-runtime-ir/0.3. | Runtime | Error |
| CMP-006 | Deprecated fields MUST still be accepted by the runtime for at least one minor version after deprecation. | Runtime | Warning |
| CMP-007 | Deprecated fields MUST NOT be removed until a major version bump. | Publish | Error |
| CMP-008 | The runtime MUST log a deprecation warning when processing deprecated fields. | Runtime | Warning |
| CMP-009 | Packages compiled with an irVersion that has reached end-of-life MUST be rejected with a clear error message indicating the supported versions. | Runtime | Error |
| CMP-010 | Adapter version (adapterVersion) MUST be recorded alongside the compiled FlowConfig for debugging and audit purposes. | Publish | Error |
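
A sketch of the support window in CMP-003, using the example from CMP-005. It assumes irVersion strings look like `exam-runtime-ir/<major>.<minor>`, as in the examples on this page. The authoritative format is defined in 09-versioning.md, and both helper names are hypothetical.

```python
def parse_ir_version(ir_version: str) -> tuple[int, int]:
    """Split 'exam-runtime-ir/0.3' into (0, 3). Assumes a '<prefix>/<major>.<minor>' form."""
    _, _, number = ir_version.rpartition("/")
    major, minor = number.split(".")
    return int(major), int(minor)


def runtime_supports(runtime_version: str, package_version: str, window: int = 2) -> bool:
    """True when the package is in the same major version and at most `window` minors behind."""
    r_major, r_minor = parse_ir_version(runtime_version)
    p_major, p_minor = parse_ir_version(package_version)
    return r_major == p_major and 0 <= r_minor - p_minor <= window
```

With the default window, a runtime at `exam-runtime-ir/0.3` accepts packages compiled for 0.1, 0.2, and 0.3, and rejects packages from a newer minor version or a different major version.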

Rules that ensure the assessment is fair across candidates. Grounded in the validity/reliability/fairness matrix of Akimov & Malin (2020) and the structure dimension of Joughin (1998).

| ID | Rule | Phase | Severity |
|---|---|---|---|
| FAIR-001 | When a package contains multiple question nodes intended for the same assessment session, the sum of evidenceTarget weights across all question nodes SHOULD be approximately equal (tolerance ±0.15). Packages that deviate MUST include metadata.difficultyJustification explaining why the weight imbalance is pedagogically appropriate. (Grounded in Akimov & Malin 2020: inter-case reliability requires comparable assessment difficulty across candidates.) | Publish | Warning |
| FAIR-002 | timeBudget.maxDurationSec SHOULD be proportional to the cognitive demand of the question, not merely its word count. Packages where time budgets vary by more than 100% across question nodes MUST include metadata.timeBudgetJustification. Literature suggests 5–7 minutes for theoretical questions and 20–30 minutes total for a complete assessment (Akimov & Malin 2020; Fenton 2025, citing Sayre 2014). | Publish | Warning |
| FAIR-003 | When a package includes multiple question nodes that are candidates for the same assessment slot (randomized selection from a question pool), the package MUST include a difficultyCalibration object documenting that questions are of equivalent difficulty. This addresses inter-case reliability (Akimov & Malin 2020, Table 4). | Publish | Error |
| FAIR-004 | When metadata.expectedCandidateCount exceeds 50, and the package uses randomized question selection, the package MUST have at least expectedCandidateCount / 10 distinct question variants per assessment slot to mitigate question-sharing risk. (Grounded in Bayley et al. 2024: students shared ConVOE questions via group chat.) | Publish | Warning |
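
The arithmetic behind FAIR-004 and FAIR-002 can be sketched as below. Two interpretations are assumed here and should be confirmed against the intended semantics: "at least expectedCandidateCount / 10" is rounded up to a whole number of variants, and "vary by more than 100%" is read as the longest budget being more than double the shortest. The function names are hypothetical.

```python
import math


def min_variants_per_slot(expected_candidate_count: int) -> int:
    """FAIR-004: variants required per slot under randomized selection; 0 when the rule does not apply."""
    if expected_candidate_count <= 50:
        return 0
    return math.ceil(expected_candidate_count / 10)


def needs_time_budget_justification(max_duration_secs: list[int]) -> bool:
    """FAIR-002 (assumed reading): longest budget is more than twice the shortest."""
    return max(max_duration_secs) > 2 * min(max_duration_secs)
```

For 125 expected candidates this yields 13 variants per slot, and budgets of 120 s and 300 s would require metadata.timeBudgetJustification.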

Rules that define what the output validation pipeline (content/topic/action/length filters) MUST check. These filters intercept the LLM’s proposed spokenText before it reaches the candidate via TTS.

| ID | Rule | Phase | Severity |
|---|---|---|---|
| OUT-001 | Output validation MUST include a persona_break filter that checks whether spokenText contains phrases indicating the LLM has broken character (e.g., “As your examiner…”, “According to the rubric…”, “I’m an AI…”). Matches MUST be intercepted and replaced with a neutral alternative. | Runtime | Error |
| OUT-002 | Output validation MUST include a rubric_leak filter that checks spokenText against all evidenceSignal.description text and rubricDescriptor text in the current node. If spokenText contains verbatim or near-verbatim matches of rubric content, the output MUST be intercepted. (Grounded in Akimov & Malin 2020: examiner bias mitigation requires rubric confidentiality.) | Runtime | Error |
| OUT-003 | Output validation MUST include a topic_containment filter that ensures spokenText stays within the current node’s scenario domain. If the LLM’s proposed speech introduces topics unrelated to the current assessment node, the output MUST be intercepted or redirected. | Runtime | Error |
| OUT-004 | Output validation MUST include a length filter. spokenText for TTS MUST NOT exceed 500 characters per utterance to maintain conversational pacing. Longer utterances MUST be split or summarized. (Practical constraint: TTS quality degrades on long passages; conversational turn-taking requires brevity.) | Runtime | Error |
| OUT-005 | Output validation SHOULD include a leading_question filter that checks whether spokenText contains phrasing that suggests the correct answer (e.g., “Wouldn’t you say that…”, “Don’t you think…”, “Surely you’d agree…”). This filter operates as a guardrail supplement to forbiddenActions: ["reveal_answer"]. | Runtime | Warning |
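
A minimal sketch of the length filter (OUT-004) and a phrase-based leading_question check (OUT-005). Splitting uses the standard-library `textwrap.wrap`, which breaks on whitespace and normalizes whitespace runs. It is not sentence-aware, so a production filter would likely split at sentence boundaries instead. The phrase list holds only the examples given in OUT-005, and both function names are hypothetical.

```python
import textwrap

MAX_UTTERANCE_CHARS = 500  # OUT-004 limit per TTS utterance

# Example phrases from OUT-005; a real filter would need a broader list.
LEADING_PHRASES = ("wouldn't you say", "don't you think", "surely you'd agree")


def split_for_tts(spoken_text: str) -> list[str]:
    """Split spokenText into utterances of at most MAX_UTTERANCE_CHARS characters."""
    return textwrap.wrap(spoken_text, width=MAX_UTTERANCE_CHARS)


def is_leading_question(spoken_text: str) -> bool:
    """True when the text contains one of the known leading phrases."""
    normalized = spoken_text.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    return any(phrase in normalized for phrase in LEADING_PHRASES)
```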

Publish-time validation runs the following steps in order on an AssessmentPackage:

  1. Schema validation (JSON Schema conformance)
  2. Package rules (PKG-*), including structure level (PKG-012)
  3. Node rules (NOD-*), including follow-up consistency (NOD-Q012)
  4. Transition rules (TRN-*)
  5. Evidence rules (EVD-*)
  6. Policy rules (POL-*), including rubric-leak, prompting, and anxiety rules
  7. Fairness rules (FAIR-*): difficulty, time, question pool equity
  8. Graph reachability (TRN-008, TRN-009)
  9. Adapter compilation + adapter rules (ADP-*), including output filters (ADP-016)
  10. Backward compatibility check (CMP-*)

  Publication gate: all errors resolved → PUBLISH
                   any errors remain → REJECT with report
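
The publication gate could be sketched as a loop over stage functions. This sketch assumes each stage returns a list of findings carrying `ruleId` and `severity`, and that all stages run even after an earlier one reports errors. A real pipeline might stop early, for instance when schema validation fails. `run_publish_validation` and the stage signature are hypothetical.

```python
from typing import Callable

# Each stage takes the package and returns findings such as
# {"ruleId": "PKG-005", "severity": "error"}.
Stage = Callable[[dict], list]


def run_publish_validation(pkg: dict, stages: list[Stage]) -> dict:
    """Run every stage in order, then apply the publication gate."""
    findings = []
    for stage in stages:
        findings.extend(stage(pkg))
    errors = [f for f in findings if f["severity"] == "error"]
    warnings = [f for f in findings if f["severity"] == "warning"]
    return {
        "result": "reject" if errors else "pass",  # warnings alone do not block publication
        "errors": errors,
        "warnings": warnings,
    }
```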

Runtime validation occurs when a session is initiated from a published package:

  1. Verify irVersion is supported by the runtime (CMP-002).
  2. Verify package integrity (checksum match against published artifact).
  3. Verify all node references are resolvable (redundant safety check).
  4. Verify event protocol version is compatible.
  5. Initialize Runtime Controller with validated package.
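
Step 2 of the runtime checks could look like the sketch below. The document does not name a checksum algorithm, so SHA-256 over the raw package bytes is an assumption, as are the function names.

```python
import hashlib
import hmac


def package_checksum(package_bytes: bytes) -> str:
    """Hex digest of the package bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(package_bytes).hexdigest()


def integrity_ok(package_bytes: bytes, published_checksum: str) -> bool:
    """Compare against the checksum recorded at publish time, in constant time."""
    return hmac.compare_digest(package_checksum(package_bytes), published_checksum)
```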

Example validation report (result is either "pass" or "reject"):

{
  "packageId": "pkg-2026-0506-001",
  "irVersion": "exam-runtime-ir/0.1",
  "validatedAt": "2026-05-06T02:00:00Z",
  "result": "reject",
  "errors": [
    {
      "ruleId": "NOD-Q007",
      "severity": "error",
      "nodeId": "q-osce-station-1",
      "message": "followUpPolicy.maxFollowUps must be >= 0, got -1",
      "path": "nodes[q-osce-station-1].followUpPolicy.maxFollowUps"
    }
  ],
  "warnings": [
    {
      "ruleId": "NOD-Q001",
      "severity": "warning",
      "nodeId": "q-osce-station-3",
      "message": "Question node has no evidenceTargets. Consider adding at least one.",
      "path": "nodes[q-osce-station-3].evidenceTargets"
    }
  ],
  "summary": {
    "errors": 1,
    "warnings": 1,
    "nodesValidated": 8,
    "transitionsValidated": 12
  }
}

| Version | Date | Changes |
|---|---|---|
| v0.2.0 | 2026-06-30 | Added validation rules for new schema fields (BloomLevel, anxietyMitigation, CandidateBriefing, etc.). Updated terminology from ‘Exam Runtime IR’ to ‘IOA-ORM’. |
| v0.1.0 | 2026-05-06 | Initial release. |