
Schema (TypeScript)

Draft · v0.2.0 · 2026-06-30


This file is the canonical, authoritative type definition source for the IOA-ORM. When any other spec file (00-overview.md through 13-open-questions.md) defines a type, enum, or field name that conflicts with this file, this file takes precedence. Other files MAY reference these types but MUST NOT redefine them with different field names, types, or enum values.


  • All interfaces use readonly for fields that MUST NOT change after creation.
  • string IDs are UUIDv4 unless noted otherwise.
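  • Timestamps are number (Unix epoch milliseconds), except fields explicitly documented as ISO 8601 strings (e.g., publishedAt, createdAt, finalisedAt).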
  • Durations are number (milliseconds).
  • Optional fields are marked with ? and documented when they apply.
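  • Discriminated unions use kind as the discriminant. Exceptions: TransitionCondition is keyed by type, and RuntimeEvent payloads are keyed by the event's type field.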
  • Normative language (MUST/SHOULD/MAY) appears in JSDoc comments.

The top-level canonical artifact. A published, versioned, complete specification of an oral exam.

/**
 * The canonical, versioned, executable specification of a published oral assessment.
 * This is the single source of truth consumed by the runtime controller, the Pipecat
 * adapter, and the marking runtime.
 */
interface ExamRuntimePackage {
  /** Unique identifier for this exam. Stable across versions. */
  readonly examId: string;

  /** Semantic version of this package (e.g., "1.2.0"). MUST increment on any change. */
  readonly version: string;

  /** ISO 8601 timestamp of when this version was published. */
  readonly publishedAt: string;

  /** Human-readable metadata. */
  readonly metadata: ExamMetadata;

  /**
   * Assessment-theoretic profile grounded in Joughin's (1998) six dimensions
   * of oral assessment. Captures design parameters that determine what the exam
   * measures, how it is delivered, and what validity/reliability claims it makes.
   * OPTIONAL in v1 — when absent, defaults are inferred from node policies.
   */
  readonly assessmentProfile?: AssessmentProfile;

  /** Ordered list of runtime nodes forming the exam graph. */
  readonly nodes: readonly ExamRuntimeNode[];

  /** Global policies that apply across all nodes unless overridden. */
  readonly globalPolicies: GlobalRuntimePolicies;

  /** Registry of all evidence targets defined for this exam. */
  readonly evidenceTargets: readonly EvidenceTarget[];

  /** Question pools for randomized question delivery. Referenced by nodes. */
  readonly questionPools?: readonly QuestionPool[];

  /** Configuration hints for the Pipecat adapter. */
  readonly pipecatAdapter?: PipecatAdapterConfig;

  /**
   * Candidate-facing briefing information about the exam.
   * Describes format, duration, available commands, and preparation guidance.
   * @see Joughin (1998) Dimension 4: "Students need to know in advance what
   *   to expect of the shape of the assessment in order to prepare adequately."
   * @see Fenton (2025) Recommendation 1: "Students should be given information
   *   about the schedule and assessment criteria beforehand."
   */
  readonly candidateBriefing?: CandidateBriefing;
}
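Because nodes, transitions, and evidence targets refer to each other by ID, a publish step would plausibly reject packages with dangling references. The following is a minimal sketch of such a check, not part of the spec: `findDanglingReferences`, `NodeRefs`, and `PackageRefs` are hypothetical names, and the shapes keep only the fields the check reads.

```typescript
// Trimmed-down, hypothetical shapes: only the fields this check reads.
interface NodeRefs {
  readonly nodeId: string;
  readonly evidenceTargetIds?: readonly string[];
  readonly transitions: readonly { readonly targetNodeId: string }[];
}

interface PackageRefs {
  readonly nodes: readonly NodeRefs[];
  readonly evidenceTargets: readonly { readonly targetId: string }[];
}

/** Hypothetical check. Returns problems; an empty list means every ID reference resolves. */
function findDanglingReferences(pkg: PackageRefs): string[] {
  const problems: string[] = [];

  // Node IDs must be unique within the package.
  const nodeIds = new Set<string>();
  for (const node of pkg.nodes) {
    if (nodeIds.has(node.nodeId)) problems.push(`duplicate nodeId "${node.nodeId}"`);
    nodeIds.add(node.nodeId);
  }

  const targetIds = new Set(pkg.evidenceTargets.map((t) => t.targetId));

  for (const node of pkg.nodes) {
    // Node-level evidenceTargetIds must point at package-level targets.
    for (const id of node.evidenceTargetIds ?? []) {
      if (!targetIds.has(id)) problems.push(`node "${node.nodeId}": unknown evidence target "${id}"`);
    }
    // Every transition must land on a node that exists.
    for (const t of node.transitions) {
      if (!nodeIds.has(t.targetNodeId)) {
        problems.push(`node "${node.nodeId}": transition to unknown node "${t.targetNodeId}"`);
      }
    }
  }
  return problems;
}
```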

Human-readable information about the exam. Used for display, search, and audit.

/** Metadata describing the exam for human consumers. */
interface ExamMetadata {
  /** Exam title (e.g., "Biology 201 Oral Practical"). */
  readonly title: string;

  /** Subject or course code. */
  readonly subject: string;

  /** Institution or department. */
  readonly institution?: string;

  /** Academic term or semester. */
  readonly term?: string;

  /** Target language of the exam. */
  readonly language: string;

  /** Estimated total duration in milliseconds. */
  readonly estimatedDurationMs: number;

  /** Maximum allowed duration in milliseconds. Hard cap. */
  readonly maxDurationMs: number;

  /** Author(s) who designed this exam in the studio. */
  readonly authors?: readonly string[];

  /** Human-readable description of the exam. */
  readonly description?: string;

  /** Tags for search and categorization. */
  readonly tags?: readonly string[];

  /**
   * Formative, summative, or diagnostic purpose.
   * Affects whether evidence contributes to grades, whether candidate receives
   * real-time feedback, and whether the exam is recorded for review.
   * @see Fenton (2025) on formative vs summative oral assessments.
   */
  readonly assessmentPurpose?: "formative" | "summative" | "diagnostic";

  /**
   * Expected number of candidates for this exam session.
   * Used for scalability planning (question pool sizing, parallel grading hints).
   * @see Bayley et al. (2024) on scaling ConVOEs to 600+ students.
   */
  readonly expectedCandidateCount?: number;

  /**
   * Whether the exam is open-book, closed-book, or restricted.
   * Affects cognitive demands (less memorization in open-book) and anxiety levels.
   * @see Fenton (2025) Recommendation 13: "Plan if the assessment will be
   *   open book or closed book."
   * @see Sayre (2014) on open-book assessment design.
   */
  readonly bookPolicy?: "open" | "closed" | "restricted";
}

Discriminant for node types. Determines default behaviors and valid policy combinations.

/**
 * The type of a runtime node.
 * Determines default behaviors, valid policies, and how the runtime controller
 * manages the node lifecycle.
 */
type ExamRuntimeNodeKind =
  /** A direct question posed to the candidate. */
  | "question"
  /** A scenario presentation (read aloud, display material, etc.). */
  | "scenario"
  /** A structured task (role-play, problem-solving, demonstration). */
  | "task"
  /** An open-ended discussion segment. */
  | "discussion"
  /** Pre-assessment rapport building. NOT assessed. */
  | "warmup"
  /** Closing segment. May include summary or feedback. */
  | "wrapup"
  /** Conditional routing node. No candidate interaction. */
  | "branch"
  /**
   * Pre-exam identity verification node. Candidate presents ID.
   * NOT assessed. Emits identity_verified or identity_failed events.
   * @see Akimov & Malin (2020): "each student had to show the examiner a current
   *   student ID card or a government-issued document."
   * @see Fenton (2025): "ensure the student presents their identification card."
   */
  | "identity_check";
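The kind discriminant lets authoring tools derive per-kind defaults. A sketch, assuming a hypothetical studio helper `defaultIsAssessed` that pre-fills ExamRuntimeNode.isAssessed: warmup and identity_check are documented above as not assessed, branch nodes have no candidate interaction, and wrapup nodes are typically not assessed. Treating question, task, and discussion as assessed and scenario as not assessed is an assumption. The `never` assignment makes the compiler reject the function if a new kind is added without a default.

```typescript
type ExamRuntimeNodeKind =
  | "question" | "scenario" | "task" | "discussion"
  | "warmup" | "wrapup" | "branch" | "identity_check";

/** Hypothetical authoring default for ExamRuntimeNode.isAssessed; authors can override it. */
function defaultIsAssessed(kind: ExamRuntimeNodeKind): boolean {
  switch (kind) {
    case "question":
    case "task":
    case "discussion":
      return true; // assumption: candidate responses here are assessed
    case "scenario":       // presentation of material; assumed not assessed
    case "warmup":         // rapport building, NOT assessed
    case "wrapup":         // typically not assessed
    case "branch":         // no candidate interaction
    case "identity_check": // NOT assessed
      return false;
    default: {
      // Compile-time exhaustiveness guard: fails to type-check if a kind is unhandled.
      const unreachable: never = kind;
      throw new Error(`unhandled node kind: ${String(unreachable)}`);
    }
  }
}
```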

A single unit in the exam graph. Contains local policies that override globals.

/**
 * A discrete unit of the exam flow.
 * Each node has a type, local policies, evidence targets, and transition rules.
 * Local policies override global policies for this node only.
 */
interface ExamRuntimeNode {
  /** Unique identifier within the package. */
  readonly nodeId: string;

  /** The type of this node. */
  readonly kind: ExamRuntimeNodeKind;

  /**
   * Base content or prompt seed for this node.
   * This is NOT the full system prompt — it is the content the Pipecat adapter
   * and runtime controller use to construct the examiner's behavior.
   * MAY contain template variables (e.g., {{candidateName}}).
   */
  readonly promptSeed: string;

  /** Display order in the exam flow. Used for linear progression. */
  readonly order: number;

  /** Human-readable label for this node (e.g., "Q1: Photosynthesis"). */
  readonly label?: string;

  /** Maximum time allowed in this node. Overrides global per-node default. */
  readonly timeBudgetMs?: number;

  /** Local completion policy. Overrides global default. */
  readonly completionPolicy?: CompletionPolicy;

  /** Local follow-up policy. Overrides global default. */
  readonly followUpPolicy?: FollowUpPolicy;

  /** Local recovery policy. Overrides global default. */
  readonly recoveryPolicy?: RecoveryPolicy;

  /**
   * If this node draws from a question pool, the pool reference.
   * When set, `promptSeed` serves as a template containing a {{variantPromptSeed}} placeholder.
   * @see Akimov & Malin (2020) on question banking for inter-case reliability.
   */
  readonly questionPoolId?: string;

  /** Evidence targets assessed at this node. References into package-level targets. */
  readonly evidenceTargetIds?: readonly string[];

  /** Transition rules from this node to successor nodes. */
  readonly transitions: readonly TransitionPolicy[];

  /** Which candidate commands are valid at this node. */
  readonly candidateCommands?: CandidateCommandPolicy;

  /** Whether this node produces assessed evidence. Typically false for warmup and wrapup nodes. */
  readonly isAssessed: boolean;

  /** Optional context overrides for the AI examiner at this node. */
  readonly contextOverride?: Partial<ContextPolicy>;

  /**
   * Whether this warmup node is a practice session (not assessed).
   * Only applicable when kind = "warmup".
   * Practice sessions help reduce anxiety by familiarizing candidates with the format.
   * @see Fenton (2025): "the anxiety some students experience may be linked to
   *   the fact that they are unfamiliar with the format."
   * @see Akimov & Malin (2020): 100% of students were nervous; practice helped.
   */
  readonly isPractice?: boolean;

  /**
   * Anxiety mitigation strategy for this warmup node.
   * Only applicable when kind = "warmup".
   * @see Fenton (2025) Recommendation 8: anxiety management.
   * @see Akimov & Malin (2020): anxiety as a major concern in oral assessment.
   */
  readonly anxietyMitigation?:
    /** Gradual exposure: start with easy questions, build up to assessed content. */
    | "graduated_exposure"
    /** Breathing exercise: guide the candidate through a calming exercise. */
    | "breathing_exercise"
    /** Format familiarization: explain the exam structure and available commands. */
    | "format_familiarization"
    /** Combined: all of the above in sequence. */
    | "combined";
}
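promptSeed MAY contain template variables such as {{candidateName}} or, for pooled questions, {{variantPromptSeed}}. The spec does not fix the substitution rules, so the following is only a sketch: it assumes `{{name}}` placeholders with identifier-style names and leaves unknown placeholders in place so authoring mistakes stay visible. `renderPromptSeed` is a hypothetical name.

```typescript
/**
 * Hypothetical renderer for promptSeed templates such as
 * "Hello {{candidateName}}. {{variantPromptSeed}}".
 * Placeholders without a supplied value are left untouched (an assumption).
 */
function renderPromptSeed(seed: string, vars: Readonly<Record<string, string>>): string {
  return seed.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g,
    (match: string, name: string): string => {
      // Only own properties count, so names like "toString" are not resolved from the prototype.
      const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
      return value ?? match;
    },
  );
}
```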

Per-session state tracked by the runtime controller and updated as the session progresses. NOT persisted as a log; this is working memory.

/**
 * The mutable state of a runtime session.
 * Maintained by the runtime controller. Updated on every turn, command, and transition.
 * This is working memory — the authoritative persistent outputs are the
 * EvidenceLedger and RuntimeEvent log.
 */
interface RuntimeStateSchema {
  /** Current status of the session. */
  readonly status: SessionStatus;

  /** ID of the node the session is currently in. */
  readonly currentNodeId: string;

  /** Number of candidate turns in the current node. */
  readonly currentNodeTurnCount: number;

  /** Number of follow-ups issued by the examiner in the current node. */
  readonly currentNodeFollowUpCount: number;

  /** Total elapsed time for the session in milliseconds. */
  readonly globalElapsedMs: number;

  /** Elapsed time in the current node in milliseconds. */
  readonly nodeElapsedMs: number;

  /** History of candidate commands issued in this session. */
  readonly candidateCommandHistory: readonly CandidateCommandRecord[];

  /** Map of evidence target ID to number of signals received. */
  readonly evidenceCoverage: Readonly<Record<string, number>>;

  /** Recovery attempts in the current node. */
  readonly currentNodeRecoveryAttempts: readonly RecoveryAttemptRecord[];

  /** Index of the last processed turn. */
  readonly lastTurnIndex: number;

  /** Timestamp of the last state update. */
  readonly lastUpdatedAt: number;
}

type SessionStatus =
  | "active"
  | "paused"
  | "completed"
  | "terminated";

interface CandidateCommandRecord {
  readonly command: CandidateCommandType;
  readonly turnIndex: number;
  readonly timestampMs: number;
  readonly handled: boolean;
  readonly response?: string;
}

interface RecoveryAttemptRecord {
  readonly scenario: RecoveryScenario;
  readonly attemptNumber: number;
  readonly timestampMs: number;
  readonly action: RecoveryEscalation;
  readonly successful: boolean;
}
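RuntimeStateSchema fields are readonly, so one natural reading is that each update produces a new snapshot rather than mutating the old one. A minimal reducer-style sketch under that assumption, covering only the counter fields; the `StateAction` vocabulary and `reduceState` are hypothetical, and a full implementation would presumably also reset `currentNodeRecoveryAttempts` on node entry.

```typescript
// Trimmed-down copy of the counter fields of RuntimeStateSchema (other fields omitted).
interface StateCounters {
  readonly currentNodeId: string;
  readonly currentNodeTurnCount: number;
  readonly currentNodeFollowUpCount: number;
  readonly globalElapsedMs: number;
  readonly nodeElapsedMs: number;
  readonly lastTurnIndex: number;
  readonly lastUpdatedAt: number;
}

// Hypothetical update actions; the spec does not define an action vocabulary.
type StateAction =
  | { readonly kind: "candidate_turn"; readonly turnIndex: number; readonly elapsedDeltaMs: number; readonly nowMs: number }
  | { readonly kind: "follow_up_issued"; readonly nowMs: number }
  | { readonly kind: "node_entered"; readonly nodeId: string; readonly nowMs: number };

/** Returns a new state object; the previous snapshot is never mutated. */
function reduceState(state: StateCounters, action: StateAction): StateCounters {
  switch (action.kind) {
    case "candidate_turn":
      return {
        ...state,
        currentNodeTurnCount: state.currentNodeTurnCount + 1,
        globalElapsedMs: state.globalElapsedMs + action.elapsedDeltaMs,
        nodeElapsedMs: state.nodeElapsedMs + action.elapsedDeltaMs,
        lastTurnIndex: action.turnIndex,
        lastUpdatedAt: action.nowMs,
      };
    case "follow_up_issued":
      return { ...state, currentNodeFollowUpCount: state.currentNodeFollowUpCount + 1, lastUpdatedAt: action.nowMs };
    case "node_entered":
      // Per-node counters reset; session-wide totals carry over.
      return {
        ...state,
        currentNodeId: action.nodeId,
        currentNodeTurnCount: 0,
        currentNodeFollowUpCount: 0,
        nodeElapsedMs: 0,
        lastUpdatedAt: action.nowMs,
      };
  }
}
```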

A rubric-aligned definition of what the exam is trying to assess.

/**
 * Defines what the exam is trying to assess at a conceptual level.
 * Linked to rubric criteria in the marking model.
 * Referenced by nodes and by evidence signals.
 */
interface EvidenceTarget {
  /** Unique identifier within the package. */
  readonly targetId: string;

  /** Human-readable label (e.g., "Explain photosynthesis mechanism"). */
  readonly label: string;

  /** Detailed description of what constitutes valid evidence. */
  readonly description: string;

  /** Links to rubric criteria IDs in the marking model. */
  readonly rubricCriteriaIds: readonly string[];

  /**
   * The dimension of oral assessment this target addresses.
   * Joughin (1998) identifies four primary content types. The "metacognitive"
   * dimension captures self-correction and reasoning process quality
   * (Fenton, 2025).
   */
  readonly evidenceDimension:
    | "knowledge_understanding"
    | "applied_problem_solving"
    | "interpersonal_competence"
    | "intrapersonal_quality"
    | "metacognitive"
    | "integrated_practice";

  /**
   * Bloom's Taxonomy cognitive level this target assesses.
   * Enables cognitive-level-aware validation, follow-up escalation, and marking.
   * Optional — when omitted, the target is not classified by cognitive level.
   * @see Bloom, B.S. (1956). Taxonomy of Educational Objectives.
   * @see Fenton (2025): "Generative AI tools have been found to perform well
   *   at the lower levels of Bloom's taxonomy but struggle at the create level."
   */
  readonly cognitiveLevel?: BloomLevel;

  /**
   * Whether this target is transversal (session-wide) or scoped to specific nodes.
   * Transversal targets (e.g., communication quality, critical thinking) are assessed
   * across ALL nodes, not scoped to specific ones.
   * Joughin (1998): interpersonal competence is "not skills per se but rather skills
   * exhibited in relation to a clinical situation or problem solving exercise."
   */
  readonly transversal: boolean;

  /** The node(s) where this target is expected to be evidenced. Empty for transversal targets. */
  readonly expectedNodeIds: readonly string[];

  /**
   * Aggregation method for transversal targets.
   * - "holistic": marker judges overall quality from the full session
   * - "best_of": highest signal quality across nodes
   * - "trajectory": assess whether quality improved over the session
   * Ignored for non-transversal targets.
   */
  readonly aggregationMethod?: "holistic" | "best_of" | "trajectory";

  /**
   * Minimum confidence threshold for a signal to count as "satisfied."
   * Range: 0.0 to 1.0. Conventional value: 0.7 (the field is required, so there is no runtime default).
   */
  readonly requiredConfidence: number;

  /**
   * Maximum number of signals this target can receive.
   * Prevents over-counting from repeated follow-ups.
   * If omitted, unlimited.
   */
  readonly maxSignals?: number;

  /** Minimum number of positive signals needed to consider this target "covered". */
  readonly minPositiveSignals: number;

  /** Whether this target MUST be satisfied for the exam to be considered complete. */
  readonly isRequired: boolean;

  /** Weight of this target in the overall assessment. Range: 0.0 to 1.0. */
  readonly weight: number;
}
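The interplay of requiredConfidence, minPositiveSignals, and maxSignals is easiest to see in code. A sketch of a per-target coverage check, assuming that only approved signals count, that only signalKind "positive" counts toward minPositiveSignals, and that maxSignals keeps the earliest approved signals by timestampMs. None of these choices is fixed by the spec, and `isTargetCovered` is a hypothetical name.

```typescript
type SignalKind =
  | "positive" | "partial" | "absent" | "misconception"
  | "flawed_reasoning" | "process_positive" | "process_negative" | "self_correction";

// Trimmed-down copies of EvidenceTarget and EvidenceSignal.
interface TargetRule {
  readonly targetId: string;
  readonly requiredConfidence: number;
  readonly minPositiveSignals: number;
  readonly maxSignals?: number;
}

interface SignalLike {
  readonly targetIds: readonly string[];
  readonly signalKind: SignalKind;
  readonly confidence: number;
  readonly approved: boolean;
  readonly timestampMs: number;
}

/**
 * Hypothetical coverage check. Assumptions: only approved signals count,
 * only "positive" signals count as positive, and when maxSignals is set only
 * the earliest maxSignals approved signals for the target are considered.
 */
function isTargetCovered(target: TargetRule, signals: readonly SignalLike[]): boolean {
  const relevant = signals
    .filter((s) => s.approved && s.targetIds.includes(target.targetId))
    .sort((a, b) => a.timestampMs - b.timestampMs); // filter() returned a copy, so sorting is safe
  const considered = target.maxSignals === undefined ? relevant : relevant.slice(0, target.maxSignals);
  const satisfying = considered.filter(
    (s) => s.signalKind === "positive" && s.confidence >= target.requiredConfidence,
  );
  return satisfying.length >= target.minPositiveSignals;
}
```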

A runtime-emitted record that an evidence target was demonstrated.

/**
 * A record that a specific evidence target was (or was not) demonstrated.
 * Typically proposed by the AI examiner during the conversation (see `proposedBy` for other sources). Written to the ledger immediately.
 * This is NOT derived from transcript — it is a first-class runtime artifact.
 */
interface EvidenceSignal {
  /** Unique identifier for this signal. */
  readonly signalId: string;

  /** The session this signal belongs to. */
  readonly sessionId: string;

  /** Which node the evidence was gathered in. */
  readonly nodeId: string;

  /** Transcript turns that support this signal. */
  readonly turnIds: readonly string[];

  /** Which evidence target(s) this signal addresses. */
  readonly targetIds: readonly string[];

  /**
   * The dimension of oral assessment this signal addresses.
   * Joughin (1998) identifies four primary content types. The "metacognitive"
   * dimension captures self-correction and reasoning process quality
   * (Fenton, 2025).
   */
  readonly evidenceDimension:
    | "knowledge_understanding"
    | "applied_problem_solving"
    | "interpersonal_competence"
    | "intrapersonal_quality"
    | "metacognitive"
    | "integrated_practice";

  /**
   * Classification of the evidence.
   *
   * The taxonomy extends beyond knowledge-correctness to capture process quality.
   * Fenton (2025): oral assessments reveal "the process of learning rather than
   * the output" and allow students to "reflect on their choices and have the
   * chance to self-correct."
   *
   * - positive:           Correct and complete evidence
   * - partial:            Partially correct or incomplete
   * - absent:             No evidence for this target
   * - misconception:      Demonstrates a misunderstanding
   * - flawed_reasoning:   Right answer with incorrect justification
   * - process_positive:   Good reasoning process, regardless of final answer
   * - process_negative:   Poor reasoning process
   * - self_correction:    Candidate identified and corrected their own error
   */
  readonly signalKind:
    | "positive"
    | "partial"
    | "absent"
    | "misconception"
    | "flawed_reasoning"
    | "process_positive"
    | "process_negative"
    | "self_correction";

  /** Free-text description for human reviewers. */
  readonly description: string;

  /**
   * Confidence that the target was demonstrated.
   * Range: 0.0 (no evidence) to 1.0 (certain).
   */
  readonly confidence: number;

  /**
   * STT confidence summary for the underlying transcript turns.
   * Signal confidence is epistemically dependent on transcript quality.
   */
  readonly sttConfidenceSummary: {
    readonly min: number;
    readonly max: number;
    readonly mean: number;
    readonly turnCount: number;
  };

  /** Who proposed this signal. */
  readonly proposedBy: "llm_analysis" | "runtime_heuristic" | "manual_marker";

  /** Whether the runtime controller has validated this signal. */
  readonly approved: boolean;

  /** ISO 8601 timestamp of signal creation. */
  readonly createdAt: string;

  /** ISO 8601 timestamp of approval (null if not yet approved). */
  readonly approvedAt: string | null;

  /** Timestamp when the signal was emitted (Unix epoch ms). */
  readonly timestampMs: number;

  /** Schema version. */
  readonly schemaVersion: "1";
}
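sttConfidenceSummary aggregates per-turn speech-to-text confidence over the turns cited in turnIds. A small sketch, assuming the caller has already looked up one confidence value per cited turn (TranscriptTurn is not defined in this section) and that a signal always cites at least one turn; `summariseSttConfidence` is hypothetical.

```typescript
interface SttConfidenceSummary {
  readonly min: number;
  readonly max: number;
  readonly mean: number;
  readonly turnCount: number;
}

/**
 * Hypothetical helper: summarises per-turn STT confidences for the turns cited by a signal.
 * Assumes at least one turn is cited; an empty list is treated as an error.
 */
function summariseSttConfidence(turnConfidences: readonly number[]): SttConfidenceSummary {
  if (turnConfidences.length === 0) {
    throw new RangeError("an evidence signal must cite at least one transcript turn");
  }
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const c of turnConfidences) {
    min = Math.min(min, c);
    max = Math.max(max, c);
    sum += c;
  }
  return { min, max, mean: sum / turnConfidences.length, turnCount: turnConfidences.length };
}
```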

The authoritative collection of evidence signals for a session.

/**
 * The structured, authoritative collection of all evidence signals for a session.
 * First-class output consumed by the marking runtime.
 * NOT a transcript derivative — maintained in real-time by the runtime controller.
 */
interface EvidenceLedger {
  /** The session this ledger belongs to. */
  readonly sessionId: string;

  /** The exam ID. */
  readonly examId: string;

  /** All evidence targets defined for this exam. */
  readonly targets: readonly EvidenceTarget[];

  /** All transcript turns, in chronological order. */
  readonly turns: readonly TranscriptTurn[];

  /** All evidence signals (approved and pending). */
  readonly signals: readonly EvidenceSignal[];

  /** All detected evidence gaps. */
  readonly gaps: readonly EvidenceGap[];

  /** Summary statistics. */
  readonly summary: {
    readonly totalTurns: number;
    readonly totalSignals: number;
    readonly signalsByKind: Readonly<Record<string, number>>;
    readonly signalsByDimension: Readonly<Record<string, number>>;
    readonly targetsFullyCovered: number;
    readonly targetsPartiallyCovered: number;
    readonly targetsWithGaps: number;
    readonly mandatoryGaps: number;
    readonly averageConfidence: number;
    readonly averageSttConfidence: number;
  };

  /**
   * Optional reference to the session recording.
   * Akimov & Malin (2020): recording enables post-hoc human review.
   */
  readonly recordingRef?: {
    readonly audioUrl?: string;
    readonly videoUrl?: string;
    readonly availableForModeration: boolean;
    readonly candidateConsented: boolean;
    readonly retentionPolicy: {
      readonly retainUntil: string;
      readonly deleteAfterReview: boolean;
    };
  };

  /**
   * Optional moderation record.
   * Akimov & Malin (2020): all oral exams were "moderated by another
   * finance academic" for intra-rater reliability.
   */
  readonly moderationRecord?: {
    readonly moderatorId: string;
    readonly reviewedAt: string;
    readonly agreementRate: number;
    readonly overriddenSignalIds: readonly string[];
    readonly addedSignals: readonly EvidenceSignal[];
    readonly notes?: string;
  };

  /** ISO 8601 timestamp of ledger finalisation. */
  readonly finalisedAt: string;

  /** Schema version. */
  readonly schemaVersion: "1";
}
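Several fields of EvidenceLedger.summary are plain aggregates over the signals array. A sketch of a hypothetical `summariseSignals` helper for those fields. It assumes that pending as well as approved signals are counted (matching the signals array), that averages are unweighted means over signals, that averageSttConfidence averages each signal's sttConfidenceSummary.mean, and that averages are 0 for an empty ledger. Coverage counts such as targetsFullyCovered depend on per-target rules and are omitted.

```typescript
// Trimmed-down copy of the EvidenceSignal fields this helper reads.
interface SignalForSummary {
  readonly signalKind: string;
  readonly evidenceDimension: string;
  readonly confidence: number;
  readonly sttConfidenceSummary: { readonly mean: number };
}

interface PartialLedgerSummary {
  readonly totalSignals: number;
  readonly signalsByKind: Readonly<Record<string, number>>;
  readonly signalsByDimension: Readonly<Record<string, number>>;
  readonly averageConfidence: number;
  readonly averageSttConfidence: number;
}

/** Hypothetical helper for a subset of EvidenceLedger.summary (assumptions in the lead-in). */
function summariseSignals(signals: readonly SignalForSummary[]): PartialLedgerSummary {
  const byKind: Record<string, number> = {};
  const byDimension: Record<string, number> = {};
  let confidenceSum = 0;
  let sttSum = 0;
  for (const s of signals) {
    byKind[s.signalKind] = (byKind[s.signalKind] ?? 0) + 1;
    byDimension[s.evidenceDimension] = (byDimension[s.evidenceDimension] ?? 0) + 1;
    confidenceSum += s.confidence;
    sttSum += s.sttConfidenceSummary.mean;
  }
  const n = signals.length;
  return {
    totalSignals: n,
    signalsByKind: byKind,
    signalsByDimension: byDimension,
    averageConfidence: n === 0 ? 0 : confidenceSum / n,
    averageSttConfidence: n === 0 ? 0 : sttSum / n,
  };
}
```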

Rules for when a node is considered “done.”

/**
 * Rules governing when a node is considered complete.
 * The runtime controller evaluates this after every turn.
 * All specified conditions MUST be met for completion (AND logic),
 * unless `anyConditionSufficient` is true.
 */
interface CompletionPolicy {
  /** Minimum candidate turns before completion is possible. Default: 1. */
  readonly minTurns?: number;

  /** Hard cap on total turns. Forces completion when reached. */
  readonly maxTurns?: number;

  /** Specific evidence target IDs that MUST have satisfied signals. */
  readonly requiredEvidenceTargetIds?: readonly string[];

  /** Minimum number of evidence targets that must be satisfied. */
  readonly requiredEvidenceCount?: number;

  /** Maximum time in this node in milliseconds. Forces completion on expiry. */
  readonly timeBudgetMs?: number;

  /**
   * Whether the examiner can explicitly signal completion.
   * If false, only automatic conditions can complete the node.
   */
  readonly allowExplicitComplete?: boolean;

  /**
   * If true, any single condition being met is sufficient for completion.
   * If false (default), ALL conditions must be met.
   */
  readonly anyConditionSufficient?: boolean;

  /**
   * What happens when time budget expires.
   * "force_transition" — immediately move to next node.
   * "warn_and_extend" — warn candidate, allow one extension.
   * "terminate" — end the session.
   */
  readonly timeoutBehavior?: "force_transition" | "warn_and_extend" | "terminate";
}
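A sketch of how a controller might evaluate a CompletionPolicy after each turn. It reads maxTurns and timeBudgetMs as hard caps that force completion regardless of the AND/OR setting, treats minTurns (default 1) as a gate before any other condition is considered, and combines the remaining specified conditions with AND, or with OR when anyConditionSufficient is true. Those readings, the `evaluateCompletion` name, and the verdict strings are assumptions; allowExplicitComplete and timeoutBehavior are left to the caller.

```typescript
// Trimmed-down copy of CompletionPolicy (allowExplicitComplete and timeoutBehavior omitted).
interface CompletionRules {
  readonly minTurns?: number;
  readonly maxTurns?: number;
  readonly requiredEvidenceTargetIds?: readonly string[];
  readonly requiredEvidenceCount?: number;
  readonly timeBudgetMs?: number;
  readonly anyConditionSufficient?: boolean;
}

interface NodeProgress {
  readonly turnCount: number;
  readonly nodeElapsedMs: number;
  readonly satisfiedTargetIds: ReadonlySet<string>;
}

// Hypothetical verdicts; the caller applies timeoutBehavior on "time_budget_expired".
type CompletionVerdict = "incomplete" | "complete" | "max_turns_reached" | "time_budget_expired";

function evaluateCompletion(rules: CompletionRules, p: NodeProgress): CompletionVerdict {
  // Hard caps force completion regardless of the other conditions.
  if (rules.timeBudgetMs !== undefined && p.nodeElapsedMs >= rules.timeBudgetMs) return "time_budget_expired";
  if (rules.maxTurns !== undefined && p.turnCount >= rules.maxTurns) return "max_turns_reached";

  // minTurns gates completion ("before completion is possible"); default 1.
  if (p.turnCount < (rules.minTurns ?? 1)) return "incomplete";

  const conditions: boolean[] = [];
  if (rules.requiredEvidenceTargetIds !== undefined) {
    conditions.push(rules.requiredEvidenceTargetIds.every((id) => p.satisfiedTargetIds.has(id)));
  }
  if (rules.requiredEvidenceCount !== undefined) {
    conditions.push(p.satisfiedTargetIds.size >= rules.requiredEvidenceCount);
  }
  if (conditions.length === 0) return "complete"; // nothing beyond minTurns was specified

  const met = rules.anyConditionSufficient ? conditions.some(Boolean) : conditions.every(Boolean);
  return met ? "complete" : "incomplete";
}
```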

Rules for examiner follow-up behavior within a node.

/**
 * Rules governing the AI examiner's follow-up behavior within a node.
 * The runtime controller enforces these limits. The AI examiner generates
 * follow-ups freely within the boundaries.
 */
interface FollowUpPolicy {
  /**
   * Hard cap on follow-ups per node. MUST NOT be exceeded.
   * When reached, the runtime controller forces escalation per `escalationRule`.
   */
  readonly maxFollowUps: number;

  /** Style of follow-up the examiner should use. */
  readonly followUpStyle?: "probing" | "scaffolding" | "clarifying" | "redirecting" | "free";

  /** Minimum time between follow-ups in milliseconds. */
  readonly minIntervalMs?: number;

  /**
   * If true, follow-ups are only issued when an evidence target in this node
   * is unsatisfied. Prevents unnecessary probing.
   */
  readonly requireEvidenceGap?: boolean;

  /**
   * Patterns the examiner MUST NOT use in follow-ups.
   * Checked post-generation. If matched, the follow-up is discarded and
   * the examiner regenerates (up to a retry limit).
   */
  readonly forbiddenFollowUpPatterns?: readonly string[];

  /** What to do when maxFollowUps is reached. */
  readonly escalationRule?: FollowUpEscalation;

  /**
   * Allowed prompting levels for this node.
   * Based on Pearce & Chiavaroli (2020) prompting taxonomy, cited in Fenton (2025).
   * Constrains the examiner's follow-up moves to maintain assessment fairness.
   * Defaults to ["present_task", "probing"] if omitted.
   */
  readonly allowedPromptingLevels?: readonly PromptingLevel[];

  /**
   * Whether prompting must be consistent across candidates taking this exam.
   * When true, the runtime controller MUST track and enforce that each candidate
   * receives the same prompting style (not necessarily the same wording).
   * Default: false.
   * @see Fenton (2025), citing Pearce & Chiavaroli (2020): consistency principle.
   */
  readonly requireConsistentPrompting?: boolean;

  /**
   * Whether the candidate should be informed about prompting style in advance.
   * Supports transparency — candidates know what to expect.
   * @see Fenton (2025), citing Pearce & Chiavaroli (2020): transparency principle.
   */
  readonly disclosePromptingStyle?: boolean;

  /**
   * Maximum scaffolding intensity for this node (0–3).
   * 0 = no scaffolding (independent answer). 3 = heavy scaffolding.
   * The amount of scaffolding provided is itself evidence of candidate competence.
   * @see Fenton (2025): "educators have the flexibility to simplify questions
   *   or prompt students who are struggling."
   * @see Vygotsky's Zone of Proximal Development (ZPD) theory.
   */
  readonly scaffoldingBudget?: number;

  /**
   * Guiding principles for how prompting is applied.
   * Based on Pearce & Chiavaroli (2020), cited in Fenton (2025).
   * These principles constrain HOW prompting is used, not just WHICH levels are allowed.
   * @see Pearce & Chiavaroli (2020); Fenton (2025) p. 434.
   */
  readonly promptingPrinciples?: {
    /** Prompts must neither discourage nor reassure the candidate. */
    readonly neutrality?: boolean;
    /** Prompts must be consistent across candidates. */
    readonly consistency?: boolean;
    /** Candidate should be informed about prompting style in advance. */
    readonly transparency?: boolean;
    /** Examiner must reflect on and adjust prompting practice. */
    readonly reflexivity?: boolean;
  };

  /**
   * Strategy for escalating cognitive depth through follow-ups.
   * Controls whether follow-ups aim to elicit higher-order thinking.
   * @see Bloom (1956); Fenton (2025) on higher-order thinking in oral assessment.
   */
  readonly cognitiveEscalationStrategy?:
    /** Stay at the same cognitive level as the initial response. */
    | "maintain"
    /** Probe for higher-order thinking (e.g., Remember → Understand → Apply). */
    | "escalate"
    /** Provide scaffolding to help the candidate reach the target cognitive level. */
    | "scaffold";
}

/**
 * Prompting taxonomy based on Pearce & Chiavaroli (2020), cited in Fenton (2025).
 * Ranges from neutral presentation to leading guidance.
 * Guiding principles: neutrality, consistency, transparency, reflexivity.
 */
type PromptingLevel =
  /** Simply state the question/task. Neutral. */
  | "present_task"
  /** Repeat the question for the candidate. */
  | "repeat_info"
  /** Ask if the candidate understands the question. */
  | "clarifying"
  /** Ask for deeper explanation or elaboration. */
  | "probing"
  /** Guide toward the correct answer (use sparingly, with caution). */
  | "leading";

type FollowUpEscalation =
  /** Move to the next node. */
  | "transition"
  /** Generate a wrap-up utterance and move on. */
  | "wrap_up"
  /** Terminate the session. */
  | "terminate"
  /** Log a warning and continue (allows the AI to attempt closure). */
  | "warn";
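The controller's job here is to gate the follow-ups the examiner proposes. A hypothetical `checkFollowUp` sketch combining maxFollowUps, requireEvidenceGap, minIntervalMs, allowedPromptingLevels (with the documented default of present_task and probing), and forbiddenFollowUpPatterns. Two points are assumptions: patterns are treated as case-insensitive regular expressions, and "transition" is used when no escalationRule is set.

```typescript
type PromptingLevel = "present_task" | "repeat_info" | "clarifying" | "probing" | "leading";
type FollowUpEscalation = "transition" | "wrap_up" | "terminate" | "warn";

// Trimmed-down copy of FollowUpPolicy.
interface FollowUpRules {
  readonly maxFollowUps: number;
  readonly minIntervalMs?: number;
  readonly requireEvidenceGap?: boolean;
  readonly forbiddenFollowUpPatterns?: readonly string[];
  readonly escalationRule?: FollowUpEscalation;
  readonly allowedPromptingLevels?: readonly PromptingLevel[];
}

interface DraftFollowUp {
  readonly text: string;
  readonly level: PromptingLevel;
}

interface FollowUpContext {
  readonly followUpsIssued: number;
  readonly msSinceLastFollowUp: number | null; // null if none issued yet in this node
  readonly hasUnsatisfiedTarget: boolean;
}

// "block" means this draft is not spoken; the examiner may regenerate.
type FollowUpDecision =
  | { readonly kind: "allow" }
  | { readonly kind: "block"; readonly reason: string }
  | { readonly kind: "escalate"; readonly rule: FollowUpEscalation };

function checkFollowUp(policy: FollowUpRules, ctx: FollowUpContext, draft: DraftFollowUp): FollowUpDecision {
  if (ctx.followUpsIssued >= policy.maxFollowUps) {
    return { kind: "escalate", rule: policy.escalationRule ?? "transition" }; // fallback is an assumption
  }
  if (policy.requireEvidenceGap && !ctx.hasUnsatisfiedTarget) {
    return { kind: "block", reason: "no unsatisfied evidence target" };
  }
  if (policy.minIntervalMs !== undefined && ctx.msSinceLastFollowUp !== null && ctx.msSinceLastFollowUp < policy.minIntervalMs) {
    return { kind: "block", reason: "minimum interval not yet elapsed" };
  }
  const allowedLevels: readonly PromptingLevel[] = policy.allowedPromptingLevels ?? ["present_task", "probing"];
  if (!allowedLevels.includes(draft.level)) {
    return { kind: "block", reason: `prompting level ${draft.level} not allowed` };
  }
  for (const pattern of policy.forbiddenFollowUpPatterns ?? []) {
    // Assumption: each pattern is a regular expression matched case-insensitively.
    if (new RegExp(pattern, "i").test(draft.text)) {
      return { kind: "block", reason: `matches forbidden pattern ${pattern}` };
    }
  }
  return { kind: "allow" };
}
```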

Rules for moving between nodes.

/**
 * A single transition rule from one node to another.
 * Evaluated by the runtime controller after each turn and on policy triggers.
 * When multiple transitions are eligible, the highest priority wins.
 */
interface TransitionPolicy {
  /** The destination node ID. */
  readonly targetNodeId: string;

  /** Condition that must be true for this transition to fire. */
  readonly condition: TransitionCondition;

  /**
   * Priority for tie-breaking. Higher numbers win.
   * Default: 0.
   */
  readonly priority?: number;

  /**
   * If true, this transition overrides completion policy.
   * Used for timeout-forced transitions and error recovery.
   */
  readonly isForced?: boolean;

  /**
   * Optional prompt seed for the examiner to generate a natural
   * transition utterance (bridge). If omitted, the runtime
   * controller may inject a generic transition.
   */
  readonly bridgePrompt?: string;
}

type TransitionCondition =
  /** Always eligible. Typically used as a fallback. */
  | { readonly type: "always" }
  /** Specific evidence targets must be satisfied. */
  | { readonly type: "evidence_satisfied"; readonly targetIds: readonly string[] }
  /** Minimum candidate turn count must be reached. */
  | { readonly type: "turn_count_reached"; readonly minTurns: number }
  /** Time threshold must be crossed. */
  | { readonly type: "time_elapsed"; readonly minMs: number }
  /** Candidate must have issued a specific command. */
  | { readonly type: "candidate_command"; readonly command: CandidateCommandType }
  /** A policy limit must have been reached. */
  | { readonly type: "policy_escalation"; readonly policy: "follow_up_limit" | "time_budget" | "recovery_limit" };
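Transition selection reduces to filtering eligible rules and taking the highest priority. A sketch, assuming that time_elapsed is measured within the current node, that candidate_command matches the most recent command, and that ties between equal priorities go to the rule declared first (the spec does not say). `conditionHolds`, `selectTransition`, and the context shape are hypothetical, and command types are simplified to string.

```typescript
type PolicyLimit = "follow_up_limit" | "time_budget" | "recovery_limit";

type TransitionCondition =
  | { readonly type: "always" }
  | { readonly type: "evidence_satisfied"; readonly targetIds: readonly string[] }
  | { readonly type: "turn_count_reached"; readonly minTurns: number }
  | { readonly type: "time_elapsed"; readonly minMs: number }
  | { readonly type: "candidate_command"; readonly command: string } // simplified from CandidateCommandType
  | { readonly type: "policy_escalation"; readonly policy: PolicyLimit };

interface Transition {
  readonly targetNodeId: string;
  readonly condition: TransitionCondition;
  readonly priority?: number;
}

interface TransitionContext {
  readonly satisfiedTargetIds: ReadonlySet<string>;
  readonly nodeTurnCount: number;
  readonly nodeElapsedMs: number;
  readonly lastCommand?: string;
  readonly escalatedPolicies: ReadonlySet<PolicyLimit>;
}

function conditionHolds(c: TransitionCondition, ctx: TransitionContext): boolean {
  switch (c.type) {
    case "always": return true;
    case "evidence_satisfied": return c.targetIds.every((id) => ctx.satisfiedTargetIds.has(id));
    case "turn_count_reached": return ctx.nodeTurnCount >= c.minTurns;
    case "time_elapsed": return ctx.nodeElapsedMs >= c.minMs; // node-scoped (assumption)
    case "candidate_command": return ctx.lastCommand === c.command;
    case "policy_escalation": return ctx.escalatedPolicies.has(c.policy);
  }
}

/** Highest priority wins (default 0); a strict ">" keeps the earlier rule on ties. */
function selectTransition(transitions: readonly Transition[], ctx: TransitionContext): Transition | undefined {
  let best: Transition | undefined;
  for (const t of transitions) {
    if (!conditionHolds(t.condition, ctx)) continue;
    if (best === undefined || (t.priority ?? 0) > (best.priority ?? 0)) best = t;
  }
  return best;
}
```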

Rules for handling anomalies during the session.

/**
 * A single recovery rule for a specific anomaly scenario.
 * The runtime controller matches anomalies to recovery rules and executes
 * the prescribed sequence. The AI examiner generates the recovery utterance.
 */
interface RecoveryPolicy {
  /** Which anomaly this rule addresses. */
  readonly scenario: RecoveryScenario;

  /** Maximum recovery attempts before escalation. */
  readonly maxAttempts: number;

  /** What to do when max attempts are exhausted. */
  readonly escalation: RecoveryEscalation;

  /** Prompt seed for the examiner's recovery utterance. */
  readonly recoveryPrompt?: string;

  /** Minimum wait before next recovery attempt in milliseconds. */
  readonly cooldownMs?: number;

  /** Time in milliseconds an anomaly must persist before recovery is triggered (e.g., the silence threshold for "silence"). */
  readonly detectionThresholdMs?: number;
}

type RecoveryScenario =
  | "silence"
  | "unclear_answer"
  | "off_topic"
  | "anxiety"
  | "interruption"
  | "network_issue"
  | "repetition_loop";

type RecoveryEscalation =
  /** Retry with the same approach. */
  | "retry"
  /** Rephrase the question. */
  | "rephrase"
  /** Skip this node and move to the next. */
  | "skip_node"
  /** Pause the session (candidate can resume later). */
  | "pause_session"
  /** Terminate the session. */
  | "terminate";
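A RecoveryPolicy combines a detection threshold, a cooldown between attempts, and an attempt cap. A hypothetical `decideRecovery` sketch, assuming attempts are numbered from 1 and that the caller measures how long the anomaly has persisted (e.g., milliseconds of silence).

```typescript
type RecoveryEscalation = "retry" | "rephrase" | "skip_node" | "pause_session" | "terminate";

// Trimmed-down copies of RecoveryPolicy and RecoveryAttemptRecord.
interface RecoveryRule {
  readonly maxAttempts: number;
  readonly escalation: RecoveryEscalation;
  readonly cooldownMs?: number;
  readonly detectionThresholdMs?: number;
}

interface PastAttempt {
  readonly timestampMs: number;
}

// Hypothetical decision type.
type RecoveryDecision =
  | { readonly kind: "wait" }
  | { readonly kind: "attempt"; readonly attemptNumber: number }
  | { readonly kind: "escalate"; readonly action: RecoveryEscalation };

function decideRecovery(
  rule: RecoveryRule,
  attempts: readonly PastAttempt[],
  anomalyDurationMs: number,
  nowMs: number,
): RecoveryDecision {
  // Not yet an anomaly: below the detection threshold.
  if (rule.detectionThresholdMs !== undefined && anomalyDurationMs < rule.detectionThresholdMs) {
    return { kind: "wait" };
  }
  // Attempt budget exhausted: hand over to the configured escalation.
  if (attempts.length >= rule.maxAttempts) return { kind: "escalate", action: rule.escalation };
  // Respect the cooldown since the most recent attempt.
  const last = attempts[attempts.length - 1];
  if (last !== undefined && rule.cooldownMs !== undefined && nowMs - last.timestampMs < rule.cooldownMs) {
    return { kind: "wait" };
  }
  return { kind: "attempt", attemptNumber: attempts.length + 1 };
}
```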

Rules for which candidate commands are valid at a node.

/**
 * Defines which candidate commands are recognized and how they are handled
 * at a specific node. Commands that appear in neither `allowed` nor `forbidden`
 * are not treated as commands; they are processed as regular candidate speech.
 */
interface CandidateCommandPolicy {
  /** Commands explicitly allowed at this node. */
  readonly allowed: readonly AllowedAction[];

  /** Commands explicitly forbidden at this node. */
  readonly forbidden?: readonly ForbiddenAction[];
}

/**
 * A candidate command that is recognized and handled at this node.
 */
interface AllowedAction {
  /** The command type. */
  readonly command: CandidateCommandType;

  /**
   * Maximum times this command can be used in this node.
   * If omitted, unlimited.
   */
  readonly maxUses?: number;

  /**
   * How the runtime controller handles this command.
   * "inject_response" — the controller generates/replays a response.
   * "notify_examiner" — the examiner is informed and can adapt.
   * "pause" — the session is paused.
   * "skip" — the node is skipped.
   */
  readonly handling: "inject_response" | "notify_examiner" | "pause" | "skip";

  /**
   * For "inject_response" handling: the response template.
   * MAY contain {{turnText}} to reference the last examiner turn.
   */
  readonly responseTemplate?: string;
}

/**
 * A command that is explicitly forbidden at this node.
 * If detected, the runtime controller emits a policy_violation event.
 */
interface ForbiddenAction {
  /** The forbidden command type. */
  readonly command: CandidateCommandType;

  /** Reason for forbidding (for audit log). */
  readonly reason: string;

  /** What happens if the candidate attempts this command. */
  readonly onViolation: "ignore" | "inform" | "warn";
}

type CandidateCommandType =
  | "repeat"
  | "clarification"
  /** Candidate asks examiner to rephrase (distinct from repeat — signals active engagement). */
  | "request_rephrase"
  | "pause"
  | "raise_hand"
  | "skip"
  | "volume_up"
  | "volume_down"
  | "language_switch"
  /** Candidate signals they need a moment to think. Assessment-significant. */
  | "thinking_aloud";
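The precedence between allowed and forbidden commands can be sketched as a small resolver. This is a minimal, non-normative sketch: it assumes forbidden is checked before allowed, that per-node use counts are tracked by the caller, and that exceeding maxUses is surfaced to the caller because the schema does not say what happens then. `resolveCommand` and `CommandResolution` are hypothetical names.

```typescript
// Subset of the schema types needed for this sketch.
type CandidateCommandType =
  | "repeat" | "clarification" | "request_rephrase" | "pause" | "raise_hand"
  | "skip" | "volume_up" | "volume_down" | "language_switch" | "thinking_aloud";

interface AllowedAction {
  readonly command: CandidateCommandType;
  readonly maxUses?: number;
  readonly handling: "inject_response" | "notify_examiner" | "pause" | "skip";
  readonly responseTemplate?: string;
}

interface ForbiddenAction {
  readonly command: CandidateCommandType;
  readonly reason: string;
  readonly onViolation: "ignore" | "inform" | "warn";
}

interface CandidateCommandPolicy {
  readonly allowed: readonly AllowedAction[];
  readonly forbidden?: readonly ForbiddenAction[];
}

// Hypothetical result type: what the controller should do with a detected command.
type CommandResolution =
  | { readonly kind: "violation"; readonly rule: ForbiddenAction } // emit policy_violation
  | { readonly kind: "handle"; readonly rule: AllowedAction }
  | { readonly kind: "limit_reached"; readonly rule: AllowedAction }
  | { readonly kind: "ignore" }; // treat as regular candidate speech

function resolveCommand(
  policy: CandidateCommandPolicy,
  command: CandidateCommandType,
  usesSoFar: number, // times this command was already handled at the current node
): CommandResolution {
  const forbiddenList: readonly ForbiddenAction[] = policy.forbidden ?? [];
  const forbidden = forbiddenList.find((f) => f.command === command);
  if (forbidden) return { kind: "violation", rule: forbidden };

  const allowed = policy.allowed.find((a) => a.command === command);
  if (!allowed) return { kind: "ignore" };

  // maxUses omitted means unlimited use at this node.
  if (allowed.maxUses !== undefined && usesSoFar >= allowed.maxUses) {
    return { kind: "limit_reached", rule: allowed };
  }
  return { kind: "handle", rule: allowed };
}
```

With `allowed: [{ command: "repeat", maxUses: 2, handling: "inject_response" }]`, a third repeat resolves to `limit_reached`, while a command listed in neither list resolves to `ignore`.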

An immutable record of a significant state change.

/**
 * An immutable record of a significant state change during a session.
 * Events are the audit trail. Every event has a type, timestamp, and payload.
 * Events MUST be emitted by the runtime controller — they are not optional telemetry.
 */
interface RuntimeEvent {
  /** Unique event ID. */
  readonly eventId: string;

  /** The session this event belongs to. */
  readonly sessionId: string;

  /** Event type discriminator. */
  readonly type: RuntimeEventType;

  /** Unix epoch milliseconds when the event occurred. */
  readonly timestampMs: number;

  /** The node active when this event occurred, if applicable. */
  readonly nodeId?: string;

  /** The turn index associated with this event, if applicable. */
  readonly turnIndex?: number;

  /** Event-specific payload. */
  readonly payload: RuntimeEventPayload;
}

type RuntimeEventType =
  // Lifecycle
  | "session_started"
  | "session_paused"
  | "session_resumed"
  | "session_completed"
  | "session_terminated"
  // Node
  | "node_entered"
  | "node_exited"
  | "node_timeout"
  // Turn
  | "examiner_turn"
  | "candidate_turn"
  | "turn_completed"
  // Evidence
  | "evidence_signal_emitted"
  | "evidence_target_satisfied"
  | "evidence_target_missed"
  // Command
  | "candidate_command_received"
  | "candidate_command_processed"
  // Policy
  | "follow_up_limit_reached"
  | "time_budget_warning"
  | "time_budget_exceeded"
  | "transition_forced"
  | "recovery_triggered"
  | "recovery_succeeded"
  | "recovery_failed"
  | "policy_violation"
  // Agent
  | "agent_action_allowed"
  | "agent_action_blocked"
  // Assessment-significant moments (Fenton 2025)
  | "hesitation_detected"
  | "self_correction_detected";

/**
 * Event-specific payloads. Which member applies is determined by the enclosing
 * RuntimeEvent's type; the payload interfaces carry no discriminant of their own.
 */
type RuntimeEventPayload =
  | SessionLifecyclePayload
  | NodeEventPayload
  | TurnEventPayload
  | EvidenceEventPayload
  | CommandEventPayload
  | PolicyEventPayload
  | AgentEventPayload;

interface SessionLifecyclePayload {
  readonly reason?: string;
  readonly totalTurns?: number;
  readonly totalElapsedMs?: number;
}

interface NodeEventPayload {
  readonly nodeId: string;
  readonly nodeKind: ExamRuntimeNodeKind;
  readonly fromNodeId?: string;
  readonly transitionCondition?: string;
}

interface TurnEventPayload {
  readonly role: "examiner" | "candidate";
  readonly text: string;
  readonly isFollowUp: boolean;
  readonly followUpIndex?: number;
  readonly durationMs?: number;
  readonly sttConfidence?: number;
  /**
   * Optional rapport-building move by the examiner.
   * Logged for quality assurance; does NOT count against follow-up limits.
   * @see Akimov & Malin (2020): "the examiner attempted to make students feel
   *   comfortable and at ease."
   */
  readonly rapportMove?: "encouragement" | "acknowledgement" | "reassurance" | "humor" | "none";
  /**
   * Scaffolding intensity of this turn (0–3).
   * 0 = no scaffolding (independent answer). 3 = heavy scaffolding.
   * Tracks the degree of examiner support provided, which is itself evidence.
   * @see Fenton (2025) on scaffolding during oral assessment.
   */
  readonly scaffoldingIntensity?: number;
}

interface EvidenceEventPayload {
  readonly signal?: EvidenceSignal;
  readonly targetId?: string;
  readonly confidence?: number;
}

/**
 * Payload for hesitation_detected events.
 * @see Fenton (2025): hesitation patterns reveal reasoning processes.
 */
interface HesitationDetectedPayload {
  readonly nodeId: string;
  readonly turnId: number;
  readonly durationMs: number;
  readonly context: "after_question" | "mid_response" | "before_conclusion";
}

/**
 * Payload for self_correction_detected events.
 * @see Fenton (2025): "students can reflect on their choices and have the
 *   chance to self-correct."
 */
interface SelfCorrectionDetectedPayload {
  readonly nodeId: string;
  readonly turnIds: readonly number[];
  readonly originalClaim: string;
  readonly correctedClaim: string;
  readonly confidence: number;
}

interface CommandEventPayload {
  readonly command: CandidateCommandType;
  readonly handled: boolean;
  readonly response?: string;
}

interface PolicyEventPayload {
  readonly policyType: string;
  readonly limit?: number;
  readonly current?: number;
  readonly action: string;
  readonly details?: string;
  /**
   * Set for hesitation_detected and self_correction_detected events:
   * a HesitationDetectedPayload (pause duration and context) or a
   * SelfCorrectionDetectedPayload (original and corrected claims).
   */
  readonly assessmentContext?: HesitationDetectedPayload | SelfCorrectionDetectedPayload;
}

interface AgentEventPayload {
  readonly actionType: string;
  readonly allowed: boolean;
  readonly reason?: string;
  readonly violation?: string;
}
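Because assessmentContext is an untagged union, a consumer has to narrow it structurally. A minimal sketch, assuming the presence of `durationMs` is enough to tell the two payloads apart (true for the fields defined above); `isHesitation` and `describeAssessmentContext` are hypothetical helpers.

```typescript
interface HesitationDetectedPayload {
  readonly nodeId: string;
  readonly turnId: number;
  readonly durationMs: number;
  readonly context: "after_question" | "mid_response" | "before_conclusion";
}

interface SelfCorrectionDetectedPayload {
  readonly nodeId: string;
  readonly turnIds: readonly number[];
  readonly originalClaim: string;
  readonly correctedClaim: string;
  readonly confidence: number;
}

type AssessmentContext = HesitationDetectedPayload | SelfCorrectionDetectedPayload;

// Structural type guard: only the hesitation payload has durationMs.
function isHesitation(ctx: AssessmentContext): ctx is HesitationDetectedPayload {
  return "durationMs" in ctx;
}

function describeAssessmentContext(ctx: AssessmentContext): string {
  if (isHesitation(ctx)) {
    return `hesitation of ${ctx.durationMs} ms (${ctx.context}) at node ${ctx.nodeId}`;
  }
  return `self-correction at node ${ctx.nodeId}: "${ctx.originalClaim}" -> "${ctx.correctedClaim}"`;
}
```

A `kind` field on each payload, following the schema's convention for discriminated unions, would let the compiler narrow without a hand-written guard.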

A single attributed utterance in the conversation.

/**
 * A single attributed utterance in the exam conversation.
 * Richer than raw STT output — carries node context, timing, and semantic metadata.
 * Transcript turns are the raw material; evidence signals are the structured interpretation.
 */
interface TranscriptTurn {
  /** Sequential index within the session. 0-based. */
  readonly turnIndex: number;

  /** Who produced this utterance. */
  readonly role: "examiner" | "candidate" | "system";

  /** The transcribed or generated text. */
  readonly text: string;

  /** Which node this turn occurred in. */
  readonly nodeId: string;

  /** Unix epoch milliseconds when the turn started. */
  readonly timestampMs: number;

  /** Duration of the turn in milliseconds. */
  readonly durationMs: number;

  /** Whether this turn was an examiner follow-up. */
  readonly isFollowUp: boolean;

  /** For follow-up turns, the 0-based follow-up index. */
  readonly followUpIndex?: number;

  /** The candidate command detected in this turn, if any. */
  readonly candidateCommandDetected?: CandidateCommandType;

  /** STT confidence score (for candidate turns). Range: 0.0 to 1.0. */
  readonly sttConfidence?: number;

  /** The recovery scenario handled during this turn, if any. */
  readonly recoveryAction?: RecoveryScenario;
}

/**
 * Records an evidence gap: a mandatory target that was not sufficiently evidenced.
 */
interface EvidenceGap {
  /** The target that was underserved. */
  readonly targetId: string;

  /** The node where evidence was expected. */
  readonly nodeId: string;

  /** Number of positive signals collected. */
  readonly positiveSignalsCollected: number;

  /** Minimum required. */
  readonly minPositiveSignalsRequired: number;

  /** How the gap was detected. */
  readonly detectedBy: "runtime_check" | "marking_pipeline" | "manual_review";

  /** Whether the gap was addressed via follow-up. */
  readonly addressedByFollowUp: boolean;

  /** Whether a recovery was attempted. */
  readonly addressedByRecovery: boolean;
}
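An EvidenceGap can be derived by comparing collected positive signals against each mandatory target's minimum. A minimal sketch: `TargetTally` and `findEvidenceGaps` are hypothetical, the tally is assumed to be computed elsewhere from the evidence ledger, and "addressed" is read here as "at least one follow-up or recovery was attempted".

```typescript
interface EvidenceGap {
  readonly targetId: string;
  readonly nodeId: string;
  readonly positiveSignalsCollected: number;
  readonly minPositiveSignalsRequired: number;
  readonly detectedBy: "runtime_check" | "marking_pipeline" | "manual_review";
  readonly addressedByFollowUp: boolean;
  readonly addressedByRecovery: boolean;
}

// Hypothetical per-target summary produced from the evidence ledger.
interface TargetTally {
  readonly targetId: string;
  readonly nodeId: string;
  readonly mandatory: boolean;
  readonly positiveSignals: number;
  readonly minPositiveSignals: number;
  readonly followUpsAsked: number;
  readonly recoveriesAttempted: number;
}

function findEvidenceGaps(tallies: readonly TargetTally[]): EvidenceGap[] {
  return tallies
    // Only mandatory targets below their minimum count as gaps.
    .filter((t) => t.mandatory && t.positiveSignals < t.minPositiveSignals)
    .map((t): EvidenceGap => ({
      targetId: t.targetId,
      nodeId: t.nodeId,
      positiveSignalsCollected: t.positiveSignals,
      minPositiveSignalsRequired: t.minPositiveSignals,
      detectedBy: "runtime_check",
      // Assumption: "addressed" means an attempt was made, not that it succeeded.
      addressedByFollowUp: t.followUpsAsked > 0,
      addressedByRecovery: t.recoveriesAttempted > 0,
    }));
}
```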

Rules for operational data emission.

/**
 * Rules governing what operational data is emitted and where.
 * Controls the granularity and destinations of runtime telemetry.
 */
interface TelemetryPolicy {
  /** Emit events for every turn. Default: true. */
  readonly emitTurnEvents?: boolean;

  /** Emit events for evidence signals. Default: true. SHOULD always be true. */
  readonly emitEvidenceEvents?: boolean;

  /** Emit events for state transitions. Default: true. */
  readonly emitStateTransitions?: boolean;

  /**
   * Emit events for policy violations.
   * MUST always be true. This field exists for documentation, not configuration.
   */
  readonly emitPolicyViolations: true;

  /** Sampling rate for high-frequency events (0.0 to 1.0). Default: 1.0. */
  readonly samplingRate?: number;

  /** Where events are delivered. */
  readonly destinations?: readonly TelemetryDestination[];
}

type TelemetryDestination =
  | "event_store"
  | "analytics"
  | "debug_console"
  | "livekit_data_channel";
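The defaults above can be resolved into a single emission decision. A minimal sketch, assuming events are grouped into the hypothetical categories below and that only turn events are "high-frequency" for sampling purposes (the schema does not say which events are); `shouldEmit` is hypothetical, and the random source is injected so the decision is testable.

```typescript
interface TelemetryPolicy {
  readonly emitTurnEvents?: boolean;
  readonly emitEvidenceEvents?: boolean;
  readonly emitStateTransitions?: boolean;
  readonly emitPolicyViolations: true;
  readonly samplingRate?: number;
}

// Hypothetical grouping of RuntimeEventType values into telemetry categories.
type EventCategory = "turn" | "evidence" | "state_transition" | "policy_violation";

function shouldEmit(
  policy: TelemetryPolicy,
  category: EventCategory,
  random: () => number = Math.random,
): boolean {
  // Policy violations MUST always be emitted; not configurable.
  if (category === "policy_violation") return true;
  if (category === "evidence") return policy.emitEvidenceEvents ?? true;
  if (category === "state_transition") return policy.emitStateTransitions ?? true;

  // Remaining category: "turn".
  if (!(policy.emitTurnEvents ?? true)) return false;
  // Assumption: turn events are the high-frequency events subject to sampling.
  const rate = policy.samplingRate ?? 1.0;
  return random() < rate;
}
```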

Rules for what context the AI examiner can access.

/**
 * Rules governing what exam context is injected into the AI examiner's prompt.
 * This is a critical agent boundary mechanism: what the examiner doesn't see,
 * it can't leak or misuse.
 */
interface ContextPolicy {
  /** Whether the examiner can see rubric criteria. Default: false. */
  readonly includeRubric?: boolean;

  /** Whether the examiner can see transcript from prior nodes. Default: false. */
  readonly includePreviousNodes?: boolean;

  /**
   * Whether the examiner can see current evidence coverage.
   * Useful for adaptive follow-up. Default: false.
   */
  readonly includeEvidenceStatus?: boolean;

  /** Whether the examiner can see prior session data for this candidate. Default: false. */
  readonly includeCandidateHistory?: boolean;

  /** Maximum tokens for context injection. */
  readonly maxContextTokens?: number;

  /** Fields that MUST NOT appear in the examiner's context. */
  readonly redactedFields?: readonly string[];
}
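Applying ContextPolicy amounts to selecting which context sections to include and stripping redacted fields before the examiner prompt is assembled. A minimal sketch: the `ExamContext` shape, `redact`, and `buildExaminerContext` are hypothetical, redaction is applied to top-level keys of each section only, and token budgeting against maxContextTokens is left out.

```typescript
interface ContextPolicy {
  readonly includeRubric?: boolean;
  readonly includePreviousNodes?: boolean;
  readonly includeEvidenceStatus?: boolean;
  readonly includeCandidateHistory?: boolean;
  readonly maxContextTokens?: number;
  readonly redactedFields?: readonly string[];
}

// Hypothetical shape of the context available to the prompt builder.
interface ExamContext {
  readonly nodePrompt: Record<string, unknown>;
  readonly rubric?: Record<string, unknown>;
  readonly previousNodes?: Record<string, unknown>;
  readonly evidenceStatus?: Record<string, unknown>;
  readonly candidateHistory?: Record<string, unknown>;
}

// Copies a section, dropping any top-level key listed as redacted.
function redact(section: Record<string, unknown>, redacted: readonly string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(section)) {
    if (redacted.indexOf(key) === -1) out[key] = section[key];
  }
  return out;
}

function buildExaminerContext(ctx: ExamContext, policy: ContextPolicy): Record<string, unknown> {
  const redacted: readonly string[] = policy.redactedFields ?? [];
  const result: Record<string, unknown> = { nodePrompt: redact(ctx.nodePrompt, redacted) };
  // Every include* flag defaults to false, so each section is opt-in.
  if (policy.includeRubric && ctx.rubric) result.rubric = redact(ctx.rubric, redacted);
  if (policy.includePreviousNodes && ctx.previousNodes) result.previousNodes = redact(ctx.previousNodes, redacted);
  if (policy.includeEvidenceStatus && ctx.evidenceStatus) result.evidenceStatus = redact(ctx.evidenceStatus, redacted);
  if (policy.includeCandidateHistory && ctx.candidateHistory) result.candidateHistory = redact(ctx.candidateHistory, redacted);
  return result;
}
```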

Policies that apply across the entire exam unless overridden at node level.

/**
 * Global policies for the entire exam.
 * These are defaults that apply to all nodes unless a node provides
 * a local override.
 */
interface GlobalRuntimePolicies {
  /** Default completion policy for nodes without a local override. */
  readonly defaultCompletion?: CompletionPolicy;

  /** Default follow-up policy for nodes without a local override. */
  readonly defaultFollowUp?: FollowUpPolicy;

  /** Default recovery policies for the entire exam. */
  readonly recoveryPolicies?: readonly RecoveryPolicy[];

  /** Default transition policy (fallback). */
  readonly defaultTransition?: TransitionPolicy;

  /** Telemetry configuration. */
  readonly telemetry: TelemetryPolicy;

  /** Context policy for the AI examiner (global). */
  readonly context: ContextPolicy;

  /**
   * Global agent boundary: actions the examiner is NEVER allowed to perform,
   * regardless of node-level policy.
   */
  readonly forbiddenActions: readonly ForbiddenAction[];

  /**
   * Global time budget for the entire exam in milliseconds.
   * When exceeded, the runtime MUST end the session as specified by
   * globalTimeoutBehavior.
   */

  /**
   * What happens when the global time budget is exceeded.
   */
  readonly globalTimeoutBehavior: "force_complete" | "terminate";

  /**
   * Whether communication style (accent, fluency, verbal confidence) is a
   * declared learning outcome. When false, the LLM MUST NOT penalise or
   * comment on communication style.
   * @see Fenton (2025): equity requires not penalising communication style
   *   unless it is a specific learning outcome.
   */
  readonly communicationStyleIsLearningOutcome?: boolean;

  /**
   * Silence detection threshold in milliseconds.
   * When no candidate speech is detected within this duration, the runtime
   * triggers a silence recovery prompt.
   */
  readonly silenceTimeoutMs?: number;

  /**
   * Maximum number of silence prompts before the node is marked best-effort.
   * Default: 2.
   */
  readonly maxSilencePrompts?: number;

  /**
   * Maximum candidate input length in characters.
   * Inputs exceeding this are truncated from the beginning.
   */
  readonly maxCandidateInputLength?: number;

  /**
   * Whether welfare checks are enabled for candidate distress detection.
   * When true, the runtime MAY offer a welfare pause when distress is detected.
   */
  readonly welfareCheckEnabled?: boolean;

  /**
   * Global conversational style policy for the AI examiner.
   * Controls tone, warmth, rapport-building, and conversational pacing.
   * @see Fenton (2015): oral assessment as "conversation rather than interrogatory."
   */
  readonly conversationalStylePolicy?: ConversationalStylePolicy;

  /**
   * Formative feedback policy for the AI examiner.
   * Controls what learning-oriented feedback the examiner can give during formative assessments.
   * Only applicable when assessmentPurpose is "formative".
   * @see Fenton (2025): oral assessment as "enhancer of student learning."
   */
  readonly formativeFeedbackPolicy?: FormativeFeedbackPolicy;

  /**
   * Time extension in milliseconds granted when candidate anxiety is detected.
   */
  readonly anxietyTimeExtensionMs?: number;

  /**
   * Maximum time in milliseconds to wait for the candidate to reconnect
   * before aborting the session.
   */
  readonly reconnectTimeoutMs?: number;
}
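The interplay of globalTimeBudgetMs, globalTimeoutBehavior, and the silence fields can be sketched as two pure decision functions. This is an assumed reading, not normative: `checkGlobalBudget` and `onSilence` are hypothetical, and the sketch assumes a silence prompt is issued each time silenceTimeoutMs elapses until maxSilencePrompts (default 2) is reached.

```typescript
// Subset of GlobalRuntimePolicies used here.
interface TimingPolicies {
  readonly globalTimeBudgetMs: number;
  readonly globalTimeoutBehavior: "force_complete" | "terminate";
  readonly silenceTimeoutMs?: number;
  readonly maxSilencePrompts?: number;
}

type BudgetDecision = "continue" | "force_complete" | "terminate";

function checkGlobalBudget(p: TimingPolicies, elapsedMs: number): BudgetDecision {
  // "Exceeded" is read as strictly greater than the budget.
  return elapsedMs > p.globalTimeBudgetMs ? p.globalTimeoutBehavior : "continue";
}

type SilenceDecision = "wait" | "prompt" | "mark_best_effort";

function onSilence(p: TimingPolicies, silentForMs: number, promptsSoFar: number): SilenceDecision {
  // With no threshold configured, this sketch never prompts.
  if (p.silenceTimeoutMs === undefined || silentForMs < p.silenceTimeoutMs) return "wait";
  const maxPrompts = p.maxSilencePrompts ?? 2;
  return promptsSoFar < maxPrompts ? "prompt" : "mark_best_effort";
}
```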

Configuration hints for the Pipecat adapter layer.

/**
 * Configuration hints for the Pipecat adapter.
 * These do NOT change the IR's semantics — they guide how the adapter
 * compiles the IR into Pipecat-specific configuration.
 */
interface PipecatAdapterConfig {
  /** Target Pipecat version this config is compatible with. */
  readonly targetPipecatVersion?: string;

  /** Default STT configuration. */
  readonly sttConfig?: {
    readonly provider: string;
    readonly language: string;
    readonly model?: string;
  };

  /** Default LLM configuration for the examiner. */
  readonly llmConfig?: {
    readonly provider: string;
    readonly model: string;
    readonly temperature?: number;
    readonly maxTokens?: number;
  };

  /** Default TTS configuration. */
  readonly ttsConfig?: {
    readonly provider: string;
    readonly voice: string;
    readonly language?: string;
  };

  /** LiveKit room configuration hints. */
  readonly livekitConfig?: {
    readonly roomPrefix?: string;
    readonly dataChannelName?: string;
  };

  /**
   * System prompt template for the AI examiner.
   * MAY contain template variables resolved at runtime:
   *   {{nodePromptSeed}}, {{candidateName}}, {{evidenceTargets}},
   *   {{completionPolicy}}, {{followUpPolicy}}, etc.
   */
  readonly systemPromptTemplate?: string;

  /**
   * Per-node prompt template overrides.
   * Key is nodeId, value is the prompt template for that node.
   */
  readonly nodePromptOverrides?: Readonly<Record<string, string>>;
}

/**
 * A runtime package annotated with session-specific state.
 * Used internally by the runtime controller — NOT part of the canonical IR.
 */
interface AnnotatedRuntimePackage {
  readonly package: ExamRuntimePackage;
  readonly state: RuntimeStateSchema;
  readonly ledger: EvidenceLedger;
  readonly transcript: readonly TranscriptTurn[];
  readonly events: readonly RuntimeEvent[];
}

/**
 * Result of evaluating a completion policy against current state.
 */
interface CompletionEvaluation {
  readonly isComplete: boolean;
  readonly reason: string;
  readonly conditionsMet: readonly string[];
  readonly conditionsUnmet: readonly string[];
}

/**
 * Result of evaluating a transition condition.
 */
interface TransitionEvaluation {
  readonly isEligible: boolean;
  readonly targetNodeId: string;
  readonly conditionType: string;
  readonly details?: string;
}

/**
 * A candidate command that has been detected and needs processing.
 */
interface PendingCandidateCommand {
  readonly command: CandidateCommandType;
  readonly turnIndex: number;
  readonly rawText: string;
  readonly confidence: number;
}
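Returning to PipecatAdapterConfig: template variables such as {{nodePromptSeed}} can be resolved with a simple substitution pass, with nodePromptOverrides taking precedence over systemPromptTemplate. A minimal sketch, assuming a per-node override replaces the global template entirely and that unknown variables are left in place rather than erased; `renderPrompt` is hypothetical.

```typescript
// Subset of PipecatAdapterConfig used here.
interface PromptConfig {
  readonly systemPromptTemplate?: string;
  readonly nodePromptOverrides?: Readonly<Record<string, string>>;
}

function renderPrompt(
  config: PromptConfig,
  nodeId: string,
  vars: Readonly<Record<string, string>>,
): string | undefined {
  const overrides: Readonly<Record<string, string>> = config.nodePromptOverrides ?? {};
  // A per-node override, when present, wins over the global template.
  const template = Object.prototype.hasOwnProperty.call(overrides, nodeId)
    ? overrides[nodeId]
    : config.systemPromptTemplate;
  if (template === undefined) return undefined;
  // Replace each {{name}}; unknown variables stay untouched so they are easy to spot.
  return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}
```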

Assessment-theoretic profile grounded in Joughin’s (1998) six dimensions of oral assessment. Captures the design parameters that determine what the exam measures, how it is structured, and what validity/reliability claims it supports.

/**
 * Assessment-theoretic profile for the exam.
 * Encodes Joughin's (1998) six dimensions of oral assessment as design parameters.
 * When present, constrains runtime behavior and enables validity/reliability auditing.
 * When absent, defaults are inferred from node-level policies.
 *
 * @see Joughin, G. (1998). Dimensions of Oral Assessment. Assessment & Evaluation
 *   in Higher Education, 23(4), 367-378.
 */
interface AssessmentProfile {
  /**
   * What the exam primarily assesses (Joughin Dimension 1).
   * An exam may cover multiple content types.
   * Determines what counts as valid evidence and how signals should be interpreted.
   */
  readonly contentTypes: readonly ContentType[];

  /**
   * Position on the presentation–dialogue continuum (Joughin Dimension 2).
   * Affects reliability: dialogue tends toward lower reliability but higher validity.
   * @see Joughin (1998, p. 376): "reliability is threatened when interaction tends
   *   toward the dialogue pole."
   */
  readonly interactionMode: InteractionMode;

  /**
   * Target professional context and fidelity level (Joughin Dimension 3).
   * Relates to face validity and construct validity.
   * @see Akimov & Malin (2020): authenticity relates to face and construct validity.
   */
  readonly authenticityProfile?: AuthenticityProfile;

  /**
   * Degree of structural openness (Joughin Dimension 4).
   * Closed structure improves reliability; open structure improves validity for
   * probing understanding.
   * @see Joughin (1998, p. 376): "reliability is threatened when the 'structure'
   *   dimension tends towards the 'open' pole."
   */
  readonly structureProfile?: StructureProfile;

  /**
   * Examiner configuration (Joughin Dimension 5).
   * Supports AI solo, human solo, panel, and AI-with-moderator models.
   * @see Joughin (1998): self-assessment, peer assessment, authority-based.
   * @see Akimov & Malin (2020): moderation for intra-rater reliability.
   */
  readonly examinerConfig?: ExaminerConfiguration;

  /**
   * Role of the oral component (Joughin Dimension 6).
   * Purely oral vs. oral supplementing written work.
   * @see Joughin (1998): "the student's oral response may be combined with,
   *   or supplementary to, other forms of response such as a written paper."
   */
  readonly oralityProfile?: OralityProfile;

  /**
   * Validity and reliability claims the exam makes.
   * Optional structured declaration of how the exam addresses assessment quality.
   * @see Akimov & Malin (2020): validity/reliability/fairness matrix.
   */
  readonly validityClaims?: readonly ValidityClaim[];

  /**
   * Moderation policy for AI-generated evidence signals.
   * Enables human review of a sample of sessions for quality assurance.
   * @see Akimov & Malin (2020): "all online oral examinations were recorded
   *   and moderated by another finance academic."
   */
  readonly moderationPolicy?: ModerationPolicy;

  /**
   * Calibration profile for the AI examiner.
   * References calibration exercises and accuracy metrics.
   * @see Fenton (2025): "with larger cohorts, have a calibration process
   *   to reduce differences."
   */
  readonly calibrationProfile?: CalibrationProfile;
}

/** Joughin's four primary content categories (Dimension 1). */
type ContentType =
  /** Recall of facts, comprehension of meaning (Bloom's knowledge/understanding). */
  | "knowledge_understanding"
  /** "Think on one's feet," clinical reasoning, critical thinking. */
  | "applied_problem_solving"
  /** Communication skills exhibited in context, not abstract skills. */
  | "interpersonal_competence"
  /** Confidence, self-awareness, reactions to stress, personality. */
  | "intrapersonal_qualities";

/** Joughin's interaction continuum (Dimension 2). */
type InteractionMode =
  /** One-way: candidate presents, no follow-up (e.g., oral presentation). */
  | "presentation"
  /** Predetermined questions with limited, structured follow-up. */
  | "structured_dialogue"
  /** Open conversation with adaptive follow-up. */
  | "free_dialogue";

/** Joughin's authenticity continuum (Dimension 3). */
interface AuthenticityProfile {
  /** Target professional context (e.g., "clinical consultation", "job interview"). */
  readonly targetContext: string;
  /** Fidelity level: how closely the exam replicates professional practice. */
  readonly fidelityLevel: "abstract" | "simulated" | "authentic";
  /** Elements being simulated (e.g., ["patient history", "time pressure"]). */
  readonly simulationElements?: readonly string[];
}

/** Joughin's structure continuum (Dimension 4). */
interface StructureProfile {
  /**
   * Degree of openness: 0.0 = fully closed (set questions, fixed order, no deviation),
   * 1.0 = fully open (loosely structured agenda, examiner follows responses).
   */
  readonly opennessScore: number;
  /** Whether questions are known in advance by candidates. */
  readonly questionsDisclosed: boolean;
  /** Whether question order is fixed or adaptive. */
  readonly orderFixed: boolean;
}

/** Joughin's examiner dimension (Dimension 5). */
interface ExaminerConfiguration {
  /** Type of examiner administering the exam. */
  readonly examinerType: "ai_solo" | "human_solo" | "panel" | "ai_with_human_moderator";
  /** Number of examiners (for panels). */
  readonly panelSize?: number;
  /** Whether moderation of AI-generated signals is enabled. */
  readonly moderationEnabled: boolean;
  /** Sampling rate for moderation review (0.0 to 1.0). */
  readonly moderationSampleRate?: number;
}

/** Joughin's orality continuum (Dimension 6). */
interface OralityProfile {
  /** How the oral component relates to other components: purely oral, primary, or secondary. */
  readonly mode: "purely_oral" | "oral_primary" | "oral_secondary";
  /** Supplementary materials the candidate must submit before the exam. */
  readonly requiredSubmissions?: readonly CandidateArtifact[];
}

/** A candidate-submitted artifact that the oral exam may defend or discuss. */
interface CandidateArtifact {
  readonly artifactId: string;
  readonly type: "written_paper" | "code" | "design" | "report" | "portfolio";
  readonly title: string;
  readonly submittedAt?: string; // ISO 8601
}

/** A validity/reliability claim the exam makes. */
interface ValidityClaim {
  readonly type: "face" | "content" | "construct" | "concurrent" | "inter_rater" | "inter_case" | "fairness" | "inter_item_consistency" | "intra_rater_reliability";
  readonly description: string;
  /** Supporting evidence or reference (e.g., validation study ID, survey results). */
  readonly supportingEvidence?: string;
}

/** Moderation policy for AI-generated evidence signals. */
interface ModerationPolicy {
  /** Whether moderation is enabled. */
  readonly enabled: boolean;
  /** Sampling strategy for selecting sessions for review. */
  readonly samplingStrategy: "random" | "stratified" | "all_fails" | "all";
  /** Sample rate (0.0 to 1.0). Only used for "random" strategy. */
  readonly sampleRate?: number;
  /** What the moderator reviews. */
  readonly reviewScope: readonly ("evidence_signals" | "examiner_behavior" | "fairness" | "transcript")[];
  /** What happens on disagreement between moderator and AI. */
  readonly disagreementAction: "flag_for_review" | "override_ai" | "escalate_to_panel";
}

/** Calibration profile for the AI examiner. */
interface CalibrationProfile {
  /** Whether calibration has been performed. */
  readonly calibrated: boolean;
  /** References to calibration exercise exam IDs. */
  readonly calibrationExamIds?: readonly string[];
  /** Measured accuracy against ground truth (0.0 to 1.0). */
  readonly accuracyAgainstGroundTruth?: number;
  /** Inter-rater reliability with human markers (Cohen's kappa). */
  readonly interRaterKappa?: number;
  /** ISO 8601 timestamp of last calibration. */
  readonly lastCalibratedAt?: string;

  /**
   * Human moderator training requirements.
   * @see Fenton (2025) Recommendation 2: "Have clear guidelines for both
   *   academic staff and students."
   * @see Fenton (2025) Recommendation 7: "Consider a training or shadowing
   *   program with experienced instructors."
   * @see Akimov & Malin (2020): "None of the 30 examiners surveyed had
   *   any training on how to conduct oral examinations."
   */
  readonly moderatorTraining?: {
    /** Whether moderator training is required before reviewing sessions. */
    readonly trainingRequired: boolean;
    /** References to training materials or documentation. */
    readonly trainingMaterials?: readonly ResourceReference[];
    /** Whether shadowing experienced moderators is required. */
    readonly shadowingRequired?: boolean;
    /** References to calibration exercises for moderators. */
    readonly calibrationExerciseIds?: readonly string[];
  };
}
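Several fields above are documented as fractions in [0.0, 1.0]. A minimal validation sketch over a subset of the profile; `ProfileRanges` and `validateProfileRanges` are hypothetical, and the checks cover only the ranges stated in the comments, plus Cohen's kappa, which by definition lies in [-1, 1].

```typescript
// Subset of AssessmentProfile and its nested types used by this check.
interface ProfileRanges {
  readonly structureProfile?: { readonly opennessScore: number };
  readonly examinerConfig?: { readonly moderationSampleRate?: number };
  readonly moderationPolicy?: { readonly sampleRate?: number };
  readonly calibrationProfile?: {
    readonly accuracyAgainstGroundTruth?: number;
    readonly interRaterKappa?: number;
  };
}

function validateProfileRanges(p: ProfileRanges): string[] {
  const errors: string[] = [];
  // Absent values are skipped; NaN fails the range test and is reported.
  const check = (path: string, value: number | undefined, min: number, max: number): void => {
    if (value !== undefined && !(value >= min && value <= max)) {
      errors.push(`${path} must be in [${min}, ${max}], got ${value}`);
    }
  };
  check("structureProfile.opennessScore", p.structureProfile?.opennessScore, 0, 1);
  check("examinerConfig.moderationSampleRate", p.examinerConfig?.moderationSampleRate, 0, 1);
  check("moderationPolicy.sampleRate", p.moderationPolicy?.sampleRate, 0, 1);
  check("calibrationProfile.accuracyAgainstGroundTruth", p.calibrationProfile?.accuracyAgainstGroundTruth, 0, 1);
  // Cohen's kappa ranges from -1 to 1.
  check("calibrationProfile.interRaterKappa", p.calibrationProfile?.interRaterKappa, -1, 1);
  return errors;
}
```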

Question pools for randomized question delivery. Enables inter-case reliability by drawing equivalent questions for different candidates.

/**
 * A pool of equivalent question variants for randomized delivery.
 * Addresses inter-case reliability: different candidates receive different
 * questions of equivalent difficulty.
 *
 * @see Akimov & Malin (2020): bank of 69 questions from which students draw randomly.
 * @see Bayley et al. (2024): "each instructor developed a unique set of four questions."
 */
interface QuestionPool {
  /** Unique pool identifier within the package. */
  readonly poolId: string;

  /** Human-readable label (e.g., "Photosynthesis questions — set A"). */
  readonly label: string;

  /**
   * Question variants in this pool. All variants are considered equivalent
   * for reliability purposes.
   */
  readonly variants: readonly QuestionVariant[];

  /** How many variants to draw per session. Default: 1. */
  readonly drawCount?: number;

  /**
   * Whether the same variant can appear in concurrent sessions.
   * Set to false to mitigate question-sharing (Bayley et al., 2024, p. 165).
   */
  readonly allowReuseAcrossConcurrentSessions: boolean;
}

/** A single question variant within a pool. */
interface QuestionVariant {
  /** Unique variant identifier within the pool. */
  readonly variantId: string;

  /** The prompt seed for this variant. */
  readonly promptSeed: string;

  /**
   * Estimated difficulty for equivalence checking.
   * Range: 0.0 (easiest) to 1.0 (hardest). Optional — used for calibration.
   */
  readonly difficultyEstimate?: number;

  /** Evidence targets this variant assesses. */
  readonly evidenceTargetIds: readonly string[];
}
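Drawing from a QuestionPool can be sketched as sampling without replacement, excluding variants held by concurrent sessions when reuse is disallowed. A minimal sketch: `drawVariants` and the `inUseElsewhere` set are hypothetical, drawCount is treated as optional with its documented default of 1, and the random source is injected so draws are reproducible in tests. If exclusions leave too few variants, the sketch returns what it can; the schema does not say what should happen in that case.

```typescript
interface QuestionVariant {
  readonly variantId: string;
  readonly promptSeed: string;
  readonly difficultyEstimate?: number;
  readonly evidenceTargetIds: readonly string[];
}

interface QuestionPool {
  readonly poolId: string;
  readonly label: string;
  readonly variants: readonly QuestionVariant[];
  readonly drawCount?: number; // documented default: 1
  readonly allowReuseAcrossConcurrentSessions: boolean;
}

function drawVariants(
  pool: QuestionPool,
  inUseElsewhere: ReadonlySet<string>, // variantIds held by concurrent sessions
  random: () => number = Math.random,
): QuestionVariant[] {
  const candidates = pool.variants.filter(
    (v) => pool.allowReuseAcrossConcurrentSessions || !inUseElsewhere.has(v.variantId),
  );
  const count = Math.min(pool.drawCount ?? 1, candidates.length);
  // Partial Fisher-Yates shuffle: the first `count` slots become a uniform sample without replacement.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (candidates.length - i));
    const tmp = candidates[i];
    candidates[i] = candidates[j];
    candidates[j] = tmp;
  }
  return candidates.slice(0, count);
}
```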

Record of human moderator review of AI-generated evidence.

/**
 * A record of human moderator review of an AI-examined session.
 * Supports inter-rater reliability tracking and evidence quality assurance.
 *
 * @see Akimov & Malin (2020): "all online oral examinations were recorded
 *   and moderated by another finance academic."
 */
interface ModerationRecord {
  readonly recordId: string;
  readonly sessionId: string;
  readonly moderatorId: string;
  /** The original AI-generated evidence signals. */
  readonly originalSignals: readonly EvidenceSignal[];
  /** The moderator's adjusted signals (if any). */
  readonly adjustedSignals?: readonly EvidenceSignal[];
  /** Whether the moderator agreed with the AI assessment. */
  readonly agreed: boolean;
  /** Notes from the moderator. */
  readonly notes?: string;
  readonly timestampMs: number;
}
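Moderation records support a simple agreement metric. A minimal sketch computing raw percentage agreement between moderators and the AI; `ModerationRecordSummary` and `agreementRate` are hypothetical, and raw agreement does not correct for chance agreement the way Cohen's kappa (CalibrationProfile.interRaterKappa) does.

```typescript
// Subset of ModerationRecord used here.
interface ModerationRecordSummary {
  readonly sessionId: string;
  readonly agreed: boolean;
}

// Fraction of reviewed sessions where the moderator agreed with the AI;
// undefined when nothing has been reviewed yet.
function agreementRate(records: readonly ModerationRecordSummary[]): number | undefined {
  if (records.length === 0) return undefined;
  const agreed = records.filter((r) => r.agreed).length;
  return agreed / records.length;
}
```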

Structured fairness audit results for an exam or cohort.

/**
 * A structured fairness audit for an exam or cohort.
 * Enables detection of systematic disparities across demographic dimensions.
 *
 * @see Akimov & Malin (2020): fairness "does discriminate against students
 *   with poorer command of English."
 * @see Fenton (2025): "careful preparation is recommended to avoid any bias."
 */
interface FairnessAudit {
  readonly auditId: string;
  readonly examId: string;
  readonly cohortId?: string;
  readonly conductedAt: string; // ISO 8601
  /** Demographic dimensions analyzed (e.g., ["language_background", "gender"]). */
  readonly dimensionsAnalyzed: readonly string[];
  /** Results per dimension. */
  readonly results: readonly FairnessDimensionResult[];
  /** Whether the audit found significant disparities. */
  readonly disparitiesFound: boolean;
  /** Recommended actions if disparities found. */
  readonly recommendations?: readonly string[];
}

/** Result for a single demographic dimension. */
interface FairnessDimensionResult {
  readonly dimension: string; // e.g., "language_background", "gender"
  readonly metric: string; // e.g., "mean_evidence_confidence", "mean_followup_count"
  readonly groupValues: Readonly<Record<string, number>>;
  readonly disparitySignificant: boolean;
  readonly effectSize?: number;
}
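A FairnessDimensionResult can be populated from per-candidate metric values grouped by demographic dimension. A minimal sketch: `Observation`, `summariseDimension`, and `threshold` are hypothetical, and flagging a disparity when the gap between the highest and lowest group means exceeds a fixed threshold is a placeholder assumption; a real audit would use an appropriate statistical test and report a standardised effectSize, which this sketch leaves unset.

```typescript
interface FairnessDimensionResult {
  readonly dimension: string;
  readonly metric: string;
  readonly groupValues: Readonly<Record<string, number>>;
  readonly disparitySignificant: boolean;
  readonly effectSize?: number;
}

// Hypothetical input: one metric observation per candidate, tagged with the candidate's group.
interface Observation {
  readonly group: string;
  readonly value: number;
}

function summariseDimension(
  dimension: string,
  metric: string,
  observations: readonly Observation[],
  threshold: number, // placeholder: maximum acceptable gap between group means
): FairnessDimensionResult {
  const sums: Record<string, { total: number; n: number }> = {};
  for (const o of observations) {
    let s = sums[o.group];
    if (s === undefined) {
      s = { total: 0, n: 0 };
      sums[o.group] = s;
    }
    s.total += o.value;
    s.n += 1;
  }
  // Mean metric value per group.
  const groupValues: Record<string, number> = {};
  for (const g of Object.keys(sums)) groupValues[g] = sums[g].total / sums[g].n;
  const means = Object.keys(groupValues).map((g) => groupValues[g]);
  const gap = means.length > 1 ? Math.max(...means) - Math.min(...means) : 0;
  return { dimension, metric, groupValues, disparitySignificant: gap > threshold };
}
```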

Session recording metadata for moderation and audit.

/**
 * Metadata for a session recording (audio/video).
 * Enables moderation review and appeal processes.
 *
 * @see Akimov & Malin (2020): "to reduce the potential problem of intra-rater
 *   reliability, all online oral examinations were recorded and moderated."
 */
interface SessionRecording {
  readonly sessionId: string;
  /** Audio recording reference. */
  readonly audioRef?: string;
  /** Video recording reference (if available). */
  readonly videoRef?: string;
  /** Whether the recording is available for moderation review. */
  readonly availableForModeration: boolean;
  /** Whether the candidate consented to recording. */
  readonly candidateConsented: boolean;
  /** Retention policy. */
  readonly retentionPolicy: {
    readonly retainUntilMs: number;
    readonly deleteAfterReview: boolean;
  };
}
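Whether a recording may be served to a moderator depends on several of the fields above. A minimal sketch, assuming all conditions must hold, that at least one media reference must exist, and that retainUntilMs is Unix epoch milliseconds per the schema's timestamp convention; `canServeForModeration` is hypothetical.

```typescript
interface SessionRecording {
  readonly sessionId: string;
  readonly audioRef?: string;
  readonly videoRef?: string;
  readonly availableForModeration: boolean;
  readonly candidateConsented: boolean;
  readonly retentionPolicy: {
    readonly retainUntilMs: number;
    readonly deleteAfterReview: boolean;
  };
}

function canServeForModeration(rec: SessionRecording, nowMs: number): boolean {
  const hasMedia = rec.audioRef !== undefined || rec.videoRef !== undefined;
  return (
    hasMedia &&
    rec.availableForModeration &&
    rec.candidateConsented &&
    // Past the retention deadline the recording should no longer be served.
    nowMs < rec.retentionPolicy.retainUntilMs
  );
}
```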

Additional candidate commands grounded in the oral assessment literature.

/**
 * Extended candidate command type with assessment-significant additions.
 * The base CandidateCommandType (§13) covers operational commands.
 * These additions capture dialogic interaction patterns that the literature
 * identifies as assessment-significant.
 *
 * @see Joughin (1998): dialogue pole — candidates may redirect conversation.
 * @see Fenton (2025): "students can reflect on their choices and self-correct."
 */
type ExtendedCandidateCommandType =
  | CandidateCommandType
  /** Candidate pushes back on a premise or framing. Demonstrates critical thinking. */
  | "challenge_premise"
  /** Candidate wants to revisit and revise an earlier answer. */
  | "revise_earlier_answer";
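Code that accepts ExtendedCandidateCommandType but only routes base commands can separate the two with a type guard. A minimal sketch; `BASE_COMMANDS` and `isBaseCommand` are hypothetical. Here the base union is derived from a runtime array so the list and the type cannot drift apart; that derivation is an illustrative choice, not how the schema declares CandidateCommandType.

```typescript
const BASE_COMMANDS = [
  "repeat", "clarification", "request_rephrase", "pause", "raise_hand",
  "skip", "volume_up", "volume_down", "language_switch", "thinking_aloud",
] as const;

// Deriving the union from the array keeps the runtime list and the type in sync.
type CandidateCommandType = (typeof BASE_COMMANDS)[number];
type ExtendedCandidateCommandType = CandidateCommandType | "challenge_premise" | "revise_earlier_answer";

function isBaseCommand(cmd: ExtendedCandidateCommandType): cmd is CandidateCommandType {
  return (BASE_COMMANDS as readonly string[]).indexOf(cmd) !== -1;
}
```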

Bloom’s Taxonomy cognitive levels for classifying evidence targets.

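/**
 * Cognitive levels of Bloom's Taxonomy, using the revised level names
 * (Anderson & Krathwohl, 2001) rather than Bloom's (1956) original categories.
 * Classifies what cognitive depth an evidence target assesses.
 * Used for validation, follow-up escalation strategy, and marking rubric alignment.
 *
 * @see Bloom, B.S. (1956). Taxonomy of Educational Objectives.
 * @see Anderson, L.W. & Krathwohl, D.R. (Eds.) (2001). A Taxonomy for Learning,
 *   Teaching, and Assessing.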
 * @see Fenton (2025): "Generative AI tools have been found to perform well
 *   at the lower levels of Bloom's taxonomy but struggle at the create level
 *   and making arguments built on theoretical frameworks."
 */
type BloomLevel =
  /** Recall facts and basic concepts. */
  | "remember"
  /** Explain ideas or concepts. */
  | "understand"
  /** Use information in new situations. */
  | "apply"
  /** Draw connections among ideas. */
  | "analyze"
  /** Justify a stand or a decision. */
  | "evaluate"
  /** Produce new or original work. */
  | "create";
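Because the levels are ordered, follow-up escalation can step to the next level up. A minimal sketch, assuming escalation moves one level at a time and stops at "create"; `BLOOM_ORDER`, `bloomRank`, and `nextBloomLevel` are hypothetical.

```typescript
type BloomLevel = "remember" | "understand" | "apply" | "analyze" | "evaluate" | "create";

// Lowest to highest cognitive demand, in the order declared above.
const BLOOM_ORDER: readonly BloomLevel[] = ["remember", "understand", "apply", "analyze", "evaluate", "create"];

function bloomRank(level: BloomLevel): number {
  return BLOOM_ORDER.indexOf(level);
}

// Returns the next level up, or undefined when already at the top.
function nextBloomLevel(level: BloomLevel): BloomLevel | undefined {
  const rank = bloomRank(level);
  return rank + 1 < BLOOM_ORDER.length ? BLOOM_ORDER[rank + 1] : undefined;
}
```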

Candidate-facing exam information for transparency and preparation.

/**
 * Candidate-facing briefing information about the exam.
 * Provides transparency about exam format, expectations, and available commands.
 * Addresses Joughin's (1998) concern that "students need to know in advance
 * what to expect of the shape of the assessment in order to prepare adequately."
 *
 * @see Joughin (1998) Dimension 4: Structure.
 * @see Fenton (2025) Recommendation 1: pre-exam information provision.
 * @see Akimov & Malin (2020): pre-exam survey findings on student preparation.
 */
interface CandidateBriefing {
  /** Human-readable description of the exam format. */
  readonly formatDescription: string;

  /** Estimated total duration in human-readable form (e.g., "20 minutes"). */
  readonly estimatedDuration?: string;

  /** Number of sections or question groups. */
  readonly sectionCount?: number;

  /** Whether a practice/mock session is available before the exam. */
  readonly practiceSessionAvailable?: boolean;

  /** Commands the candidate can use during the exam (e.g., repeat, clarification). */
  readonly availableCommands?: readonly string[];

  /** Whether the exam is open-book, closed-book, or restricted. */
  readonly bookPolicy?: "open" | "closed" | "restricted";

  /** Any materials the candidate should prepare in advance. */
  readonly requiredPreparation?: readonly string[];

  /** Criteria by which the candidate will be assessed. */
  readonly assessmentCriteria?: readonly string[];

  /** Whether the exam will be recorded. */
  readonly recordingDisclosure?: boolean;
}
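Every field except `formatDescription` is optional, so any code that presents a briefing to candidates has to handle missing values. Below is a minimal sketch that assumes a plain-text rendering. The example values are illustrative only, and `renderBriefing` is a hypothetical helper. The interface is re-declared locally so the snippet compiles on its own.

```typescript
// Local copy of CandidateBriefing (from the schema above) so the sketch is self-contained.
interface CandidateBriefing {
  readonly formatDescription: string;
  readonly estimatedDuration?: string;
  readonly sectionCount?: number;
  readonly practiceSessionAvailable?: boolean;
  readonly availableCommands?: readonly string[];
  readonly bookPolicy?: "open" | "closed" | "restricted";
  readonly requiredPreparation?: readonly string[];
  readonly assessmentCriteria?: readonly string[];
  readonly recordingDisclosure?: boolean;
}

// Illustrative values only; they do not describe a real exam.
const briefing: CandidateBriefing = {
  formatDescription: "A one-to-one spoken exam with an AI examiner.",
  estimatedDuration: "20 minutes",
  sectionCount: 3,
  practiceSessionAvailable: true,
  availableCommands: ["repeat", "clarification"],
  bookPolicy: "closed",
  recordingDisclosure: true,
};

/** Hypothetical renderer: one plain-text line per field that is present. */
function renderBriefing(b: CandidateBriefing): string[] {
  const lines: string[] = [b.formatDescription];
  if (b.estimatedDuration !== undefined) lines.push(`Duration: ${b.estimatedDuration}`);
  if (b.sectionCount !== undefined) lines.push(`Sections: ${b.sectionCount}`);
  if (b.bookPolicy !== undefined) lines.push(`Book policy: ${b.bookPolicy}`);
  // Absent lists are treated as empty and produce no line.
  const commands = b.availableCommands ?? [];
  if (commands.length > 0) lines.push(`You may say: ${commands.join(", ")}`);
  const prep = b.requiredPreparation ?? [];
  if (prep.length > 0) lines.push(`Please prepare: ${prep.join("; ")}`);
  const criteria = b.assessmentCriteria ?? [];
  if (criteria.length > 0) lines.push(`You will be assessed on: ${criteria.join("; ")}`);
  if (b.practiceSessionAvailable === true) lines.push("A practice session is available before the exam.");
  if (b.recordingDisclosure === true) lines.push("This exam will be recorded.");
  return lines;
}
```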

Controls the AI examiner’s conversational tone and rapport-building.

/**
 * Policy controlling the AI examiner's conversational style.
 * Ensures the examiner creates a conversational, non-interrogatory atmosphere.
 * This is critical for AI-conducted assessment because the AI cannot rely
 * on implicit social skills.
 *
 * @see Fenton (2015): oral assessment as "conversation rather than interrogatory."
 * @see Fenton (2025): rapport-building and candidate comfort as assessment concerns.
 */
interface ConversationalStylePolicy {
  /** Overall tone of the examiner. */
  readonly tone?: "formal" | "semi_formal" | "warm" | "neutral";

  /** Level of warmth and rapport-building. */
  readonly warmth?: "low" | "medium" | "high";

  /** Whether the examiner should use the candidate's name. */
  readonly useCandidateName?: boolean;

  /** Whether the examiner should acknowledge good responses before probing further. */
  readonly acknowledgeGoodResponses?: boolean;

  /** Whether the examiner should apologize for necessary clarifications. */
  readonly apologizeForClarifications?: boolean;

  /** Maximum number of consecutive questions the examiner may ask before inserting a conversational pause. */
  readonly maxConsecutiveQuestions?: number;

  /** Whether the examiner should use conversational fillers (e.g., "I see", "Interesting"). */
  readonly useConversationalFillers?: boolean;
}
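A runtime could honour `maxConsecutiveQuestions` by counting the questions asked since the examiner's last non-question turn. The following is a sketch under stated assumptions. `QuestionPacer` and its methods are hypothetical, only the field the sketch uses is re-declared, and an absent limit is treated as "no limit".

```typescript
// Local subset of ConversationalStylePolicy: only the field this sketch uses.
interface ConversationalStylePolicy {
  readonly maxConsecutiveQuestions?: number;
}

/**
 * Hypothetical pacing tracker. It counts questions asked since the last
 * conversational (non-question) turn and reports when the policy limit is reached.
 * When maxConsecutiveQuestions is absent, no limit is enforced.
 */
class QuestionPacer {
  private readonly policy: ConversationalStylePolicy;
  private consecutive = 0;

  constructor(policy: ConversationalStylePolicy) {
    this.policy = policy;
  }

  /** Call after the examiner asks a question. */
  recordQuestion(): void {
    this.consecutive += 1;
  }

  /** Call after an acknowledgment, filler, or other non-question turn. */
  recordPause(): void {
    this.consecutive = 0;
  }

  /** True when the next examiner turn should be a pause rather than another question. */
  needsPause(): boolean {
    const max = this.policy.maxConsecutiveQuestions;
    return max !== undefined && this.consecutive >= max;
  }
}
```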

Controls what feedback the examiner can give during formative assessments.

/**
 * Policy controlling examiner feedback in formative assessment mode.
 * Distinguishes between evidence-relevant signals (written to the ledger)
 * and learning-oriented feedback (delivered to the candidate but not recorded as evidence).
 *
 * In formative mode, the examiner can provide real-time feedback to enhance learning.
 * In summative mode, feedback is suppressed to avoid biasing evidence.
 *
 * @see Fenton (2025): oral assessment as "enhancer of student learning."
 * @see Fenton (2025): formative vs summative distinction.
 */
interface FormativeFeedbackPolicy {
  /** Whether formative feedback is enabled. */
  readonly enabled: boolean;

  /** When feedback is delivered relative to the candidate's response. */
  readonly feedbackTiming?: "immediate" | "after_node" | "after_exam";

  /** Types of feedback the examiner may provide. */
  readonly allowedFeedbackTypes?: readonly (
    /** Acknowledge correct/strong responses. */
    | "positive_acknowledgment"
    /** Gently indicate areas for improvement. */
    | "constructive_nudge"
    /** Provide a hint or scaffold toward the correct answer. */
    | "scaffolding_hint"
    /** Summarize what the candidate has demonstrated so far. */
    | "progress_summary"
  )[];

  /** Whether feedback is recorded in the evidence ledger. */
  readonly recordFeedback?: boolean;

  /** Whether feedback can reference specific rubric criteria. */
  readonly allowRubricReference?: boolean;
}
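The formative/summative distinction described above amounts to a gate that is checked before the examiner gives any feedback. Below is a minimal sketch. `FeedbackType`, `AssessmentMode`, and `mayDeliverFeedback` are hypothetical names, because this section does not define how the runtime learns the exam's mode. Treating an absent `allowedFeedbackTypes` list as "all types permitted" is also an assumption, and a stricter deployment might choose the opposite default.

```typescript
// Hypothetical alias for the element type of allowedFeedbackTypes.
type FeedbackType =
  | "positive_acknowledgment"
  | "constructive_nudge"
  | "scaffolding_hint"
  | "progress_summary";

// Local copy of FormativeFeedbackPolicy so the sketch is self-contained.
interface FormativeFeedbackPolicy {
  readonly enabled: boolean;
  readonly feedbackTiming?: "immediate" | "after_node" | "after_exam";
  readonly allowedFeedbackTypes?: readonly FeedbackType[];
  readonly recordFeedback?: boolean;
  readonly allowRubricReference?: boolean;
}

// Hypothetical: how the runtime labels the exam's purpose. Not a schema type.
type AssessmentMode = "formative" | "summative";

/**
 * Hypothetical guard run before the examiner gives feedback.
 * Summative mode always suppresses feedback. In formative mode the policy
 * must exist and be enabled, and the requested type must be allowed.
 */
function mayDeliverFeedback(
  mode: AssessmentMode,
  policy: FormativeFeedbackPolicy | undefined,
  type: FeedbackType,
): boolean {
  if (mode === "summative") return false;
  if (policy === undefined || !policy.enabled) return false;
  // Assumption: an absent allow-list permits every feedback type.
  if (policy.allowedFeedbackTypes === undefined) return true;
  return policy.allowedFeedbackTypes.includes(type);
}
```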

A reference to an external resource (training material, document, URL).

/** A reference to an external resource. */
interface ResourceReference {
  /** Human-readable label for the resource. */
  readonly label: string;
  /** URL or path to the resource. */
  readonly url?: string;
  /** Type of resource. */
  readonly type?: "document" | "video" | "exercise" | "rubric" | "guideline";
}
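If packages are loaded from serialized JSON, TypeScript's compile-time types do not protect the runtime against malformed references. The following is a minimal validation sketch that assumes untrusted input. `RESOURCE_TYPES`, `ResourceType`, and `isResourceReference` are hypothetical. The `type` union is derived from the constant so that the runtime check and the static type stay in sync.

```typescript
// Hypothetical constant listing the allowed resource types.
const RESOURCE_TYPES = ["document", "video", "exercise", "rubric", "guideline"] as const;
type ResourceType = (typeof RESOURCE_TYPES)[number];

// Local copy of ResourceReference; `type` uses the derived union (same members as the schema).
interface ResourceReference {
  readonly label: string;
  readonly url?: string;
  readonly type?: ResourceType;
}

/** Hypothetical runtime check for a ResourceReference parsed from untrusted JSON. */
function isResourceReference(value: unknown): value is ResourceReference {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.label !== "string") return false;
  if (v.url !== undefined && typeof v.url !== "string") return false;
  if (v.type !== undefined && !(RESOURCE_TYPES as readonly unknown[]).includes(v.type)) return false;
  return true;
}
```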

| Version | Date | Changes |
|---|---|---|
| v0.2.0 | 2026-06-30 | Added BloomLevel, cognitiveLevel, integrated_practice, CandidateBriefing, ConversationalStylePolicy, FormativeFeedbackPolicy, promptingPrinciples, cognitiveEscalationStrategy, bookPolicy, inter_item_consistency, intra_rater_reliability, moderatorTraining, isPractice, anxietyMitigation, ResourceReference. Updated terminology from ‘IR’ to ‘specification’. |
| v0.1.0 | 2026-05-06 | Initial release. 26 sections covering all core objects. |