Schema (TypeScript)

Status

Draft · v0.2.0 · 2026-06-30

Schema Authority

This file is the canonical, authoritative type definition source for the IOA-ORM. When any other spec file (00-overview.md through 13-open-questions.md) defines a type, enum, or field name that conflicts with this file, this file takes precedence. Other files MAY reference these types but MUST NOT redefine them with different field names, types, or enum values.

Conventions

All interfaces use readonly for fields that MUST NOT change after creation.
string IDs are UUIDv4 unless noted otherwise.
Timestamps are number (Unix epoch milliseconds) unless documented as ISO 8601 strings (e.g., publishedAt, createdAt, finalisedAt).
Durations are number (milliseconds).
Optional fields are marked with ? and documented when they apply.
Discriminant fields are named kind (ExamRuntimeNode) or type (TransitionCondition, RuntimeEvent).
Normative language (MUST/SHOULD/MAY) appears in JSDoc comments.

1. ExamRuntimePackage

The top-level canonical artifact. A published, versioned, complete specification of an oral exam.

/**
 * The canonical, versioned, executable specification of a published oral assessment.
 * This is the single source of truth consumed by runtime controller, Pipecat adapter,
 * and marking runtime.
 */
interface ExamRuntimePackage {
  /** Unique identifier for this exam. Stable across versions. */
  readonly examId: string;

  /** Semantic version of this package (e.g., "1.2.0"). MUST increment on any change. */
  readonly version: string;

  /** ISO 8601 timestamp of when this version was published. */
  readonly publishedAt: string;

  /** Human-readable metadata. */
  readonly metadata: ExamMetadata;

  /**
   * Assessment-theoretic profile grounded in Joughin's (1998) six dimensions
   * of oral assessment. Captures design parameters that determine what the exam
   * measures, how it is delivered, and what validity/reliability claims it makes.
   * OPTIONAL in v1 — when absent, defaults are inferred from node policies.
   */
  readonly assessmentProfile?: AssessmentProfile;

  /** Ordered list of runtime nodes forming the exam graph. */
  readonly nodes: readonly ExamRuntimeNode[];

  /** Global policies that apply across all nodes unless overridden. */
  readonly globalPolicies: GlobalRuntimePolicies;

  /** Registry of all evidence targets defined for this exam. */
  readonly evidenceTargets: readonly EvidenceTarget[];

  /** Question pools for randomized question delivery. Referenced by nodes. */
  readonly questionPools?: readonly QuestionPool[];

  /** Configuration hints for the Pipecat adapter. */
  readonly pipecatAdapter?: PipecatAdapterConfig;

  /**
   * Candidate-facing briefing information about the exam.
   * Describes format, duration, available commands, and preparation guidance.
   * @see Joughin (1998) Dimension 4: "Students need to know in advance what
   *   to expect of the shape of the assessment in order to prepare adequately."
   * @see Fenton (2025) Recommendation 1: "Students should be given information
   *   about the schedule and assessment criteria beforehand."
   */
  readonly candidateBriefing?: CandidateBriefing;
}
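Nodes, transitions, and evidence targets reference each other by ID, so a publisher will likely want to check those references before releasing a package. The following is a minimal sketch of such a check, not a normative algorithm. The function validatePackageReferences and the trimmed RefNode and RefPackage types are hypothetical and keep only the schema fields the check reads. Question pool references are not checked, because QuestionPool is not defined in this section.

```typescript
// Hypothetical trimmed views of ExamRuntimeNode and ExamRuntimePackage.
interface RefNode {
  readonly nodeId: string;
  readonly evidenceTargetIds?: readonly string[];
  readonly transitions: readonly { readonly targetNodeId: string }[];
}

interface RefPackage {
  readonly nodes: readonly RefNode[];
  readonly evidenceTargets: readonly { readonly targetId: string }[];
}

// Collects every duplicate or dangling reference instead of stopping at the first.
function validatePackageReferences(pkg: RefPackage): string[] {
  const problems: string[] = [];
  const nodeIds = new Set<string>();
  for (const node of pkg.nodes) {
    if (nodeIds.has(node.nodeId)) problems.push(`duplicate nodeId: ${node.nodeId}`);
    nodeIds.add(node.nodeId);
  }
  const targetIds = new Set(pkg.evidenceTargets.map((t) => t.targetId));
  for (const node of pkg.nodes) {
    for (const transition of node.transitions) {
      if (!nodeIds.has(transition.targetNodeId)) {
        problems.push(`${node.nodeId}: transition to unknown node ${transition.targetNodeId}`);
      }
    }
    for (const id of node.evidenceTargetIds ?? []) {
      if (!targetIds.has(id)) problems.push(`${node.nodeId}: unknown evidence target ${id}`);
    }
  }
  return problems;
}

const demoPackage: RefPackage = {
  nodes: [
    { nodeId: "warmup", transitions: [{ targetNodeId: "q1" }] },
    { nodeId: "q1", evidenceTargetIds: ["t1", "t9"], transitions: [{ targetNodeId: "end" }] },
  ],
  evidenceTargets: [{ targetId: "t1" }],
};
const demoProblems = validatePackageReferences(demoPackage);
// demoProblems: ["q1: transition to unknown node end", "q1: unknown evidence target t9"]
```

Returning a list of problems rather than throwing on the first one lets an authoring tool report every broken reference in a single pass.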

2. ExamMetadata

Human-readable information about the exam. Used for display, search, and audit.

/** Metadata describing the exam for human consumers. */
interface ExamMetadata {
  /** Exam title (e.g., "Biology 201 Oral Practical"). */
  readonly title: string;

  /** Subject or course code. */
  readonly subject: string;

  /** Institution or department. */
  readonly institution?: string;

  /** Academic term or semester. */
  readonly term?: string;

  /** Target language of the exam. */
  readonly language: string;

  /** Estimated total duration in milliseconds. */
  readonly estimatedDurationMs: number;

  /** Maximum allowed duration in milliseconds. Hard cap. */
  readonly maxDurationMs: number;

  /** Author(s) who designed this exam in the studio. */
  readonly authors?: readonly string[];

  /** Human-readable description of the exam. */
  readonly description?: string;

  /** Tags for search and categorization. */
  readonly tags?: readonly string[];

  /**
   * Formative, summative, or diagnostic purpose.
   * Affects whether evidence contributes to grades, whether candidate receives
   * real-time feedback, and whether the exam is recorded for review.
   * @see Fenton (2025) on formative vs summative oral assessments.
   */
  readonly assessmentPurpose?: "formative" | "summative" | "diagnostic";

  /**
   * Expected number of candidates for this exam session.
   * Used for scalability planning (question pool sizing, parallel grading hints).
   * @see Bayley et al. (2024) on scaling ConVOEs to 600+ students.
   */
  readonly expectedCandidateCount?: number;

  /**
   * Whether the exam is open-book, closed-book, or restricted.
   * Affects cognitive demands (less memorization in open-book) and anxiety levels.
   * @see Fenton (2025) Recommendation 13: "Plan if the assessment will be
   *   open book or closed book."
   * @see Sayre (2014) on open-book assessment design.
   */
  readonly bookPolicy?: "open" | "closed" | "restricted";
}
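The two duration fields invite a simple consistency check: an estimated duration longer than the hard cap is almost certainly an authoring error. Below is a minimal sketch, assuming the check runs at publish time. checkMetadataDurations is a hypothetical helper, and rejecting a non-positive maxDurationMs is an assumption rather than a stated rule.

```typescript
// Hypothetical subset of ExamMetadata holding the two duration fields.
interface MetadataDurations {
  readonly estimatedDurationMs: number;
  readonly maxDurationMs: number;
}

// Returns human-readable issues; an empty array means the durations look consistent.
function checkMetadataDurations(meta: MetadataDurations): string[] {
  const issues: string[] = [];
  // Written as !(x > 0) so that NaN is rejected too.
  if (!(meta.maxDurationMs > 0)) {
    issues.push("maxDurationMs must be a positive number of milliseconds");
  }
  if (meta.estimatedDurationMs > meta.maxDurationMs) {
    issues.push("estimatedDurationMs exceeds the maxDurationMs hard cap");
  }
  return issues;
}

const MINUTE_MS = 60 * 1000;
// A 20-minute estimate under a 30-minute cap passes; a 40-minute estimate does not.
const okIssues = checkMetadataDurations({ estimatedDurationMs: 20 * MINUTE_MS, maxDurationMs: 30 * MINUTE_MS });
const badIssues = checkMetadataDurations({ estimatedDurationMs: 40 * MINUTE_MS, maxDurationMs: 30 * MINUTE_MS });
```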

3. ExamRuntimeNodeKind

Discriminant for node types. Determines default behaviors and valid policy combinations.

/**
 * The type of a runtime node.
 * Determines default behaviors, valid policies, and how the runtime controller
 * manages the node lifecycle.
 */
type ExamRuntimeNodeKind =
  /** A direct question posed to the candidate. */
  | "question"
  /** A scenario presentation (read aloud, display material, etc.). */
  | "scenario"
  /** A structured task (role-play, problem-solving, demonstration). */
  | "task"
  /** An open-ended discussion segment. */
  | "discussion"
  /** Pre-assessment rapport building. NOT assessed. */
  | "warmup"
  /** Closing segment. May include summary or feedback. */
  | "wrapup"
  /** Conditional routing node. No candidate interaction. */
  | "branch"
  /**
   * Pre-exam identity verification node. Candidate presents ID.
   * NOT assessed. Emits identity_verified or identity_failed events.
   * @see Akimov & Malin (2020): "each student had to show the examiner a current
   *   student ID card or a government-issued document."
   * @see Fenton (2025): "ensure the student presents their identification card."
   */
  | "identity_check";
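Because ExamRuntimeNodeKind is a closed string union, code that branches on it can be made exhaustive, so that adding a new kind becomes a compile-time error. The sketch below derives a default isAssessed value from the kind. It is illustrative only. defaultIsAssessed is hypothetical. The comments above mark warmup and identity_check as not assessed and branch as non-interactive, and isAssessed notes that wrapup is typically false. The schema does not say whether a scenario presentation is assessed, so this sketch assumes it is not.

```typescript
// Mirrors ExamRuntimeNodeKind above.
type NodeKind =
  | "question" | "scenario" | "task" | "discussion"
  | "warmup" | "wrapup" | "branch" | "identity_check";

// Hypothetical helper: a default for ExamRuntimeNode.isAssessed when authoring a node.
function defaultIsAssessed(kind: NodeKind): boolean {
  switch (kind) {
    case "question":
    case "task":
    case "discussion":
      return true;
    case "scenario": // assumption: a presentation on its own is not assessed
    case "warmup":
    case "wrapup":
    case "branch":
    case "identity_check":
      return false;
    default: {
      // If a new kind is added to NodeKind, this assignment fails to compile.
      const unreachable: never = kind;
      throw new Error(`unhandled node kind: ${String(unreachable)}`);
    }
  }
}
```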

4. ExamRuntimeNode

A single unit in the exam graph. Contains local policies that override globals.

/**
 * A discrete unit of the exam flow.
 * Each node has a type, local policies, evidence targets, and transition rules.
 * Local policies override global policies for this node only.
 */
interface ExamRuntimeNode {
  /** Unique identifier within the package. */
  readonly nodeId: string;

  /** The type of this node. */
  readonly kind: ExamRuntimeNodeKind;

  /**
   * Base content or prompt seed for this node.
   * This is NOT the full system prompt — it is the content the Pipecat adapter
   * and runtime controller use to construct the examiner's behavior.
   * MAY contain template variables (e.g., {{candidateName}}).
   */
  readonly promptSeed: string;

  /** Display order in the exam flow. Used for linear progression. */
  readonly order: number;

  /** Human-readable label for this node (e.g., "Q1: Photosynthesis"). */
  readonly label?: string;

  /** Maximum time allowed in this node, in milliseconds. Overrides the global per-node default. */
  readonly timeBudgetMs?: number;

  /** Local completion policy. Overrides global default. */
  readonly completionPolicy?: CompletionPolicy;

  /** Local follow-up policy. Overrides global default. */
  readonly followUpPolicy?: FollowUpPolicy;

  /** Local recovery policy. Overrides global default. */
  readonly recoveryPolicy?: RecoveryPolicy;

  /**
   * If this node draws from a question pool, the pool reference.
   * When set, `promptSeed` serves as a template with a {{variantPromptSeed}} placeholder.
   * @see Akimov & Malin (2020) on question banking for inter-case reliability.
   */
  readonly questionPoolId?: string;

  /** Evidence targets assessed at this node. References into package-level targets. */
  readonly evidenceTargetIds?: readonly string[];

  /** Transition rules from this node to successor nodes. */
  readonly transitions: readonly TransitionPolicy[];

  /** Which candidate commands are valid at this node. */
  readonly candidateCommands?: CandidateCommandPolicy;

  /** Whether this node produces assessed evidence. Typically false for warmup, wrapup, branch, and identity_check nodes. */
  readonly isAssessed: boolean;

  /** Optional context overrides for the AI examiner at this node. */
  readonly contextOverride?: Partial<ContextPolicy>;

  /**
   * Whether this warmup node is a practice session (not assessed).
   * Only applicable when kind = "warmup".
   * Practice sessions help reduce anxiety by familiarizing candidates with the format.
   * @see Fenton (2025): "the anxiety some students experience may be linked to
   *   the fact that they are unfamiliar with the format."
   * @see Akimov & Malin (2020): 100% of students were nervous; practice helped.
   */
  readonly isPractice?: boolean;

  /**
   * Anxiety mitigation strategy for this warmup node.
   * Only applicable when kind = "warmup".
   * @see Fenton (2025) Recommendation 8: anxiety management.
   * @see Akimov & Malin (2020): anxiety as a major concern in oral assessment.
   */
  readonly anxietyMitigation?:
    /** Gradual exposure: start with easy questions, build up to assessed content. */
    | "graduated_exposure"
    /** Breathing exercise: guide the candidate through a calming exercise. */
    | "breathing_exercise"
    /** Format familiarization: explain the exam structure and available commands. */
    | "format_familiarization"
    /** Combined: all of the above in sequence. */
    | "combined";
}
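Local policies override global ones for this node only. The schema does not spell out the merge rule, so the sketch below makes one assumption explicit: a local policy object replaces the global one wholesale, while contextOverride, typed as Partial<ContextPolicy>, is merged field by field. GlobalDefaults, PolicyNode, resolveNodePolicies, and the two context fields examinerTone and maxUtteranceWords are hypothetical stand-ins. GlobalRuntimePolicies and ContextPolicy are defined elsewhere.

```typescript
// Hypothetical, simplified policy shapes; the real types carry more fields.
interface SimpleFollowUpPolicy { readonly maxFollowUps: number }
interface SimpleCompletionPolicy { readonly minTurns?: number; readonly maxTurns?: number }
// examinerTone and maxUtteranceWords are invented fields for illustration only.
interface SimpleContextPolicy { readonly examinerTone: string; readonly maxUtteranceWords: number }

interface GlobalDefaults {
  readonly followUpPolicy: SimpleFollowUpPolicy;
  readonly completionPolicy: SimpleCompletionPolicy;
  readonly context: SimpleContextPolicy;
}

interface PolicyNode {
  readonly followUpPolicy?: SimpleFollowUpPolicy;
  readonly completionPolicy?: SimpleCompletionPolicy;
  readonly contextOverride?: Partial<SimpleContextPolicy>;
}

function resolveNodePolicies(node: PolicyNode, globals: GlobalDefaults): GlobalDefaults {
  return {
    // Assumption: a local policy object replaces the global one as a whole.
    followUpPolicy: node.followUpPolicy ?? globals.followUpPolicy,
    completionPolicy: node.completionPolicy ?? globals.completionPolicy,
    // contextOverride is a Partial, so it is merged field by field over the global context.
    context: { ...globals.context, ...node.contextOverride },
  };
}

const globalsDemo: GlobalDefaults = {
  followUpPolicy: { maxFollowUps: 3 },
  completionPolicy: { minTurns: 1 },
  context: { examinerTone: "neutral", maxUtteranceWords: 60 },
};
const resolvedDemo = resolveNodePolicies(
  { followUpPolicy: { maxFollowUps: 1 }, contextOverride: { maxUtteranceWords: 30 } },
  globalsDemo,
);
// resolvedDemo.followUpPolicy.maxFollowUps === 1 (local wins)
// resolvedDemo.completionPolicy.minTurns === 1 (inherited)
// resolvedDemo.context: { examinerTone: "neutral", maxUtteranceWords: 30 }
```

Wholesale replacement keeps each local policy self-describing. A field-level merge would let a node set only maxFollowUps, but it would also make the effective policy harder to read from the node alone.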

5. RuntimeStateSchema

Mutable per-session state tracked by the runtime controller. NOT persisted as a log — this is working memory.

/**
 * The mutable state of a runtime session.
 * Maintained by the runtime controller. Updated on every turn, command, and transition.
 * This is working memory — the authoritative persistent outputs are the
 * EvidenceLedger and RuntimeEvent log.
 */
interface RuntimeStateSchema {
  /** Current status of the session. */
  readonly status: SessionStatus;

  /** ID of the node the session is currently in. */
  readonly currentNodeId: string;

  /** Number of candidate turns in the current node. */
  readonly currentNodeTurnCount: number;

  /** Number of follow-ups issued by the examiner in the current node. */
  readonly currentNodeFollowUpCount: number;

  /** Total elapsed time for the session in milliseconds. */
  readonly globalElapsedMs: number;

  /** Elapsed time in the current node in milliseconds. */
  readonly nodeElapsedMs: number;

  /** History of candidate commands issued in this session. */
  readonly candidateCommandHistory: readonly CandidateCommandRecord[];

  /** Map of evidence target ID to number of signals received. */
  readonly evidenceCoverage: Readonly<Record<string, number>>;

  /** Recovery attempts in the current node. */
  readonly currentNodeRecoveryAttempts: readonly RecoveryAttemptRecord[];

  /** Index of the last processed turn. */
  readonly lastTurnIndex: number;

  /** Timestamp of the last state update (Unix epoch ms). */
  readonly lastUpdatedAt: number;
}

type SessionStatus =
  | "active"
  | "paused"
  | "completed"
  | "terminated";

interface CandidateCommandRecord {
  readonly command: CandidateCommandType;
  readonly turnIndex: number;
  readonly timestampMs: number;
  readonly handled: boolean;
  readonly response?: string;
}

interface RecoveryAttemptRecord {
  readonly scenario: RecoveryScenario;
  readonly attemptNumber: number;
  readonly timestampMs: number;
  readonly action: RecoveryEscalation;
  readonly successful: boolean;
}
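RuntimeStateSchema is described as mutable working memory, yet every field is readonly. One way to reconcile the two, assumed here rather than stated by the spec, is for the controller to replace the whole state snapshot on each update. Below is a sketch of two such updates. It uses a hypothetical StateLite subset and the hypothetical helpers applyCandidateTurn and applyNodeEntry.

```typescript
// Hypothetical subset of RuntimeStateSchema; field names match the schema above.
interface StateLite {
  readonly currentNodeId: string;
  readonly currentNodeTurnCount: number;
  readonly currentNodeFollowUpCount: number;
  readonly globalElapsedMs: number;
  readonly nodeElapsedMs: number;
  readonly currentNodeRecoveryAttempts: readonly unknown[];
  readonly lastTurnIndex: number;
  readonly lastUpdatedAt: number;
}

// Returns a new snapshot after a candidate turn; the input is never mutated.
function applyCandidateTurn(state: StateLite, turnIndex: number, nowMs: number): StateLite {
  return {
    ...state,
    currentNodeTurnCount: state.currentNodeTurnCount + 1,
    lastTurnIndex: turnIndex,
    lastUpdatedAt: nowMs,
  };
}

// Entering a node resets every counter scoped to "the current node".
function applyNodeEntry(state: StateLite, nodeId: string, nowMs: number): StateLite {
  return {
    ...state,
    currentNodeId: nodeId,
    currentNodeTurnCount: 0,
    currentNodeFollowUpCount: 0,
    nodeElapsedMs: 0,
    currentNodeRecoveryAttempts: [],
    lastUpdatedAt: nowMs,
  };
}

const initialState: StateLite = {
  currentNodeId: "warmup",
  currentNodeTurnCount: 0,
  currentNodeFollowUpCount: 0,
  globalElapsedMs: 0,
  nodeElapsedMs: 0,
  currentNodeRecoveryAttempts: [],
  lastTurnIndex: -1, // assumption: -1 means no turn processed yet
  lastUpdatedAt: 0,
};
const afterTurn = applyCandidateTurn(initialState, 0, 1000);
const afterEntry = applyNodeEntry(afterTurn, "q1", 2000);
```

Immutable snapshots may also make it easier to record the state before and after a change alongside the corresponding RuntimeEvent.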

6. EvidenceTarget

A rubric-aligned definition of what the exam is trying to assess.

/**
 * Defines what the exam is trying to assess at a conceptual level.
 * Linked to rubric criteria in the marking model.
 * Referenced by nodes and by evidence signals.
 */
interface EvidenceTarget {
  /** Unique identifier within the package. */
  readonly targetId: string;

  /** Human-readable label (e.g., "Explain photosynthesis mechanism"). */
  readonly label: string;

  /** Detailed description of what constitutes valid evidence. */
  readonly description: string;

  /** Links to rubric criteria IDs in the marking model. */
  readonly rubricCriteriaIds: readonly string[];

  /**
   * The dimension of oral assessment this target addresses.
   * The first four values correspond to the four primary content types
   * identified by Joughin (1998). The "metacognitive" dimension captures
   * self-correction and reasoning process quality (Fenton, 2025).
   */
  readonly evidenceDimension:
    | "knowledge_understanding"
    | "applied_problem_solving"
    | "interpersonal_competence"
    | "intrapersonal_quality"
    | "metacognitive"
    | "integrated_practice";

  /**
   * Bloom's Taxonomy cognitive level this target assesses.
   * Enables cognitive-level-aware validation, follow-up escalation, and marking.
   * Optional — when omitted, the target is not classified by cognitive level.
   * @see Bloom, B.S. (1956). Taxonomy of Educational Objectives.
   * @see Fenton (2025): "Generative AI tools have been found to perform well
   *   at the lower levels of Bloom's taxonomy but struggle at the create level."
   */
  readonly cognitiveLevel?: BloomLevel;

  /**
   * Whether this target is transversal (session-wide) or scoped to specific nodes.
   * Transversal targets (e.g., communication quality, critical thinking) are assessed
   * across ALL nodes, not scoped to specific ones.
   * Joughin (1998): interpersonal competence is "not skills per se but rather skills
   * exhibited in relation to a clinical situation or problem solving exercise."
   */
  readonly transversal: boolean;

  /** The node(s) where this target is expected to be evidenced. Empty for transversal targets. */
  readonly expectedNodeIds: readonly string[];

  /**
   * Aggregation method for transversal targets.
   * - "holistic": marker judges overall quality from the full session
   * - "best_of": highest signal quality across nodes
   * - "trajectory": assess whether quality improved over the session
   * Ignored for non-transversal targets.
   */
  readonly aggregationMethod?: "holistic" | "best_of" | "trajectory";

  /**
   * Minimum confidence threshold for a signal to count as "satisfied."
   * Range: 0.0 to 1.0. Recommended value: 0.7.
   */
  readonly requiredConfidence: number;

  /**
   * Maximum number of signals this target can receive.
   * Prevents over-counting from repeated follow-ups.
   * If omitted, unlimited.
   */
  readonly maxSignals?: number;

  /** Minimum number of positive signals needed to consider this target "covered". */
  readonly minPositiveSignals: number;

  /** Whether this target MUST be satisfied for the exam to be considered complete. */
  readonly isRequired: boolean;

  /** Weight of this target in the overall assessment. Range: 0.0 to 1.0. */
  readonly weight: number;
}
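The fields requiredConfidence, maxSignals, and minPositiveSignals combine into a coverage test. The sketch below shows one reading of them, not the normative one. Only approved signals are considered. maxSignals caps how many of them are counted, and this sketch assumes it keeps the earliest by timestampMs. Only signals with signalKind "positive" at or above requiredConfidence count toward minPositiveSignals. The schema leaves open whether partial or self_correction signals should also count. isTargetCovered and the trimmed types are hypothetical.

```typescript
// Hypothetical trimmed views of EvidenceTarget and EvidenceSignal.
interface CoverageTarget {
  readonly targetId: string;
  readonly requiredConfidence: number;
  readonly minPositiveSignals: number;
  readonly maxSignals?: number;
}

interface CoverageSignal {
  readonly targetIds: readonly string[];
  readonly signalKind: string;
  readonly confidence: number;
  readonly approved: boolean;
  readonly timestampMs: number;
}

function isTargetCovered(target: CoverageTarget, signals: readonly CoverageSignal[]): boolean {
  // Approved signals addressing this target, in emission order.
  const relevant = signals
    .filter((s) => s.approved && s.targetIds.indexOf(target.targetId) !== -1)
    .sort((a, b) => a.timestampMs - b.timestampMs);
  // Assumption: maxSignals keeps only the earliest signals.
  const counted = target.maxSignals === undefined ? relevant : relevant.slice(0, target.maxSignals);
  // Assumption: only "positive" signals at or above the threshold count.
  const positives = counted.filter(
    (s) => s.signalKind === "positive" && s.confidence >= target.requiredConfidence,
  );
  return positives.length >= target.minPositiveSignals;
}

const demoTarget: CoverageTarget = { targetId: "t1", requiredConfidence: 0.7, minPositiveSignals: 2, maxSignals: 3 };
const demoSignals: CoverageSignal[] = [
  { targetIds: ["t1"], signalKind: "positive", confidence: 0.9, approved: true, timestampMs: 1 },
  { targetIds: ["t1"], signalKind: "positive", confidence: 0.6, approved: true, timestampMs: 2 },
  { targetIds: ["t1"], signalKind: "positive", confidence: 0.8, approved: true, timestampMs: 3 },
  { targetIds: ["t1"], signalKind: "positive", confidence: 0.95, approved: false, timestampMs: 4 },
];
// Covered: 0.9 and 0.8 clear the 0.7 threshold within the first three approved signals.
const demoCovered = isTargetCovered(demoTarget, demoSignals);
```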

7. EvidenceSignal

A runtime-emitted record of whether an evidence target was demonstrated.

/**
 * A record that a specific evidence target was (or was not) demonstrated.
 * Usually proposed by the AI examiner during the conversation (see proposedBy)
 * and written to the ledger immediately. It is NOT derived post hoc from the
 * transcript; it is a first-class runtime artifact.
 */
interface EvidenceSignal {
  /** Unique identifier for this signal. */
  readonly signalId: string;

  /** The session this signal belongs to. */
  readonly sessionId: string;

  /** Which node the evidence was gathered in. */
  readonly nodeId: string;

  /** Transcript turns that support this signal. */
  readonly turnIds: readonly string[];

  /** Which evidence target(s) this signal addresses. */
  readonly targetIds: readonly string[];

  /**
   * The dimension of oral assessment this signal addresses.
   * The first four values correspond to the four primary content types
   * identified by Joughin (1998). The "metacognitive" dimension captures
   * self-correction and reasoning process quality (Fenton, 2025).
   */
  readonly evidenceDimension:
    | "knowledge_understanding"
    | "applied_problem_solving"
    | "interpersonal_competence"
    | "intrapersonal_quality"
    | "metacognitive"
    | "integrated_practice";

  /**
   * Classification of the evidence.
   *
   * The taxonomy extends beyond knowledge-correctness to capture process quality.
   * Fenton (2025): oral assessments reveal "the process of learning rather than
   * the output" and allow students to "reflect on their choices and have the
   * chance to self-correct."
   *
   * - positive:           Correct and complete evidence
   * - partial:            Partially correct or incomplete
   * - absent:             No evidence for this target
   * - misconception:      Demonstrates a misunderstanding
   * - flawed_reasoning:   Right answer with incorrect justification
   * - process_positive:   Good reasoning process, regardless of final answer
   * - process_negative:   Poor reasoning process
   * - self_correction:    Candidate identified and corrected their own error
   */
  readonly signalKind:
    | "positive"
    | "partial"
    | "absent"
    | "misconception"
    | "flawed_reasoning"
    | "process_positive"
    | "process_negative"
    | "self_correction";

  /** Free-text description for human reviewers. */
  readonly description: string;

  /**
   * Confidence that the target was demonstrated.
   * Range: 0.0 (no evidence) to 1.0 (certain).
   */
  readonly confidence: number;

  /**
   * STT confidence summary for the underlying transcript turns.
   * Signal confidence is epistemically dependent on transcript quality.
   */
  readonly sttConfidenceSummary: {
    readonly min: number;
    readonly max: number;
    readonly mean: number;
    readonly turnCount: number;
  };

  /** Who proposed this signal. */
  readonly proposedBy: "llm_analysis" | "runtime_heuristic" | "manual_marker";

  /** Whether the runtime controller has validated this signal. */
  readonly approved: boolean;

  /** ISO-8601 timestamp of signal creation. */
  readonly createdAt: string;

  /** ISO-8601 timestamp of approval (null if not yet approved). */
  readonly approvedAt: string | null;

  /** Timestamp when the signal was emitted (Unix epoch ms). */
  readonly timestampMs: number;

  /** Schema version. */
  readonly schemaVersion: "1";
}
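sttConfidenceSummary is a plain aggregate over the per-turn STT confidences of the turns listed in turnIds. Below is a minimal sketch. It assumes each transcript turn exposes a single confidence in [0, 1] (TranscriptTurn is defined elsewhere) and that a signal with no supporting turns is an error. summarizeSttConfidence is a hypothetical helper.

```typescript
interface SttSummary {
  readonly min: number;
  readonly max: number;
  readonly mean: number;
  readonly turnCount: number;
}

function summarizeSttConfidence(turnConfidences: readonly number[]): SttSummary {
  if (turnConfidences.length === 0) {
    // Assumption: a signal must cite at least one turn, so an empty input is an error.
    throw new Error("cannot summarize STT confidence for zero turns");
  }
  let lowest = Infinity;
  let highest = -Infinity;
  let sum = 0;
  for (const c of turnConfidences) {
    lowest = Math.min(lowest, c);
    highest = Math.max(highest, c);
    sum += c;
  }
  return {
    min: lowest,
    max: highest,
    mean: sum / turnConfidences.length,
    turnCount: turnConfidences.length,
  };
}

// Three supporting turns: { min: 0.5, max: 1, mean: 0.75, turnCount: 3 }
const demoSummary = summarizeSttConfidence([0.5, 1.0, 0.75]);
```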

8. EvidenceLedger

The authoritative collection of evidence signals for a session.

/**
 * The structured, authoritative collection of all evidence signals for a session.
 * First-class output consumed by the marking runtime.
 * NOT a transcript derivative — maintained in real-time by the runtime controller.
 */
interface EvidenceLedger {
  /** The session this ledger belongs to. */
  readonly sessionId: string;

  /** The exam ID. */
  readonly examId: string;

  /** All evidence targets defined for this exam. */
  readonly targets: readonly EvidenceTarget[];

  /** All transcript turns, in chronological order. */
  readonly turns: readonly TranscriptTurn[];

  /** All evidence signals (approved and pending). */
  readonly signals: readonly EvidenceSignal[];

  /** All detected evidence gaps. */
  readonly gaps: readonly EvidenceGap[];

  /** Summary statistics. */
  readonly summary: {
    readonly totalTurns: number;
    readonly totalSignals: number;
    readonly signalsByKind: Readonly<Record<string, number>>;
    readonly signalsByDimension: Readonly<Record<string, number>>;
    readonly targetsFullyCovered: number;
    readonly targetsPartiallyCovered: number;
    readonly targetsWithGaps: number;
    readonly mandatoryGaps: number;
    readonly averageConfidence: number;
    readonly averageSttConfidence: number;
  };

  /**
   * Optional reference to the session recording.
   * Akimov & Malin (2020): recording enables post-hoc human review.
   */
  readonly recordingRef?: {
    readonly audioUrl?: string;
    readonly videoUrl?: string;
    readonly availableForModeration: boolean;
    readonly candidateConsented: boolean;
    readonly retentionPolicy: {
      readonly retainUntil: string;
      readonly deleteAfterReview: boolean;
    };
  };

  /**
   * Optional moderation record.
   * Akimov & Malin (2020): all oral exams were "moderated by another
   * finance academic" for inter-rater reliability.
   */
  readonly moderationRecord?: {
    readonly moderatorId: string;
    readonly reviewedAt: string;
    readonly agreementRate: number;
    readonly overriddenSignalIds: readonly string[];
    readonly addedSignals: readonly EvidenceSignal[];
    readonly notes?: string;
  };

  /** ISO-8601 timestamp of ledger finalisation. */
  readonly finalisedAt: string;

  /** Schema version. */
  readonly schemaVersion: "1";
}
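Several summary fields can be recomputed from the signals alone. The sketch below covers totalSignals, signalsByKind, signalsByDimension, and averageConfidence. It counts every signal in the array, approved or pending. That is an assumption, because the schema does not say whether pending signals belong in the summary. summarizeSignals and SummarySignal are hypothetical. Reporting an average of 0 for an empty ledger, rather than NaN, is a design choice of this sketch.

```typescript
// Hypothetical trimmed view of EvidenceSignal.
interface SummarySignal {
  readonly signalKind: string;
  readonly evidenceDimension: string;
  readonly confidence: number;
}

interface SignalStats {
  readonly totalSignals: number;
  readonly signalsByKind: Readonly<Record<string, number>>;
  readonly signalsByDimension: Readonly<Record<string, number>>;
  readonly averageConfidence: number;
}

function summarizeSignals(signals: readonly SummarySignal[]): SignalStats {
  const byKind: Record<string, number> = {};
  const byDimension: Record<string, number> = {};
  let confidenceSum = 0;
  for (const s of signals) {
    byKind[s.signalKind] = (byKind[s.signalKind] ?? 0) + 1;
    byDimension[s.evidenceDimension] = (byDimension[s.evidenceDimension] ?? 0) + 1;
    confidenceSum += s.confidence;
  }
  return {
    totalSignals: signals.length,
    signalsByKind: byKind,
    signalsByDimension: byDimension,
    // Design choice: an empty ledger reports 0 instead of NaN.
    averageConfidence: signals.length === 0 ? 0 : confidenceSum / signals.length,
  };
}

const demoStats = summarizeSignals([
  { signalKind: "positive", evidenceDimension: "knowledge_understanding", confidence: 0.5 },
  { signalKind: "positive", evidenceDimension: "knowledge_understanding", confidence: 1.0 },
  { signalKind: "misconception", evidenceDimension: "metacognitive", confidence: 0.75 },
]);
// demoStats.signalsByKind: { positive: 2, misconception: 1 }; averageConfidence: 0.75
```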

9. CompletionPolicy

Rules for when a node is considered “done.”

/**
 * Rules governing when a node is considered complete.
 * The runtime controller evaluates this after every turn.
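 * minTurns is a precondition for completion; maxTurns forces completion when
 * reached; timeBudgetMs expiry is handled per timeoutBehavior. All other
 * specified conditions MUST be met for completion (AND logic), unless
 * `anyConditionSufficient` is true.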
 */
interface CompletionPolicy {
  /** Minimum candidate turns before completion is possible. Default: 1. */
  readonly minTurns?: number;

  /** Hard cap on total turns. Forces completion when reached. */
  readonly maxTurns?: number;

  /** Specific evidence target IDs that MUST have satisfied signals. */
  readonly requiredEvidenceTargetIds?: readonly string[];

  /** Minimum number of evidence targets that must be satisfied. */
  readonly requiredEvidenceCount?: number;

  /** Maximum time in this node in milliseconds. On expiry, the runtime controller applies `timeoutBehavior`. */
  readonly timeBudgetMs?: number;

  /**
   * Whether the examiner can explicitly signal completion.
   * If false, only automatic conditions can complete the node.
   */
  readonly allowExplicitComplete?: boolean;

  /**
   * If true, any single condition being met is sufficient for completion.
   * If false (default), ALL conditions must be met.
   */
  readonly anyConditionSufficient?: boolean;

  /**
   * What happens when time budget expires.
   * "force_transition" — immediately move to next node.
   * "warn_and_extend" — warn candidate, allow one extension.
   * "terminate" — end the session.
   */
  readonly timeoutBehavior?: "force_transition" | "warn_and_extend" | "terminate";
}
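The completion rules can be read as a small predicate over node progress. The sketch below is one interpretation and rests on several assumptions. maxTurns forces completion. minTurns gates completion in both AND and OR modes. The evidence conditions are combined according to anyConditionSufficient. A node with no evidence conditions completes once minTurns is met. Time budget expiry and allowExplicitComplete are left to the controller and are not modeled. isNodeComplete, CompletionRules, and NodeProgress are hypothetical.

```typescript
// Hypothetical trimmed view of CompletionPolicy (time-related fields omitted).
interface CompletionRules {
  readonly minTurns?: number;
  readonly maxTurns?: number;
  readonly requiredEvidenceTargetIds?: readonly string[];
  readonly requiredEvidenceCount?: number;
  readonly anyConditionSufficient?: boolean;
}

// What the controller knows about the current node.
interface NodeProgress {
  readonly turnCount: number;
  readonly satisfiedTargetIds: readonly string[];
}

function isNodeComplete(policy: CompletionRules, progress: NodeProgress): boolean {
  // maxTurns forces completion once reached.
  if (policy.maxTurns !== undefined && progress.turnCount >= policy.maxTurns) return true;
  // minTurns gates completion in both modes (documented default: 1).
  if (progress.turnCount < (policy.minTurns ?? 1)) return false;

  const conditions: boolean[] = [];
  const required = policy.requiredEvidenceTargetIds;
  if (required !== undefined) {
    conditions.push(required.every((id) => progress.satisfiedTargetIds.indexOf(id) !== -1));
  }
  if (policy.requiredEvidenceCount !== undefined) {
    conditions.push(progress.satisfiedTargetIds.length >= policy.requiredEvidenceCount);
  }
  // Assumption: with no evidence conditions, reaching minTurns is enough.
  if (conditions.length === 0) return true;
  return policy.anyConditionSufficient === true
    ? conditions.some((c) => c)
    : conditions.every((c) => c);
}

const demoCompletion: CompletionRules = { minTurns: 2, requiredEvidenceTargetIds: ["a", "b"], requiredEvidenceCount: 1 };
// AND mode: target "b" is still missing, so the node is not complete.
const andResult = isNodeComplete(demoCompletion, { turnCount: 2, satisfiedTargetIds: ["a"] });
```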

10. FollowUpPolicy

Rules for examiner follow-up behavior within a node.

/**
 * Rules governing the AI examiner's follow-up behavior within a node.
 * The runtime controller enforces these limits. The AI examiner generates
 * follow-ups freely within the boundaries.
 */
interface FollowUpPolicy {
  /**
   * Hard cap on follow-ups per node. MUST NOT be exceeded.
   * When reached, the runtime controller forces escalation per `escalationRule`.
   */
  readonly maxFollowUps: number;

  /** Style of follow-up the examiner should use. */
  readonly followUpStyle?: "probing" | "scaffolding" | "clarifying" | "redirecting" | "free";

  /** Minimum time between follow-ups in milliseconds. */
  readonly minIntervalMs?: number;

  /**
   * If true, follow-ups are only issued when an evidence target in this node
   * is unsatisfied. Prevents unnecessary probing.
   */
  readonly requireEvidenceGap?: boolean;

  /**
   * Patterns the examiner MUST NOT use in follow-ups.
   * Checked post-generation. If matched, the follow-up is discarded and
   * the examiner regenerates (up to a retry limit).
   */
  readonly forbiddenFollowUpPatterns?: readonly string[];

  /** What to do when maxFollowUps is reached. */
  readonly escalationRule?: FollowUpEscalation;

  /**
   * Allowed prompting levels for this node.
   * Based on Pearce & Chiavaroli (2020) prompting taxonomy, cited in Fenton (2025).
   * Constrains the examiner's follow-up moves to maintain assessment fairness.
   * Defaults to ["present_task", "probing"] if omitted.
   */
  readonly allowedPromptingLevels?: readonly PromptingLevel[];

  /**
   * Whether prompting must be consistent across candidates taking this exam.
   * When true, the runtime controller MUST track and enforce that each candidate
   * receives the same prompting style (not necessarily the same wording).
   * Default: false.
   * @see Fenton (2025), citing Pearce & Chiavaroli (2020): consistency principle.
   */
  readonly requireConsistentPrompting?: boolean;

  /**
   * Whether the candidate should be informed about prompting style in advance.
   * Supports transparency — candidates know what to expect.
   * @see Fenton (2025), citing Pearce & Chiavaroli (2020): transparency principle.
   */
  readonly disclosePromptingStyle?: boolean;

  /**
   * Maximum scaffolding intensity for this node (0–3).
   * 0 = no scaffolding (independent answer). 3 = heavy scaffolding.
   * The amount of scaffolding provided is itself evidence of candidate competence.
   * @see Fenton (2025): "educators have the flexibility to simplify questions
   *   or prompt students who are struggling."
   * @see Vygotsky's Zone of Proximal Development (ZPD) theory.
   */
  readonly scaffoldingBudget?: number;

  /**
   * Guiding principles for how prompting is applied.
   * Based on Pearce & Chiavaroli (2020), cited in Fenton (2025).
   * These principles constrain HOW prompting is used, not just WHICH levels are allowed.
   * @see Pearce & Chiavaroli (2020); Fenton (2025) p. 434.
   */
  readonly promptingPrinciples?: {
    /** Prompts must neither discourage nor reassure the candidate. */
    readonly neutrality?: boolean;
    /** Prompts must be consistent across candidates. */
    readonly consistency?: boolean;
    /** Candidate should be informed about prompting style in advance. */
    readonly transparency?: boolean;
    /** Examiner must reflect on and adjust prompting practice. */
    readonly reflexivity?: boolean;
  };

  /**
   * Strategy for escalating cognitive depth through follow-ups.
   * Controls whether follow-ups aim to elicit higher-order thinking.
   * @see Bloom (1956); Fenton (2025) on higher-order thinking in oral assessment.
   */
  readonly cognitiveEscalationStrategy?:
    /** Stay at the same cognitive level as the initial response. */
    | "maintain"
    /** Probe for higher-order thinking (e.g., Remember → Understand → Apply). */
    | "escalate"
    /** Provide scaffolding to help the candidate reach the target cognitive level. */
    | "scaffold";
}

/**
 * Prompting taxonomy based on Pearce & Chiavaroli (2020), cited in Fenton (2025).
 * Ranges from neutral presentation to leading guidance.
 * Guiding principles: neutrality, consistency, transparency, reflexivity.
 */
type PromptingLevel =
  /** Simply state the question/task. Neutral. */
  | "present_task"
  /** Repeat the question for the candidate. */
  | "repeat_info"
  /** Ask if the candidate understands the question. */
  | "clarifying"
  /** Ask for deeper explanation or elaboration. */
  | "probing"
  /** Guide toward the correct answer (use sparingly, with caution). */
  | "leading";

type FollowUpEscalation =
  /** Move to the next node. */
  | "transition"
  /** Generate a wrap-up utterance and move on. */
  | "wrap_up"
  /** Terminate the session. */
  | "terminate"
  /** Log a warning and continue (allows the AI to attempt closure). */
  | "warn";
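Taken together, the FollowUpPolicy limits act as a gate in front of each candidate follow-up. A sketch of that gate follows. It assumes forbiddenFollowUpPatterns are case-insensitive regular expressions, since the schema only says "patterns". It also assumes the caller supplies the elapsed time, the follow-up count, and whether the node still has an unsatisfied target. The default prompting levels come from the allowedPromptingLevels comment. checkFollowUp and its verdict names are hypothetical. An "escalate" verdict is the point at which the controller would apply escalationRule.

```typescript
type PromptLevel = "present_task" | "repeat_info" | "clarifying" | "probing" | "leading";

// Hypothetical trimmed view of FollowUpPolicy.
interface FollowUpRules {
  readonly maxFollowUps: number;
  readonly minIntervalMs?: number;
  readonly requireEvidenceGap?: boolean;
  readonly allowedPromptingLevels?: readonly PromptLevel[];
  readonly forbiddenFollowUpPatterns?: readonly string[];
}

// What the caller knows about the proposed follow-up and the node so far.
interface FollowUpRequest {
  readonly level: PromptLevel;
  readonly text: string;
  readonly followUpsSoFar: number;
  readonly msSinceLastFollowUp: number;
  readonly nodeHasEvidenceGap: boolean;
}

type FollowUpVerdict =
  | "allow" | "escalate" | "too_soon" | "no_gap" | "level_not_allowed" | "forbidden_pattern";

// Default documented on allowedPromptingLevels.
const DEFAULT_PROMPTING_LEVELS: readonly PromptLevel[] = ["present_task", "probing"];

function checkFollowUp(rules: FollowUpRules, req: FollowUpRequest): FollowUpVerdict {
  // The hard cap comes first: at the limit, the controller applies escalationRule.
  if (req.followUpsSoFar >= rules.maxFollowUps) return "escalate";
  if (req.msSinceLastFollowUp < (rules.minIntervalMs ?? 0)) return "too_soon";
  if (rules.requireEvidenceGap === true && !req.nodeHasEvidenceGap) return "no_gap";
  const levels = rules.allowedPromptingLevels ?? DEFAULT_PROMPTING_LEVELS;
  if (levels.indexOf(req.level) === -1) return "level_not_allowed";
  // Assumption: each pattern is a case-insensitive regular expression.
  for (const pattern of rules.forbiddenFollowUpPatterns ?? []) {
    if (new RegExp(pattern, "i").test(req.text)) return "forbidden_pattern";
  }
  return "allow";
}

const rulesDemo: FollowUpRules = {
  maxFollowUps: 2,
  minIntervalMs: 5000,
  requireEvidenceGap: true,
  forbiddenFollowUpPatterns: ["\\bgood job\\b"], // e.g. a reassuring phrase, per neutrality
};
const reqDemo: FollowUpRequest = {
  level: "probing",
  text: "Can you say more about that?",
  followUpsSoFar: 0,
  msSinceLastFollowUp: 6000,
  nodeHasEvidenceGap: true,
};
```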

11. TransitionPolicy

Rules for moving between nodes.

/**
 * A single transition rule from one node to another.
 * Evaluated by the runtime controller after each turn and on policy triggers.
 * When multiple transitions are eligible, the highest priority wins.
 */
interface TransitionPolicy {
  /** The destination node ID. */
  readonly targetNodeId: string;

  /** Condition that must be true for this transition to fire. */
  readonly condition: TransitionCondition;

  /**
   * Priority for tie-breaking. Higher numbers win.
   * Default: 0.
   */
  readonly priority?: number;

  /**
   * If true, this transition overrides completion policy.
   * Used for timeout-forced transitions and error recovery.
   */
  readonly isForced?: boolean;

  /**
   * Optional prompt seed for the examiner to generate a natural
   * transition utterance (bridge). If omitted, the runtime
   * controller may inject a generic transition.
   */
  readonly bridgePrompt?: string;
}

type TransitionCondition =
  /** Always eligible. Typically used as a fallback. */
  | { readonly type: "always" }
  /** Specific evidence targets must be satisfied. */
  | { readonly type: "evidence_satisfied"; readonly targetIds: readonly string[] }
  /** Minimum candidate turn count must be reached. */
  | { readonly type: "turn_count_reached"; readonly minTurns: number }
  /** Time threshold must be crossed. */
  | { readonly type: "time_elapsed"; readonly minMs: number }
  /** Candidate must have issued a specific command. */
  | { readonly type: "candidate_command"; readonly command: CandidateCommandType }
  /** A policy limit must have been reached. */
  | { readonly type: "policy_escalation"; readonly policy: "follow_up_limit" | "time_budget" | "recovery_limit" };
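Transition selection filters the eligible transitions and keeps the one with the highest priority. The schema does not define tie-breaking among equal priorities, so the sketch below assumes the earliest declared transition wins. conditionHolds, selectTransition, and the TransitionContext fields are hypothetical, and isForced handling is omitted.

```typescript
// Mirrors TransitionCondition above; candidate commands are plain strings here.
type Condition =
  | { readonly type: "always" }
  | { readonly type: "evidence_satisfied"; readonly targetIds: readonly string[] }
  | { readonly type: "turn_count_reached"; readonly minTurns: number }
  | { readonly type: "time_elapsed"; readonly minMs: number }
  | { readonly type: "candidate_command"; readonly command: string }
  | {
      readonly type: "policy_escalation";
      readonly policy: "follow_up_limit" | "time_budget" | "recovery_limit";
    };

interface Transition {
  readonly targetNodeId: string;
  readonly condition: Condition;
  readonly priority?: number;
}

// Hypothetical snapshot of what the controller knows when evaluating transitions.
interface TransitionContext {
  readonly satisfiedTargetIds: readonly string[];
  readonly nodeTurnCount: number;
  readonly nodeElapsedMs: number;
  readonly lastCommand?: string;
  readonly escalatedPolicies: readonly string[];
}

function conditionHolds(c: Condition, ctx: TransitionContext): boolean {
  switch (c.type) {
    case "always":
      return true;
    case "evidence_satisfied":
      return c.targetIds.every((id) => ctx.satisfiedTargetIds.indexOf(id) !== -1);
    case "turn_count_reached":
      return ctx.nodeTurnCount >= c.minTurns;
    case "time_elapsed":
      return ctx.nodeElapsedMs >= c.minMs;
    case "candidate_command":
      return ctx.lastCommand === c.command;
    case "policy_escalation":
      return ctx.escalatedPolicies.indexOf(c.policy) !== -1;
    default: {
      const unreachable: never = c;
      return unreachable;
    }
  }
}

// Highest priority wins; on a tie the earliest declared transition is kept (assumption).
function selectTransition(
  transitions: readonly Transition[],
  ctx: TransitionContext,
): Transition | undefined {
  let best: Transition | undefined;
  for (const t of transitions) {
    if (!conditionHolds(t.condition, ctx)) continue;
    if (best === undefined || (t.priority ?? 0) > (best.priority ?? 0)) best = t;
  }
  return best;
}

const demoTransitions: Transition[] = [
  { targetNodeId: "q2", condition: { type: "evidence_satisfied", targetIds: ["t1"] }, priority: 1 },
  { targetNodeId: "q1_probe", condition: { type: "turn_count_reached", minTurns: 2 } },
  { targetNodeId: "wrapup", condition: { type: "always" }, priority: -1 },
];
const demoCtx: TransitionContext = { satisfiedTargetIds: [], nodeTurnCount: 3, nodeElapsedMs: 0, escalatedPolicies: [] };
// selectTransition(demoTransitions, demoCtx)?.targetNodeId === "q1_probe"
```

Giving the "always" fallback a negative priority, as in the demo, keeps it from shadowing more specific transitions at the default priority of 0.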

12. RecoveryPolicy

Rules for handling anomalies during the session.

/**
 * A single recovery rule for a specific anomaly scenario.
 * The runtime controller matches anomalies to recovery rules and executes
 * the prescribed sequence. The AI examiner generates the recovery utterance.
 */
interface RecoveryPolicy {
  /** Which anomaly this rule addresses. */
  readonly scenario: RecoveryScenario;

  /** Maximum recovery attempts before escalation. */
  readonly maxAttempts: number;

  /** What to do when max attempts are exhausted. */
  readonly escalation: RecoveryEscalation;

  /** Prompt seed for the examiner's recovery utterance. */
  readonly recoveryPrompt?: string;

  /** Minimum wait before next recovery attempt in milliseconds. */
  readonly cooldownMs?: number;

  /** Detection threshold in milliseconds before recovery is triggered (e.g., how long silence must last). */
  readonly detectionThresholdMs?: number;
}

type RecoveryScenario =
  | "silence"
  | "unclear_answer"
  | "off_topic"
  | "anxiety"
  | "interruption"
  | "network_issue"
  | "repetition_loop";

type RecoveryEscalation =
  /** Retry with the same approach. */
  | "retry"
  /** Rephrase the question. */
  | "rephrase"
  /** Skip this node and move to the next. */
  | "skip_node"
  /** Pause the session (candidate can resume later). */
  | "pause_session"
  /** Terminate the session. */
  | "terminate";
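A recovery rule reduces to a three-way decision: attempt again, wait out the cooldown, or escalate. Below is a minimal sketch. It assumes the caller passes the timestamps of earlier attempts for this scenario in the current node, for example filtered from currentNodeRecoveryAttempts. decideRecovery and RecoveryDecision are hypothetical.

```typescript
type Escalation = "retry" | "rephrase" | "skip_node" | "pause_session" | "terminate";

// Hypothetical trimmed view of RecoveryPolicy.
interface RecoveryRule {
  readonly maxAttempts: number;
  readonly escalation: Escalation;
  readonly cooldownMs?: number;
}

type RecoveryDecision =
  | { readonly kind: "attempt"; readonly attemptNumber: number }
  | { readonly kind: "wait"; readonly remainingMs: number }
  | { readonly kind: "escalate"; readonly action: Escalation };

// attemptTimesMs: Unix epoch ms of earlier attempts for this scenario in this node, oldest first.
function decideRecovery(
  rule: RecoveryRule,
  attemptTimesMs: readonly number[],
  nowMs: number,
): RecoveryDecision {
  if (attemptTimesMs.length >= rule.maxAttempts) {
    return { kind: "escalate", action: rule.escalation };
  }
  const cooldown = rule.cooldownMs ?? 0;
  const last = attemptTimesMs.length > 0 ? attemptTimesMs[attemptTimesMs.length - 1] : undefined;
  if (last !== undefined && nowMs - last < cooldown) {
    return { kind: "wait", remainingMs: cooldown - (nowMs - last) };
  }
  return { kind: "attempt", attemptNumber: attemptTimesMs.length + 1 };
}

const ruleDemo: RecoveryRule = { maxAttempts: 2, escalation: "rephrase", cooldownMs: 3000 };
// One attempt at t=1000 and now=2000: still inside the cooldown, 2000 ms remaining.
const waitDemo = decideRecovery(ruleDemo, [1000], 2000);
```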

13. CandidateCommandPolicy

Rules for which candidate commands are valid at a node.

/**
 * Defines which candidate commands are recognized and how they are handled
 * at a specific node. Forbidden commands are handled per their onViolation
 * setting; commands in neither list are ignored (treated as regular
 * candidate speech).
 */
interface CandidateCommandPolicy {
  /** Commands explicitly allowed at this node. */
  readonly allowed: readonly AllowedAction[];

  /** Commands explicitly forbidden at this node. */
  readonly forbidden?: readonly ForbiddenAction[];
}

/**
 * A candidate command that is recognized and handled at this node.
 */
interface AllowedAction {
  /** The command type. */
  readonly command: CandidateCommandType;

  /**
   * Maximum times this command can be used in this node.
   * If omitted, unlimited.
   */
  readonly maxUses?: number;

  /**
   * How the runtime controller handles this command.
   * "inject_response" — the controller generates/replays a response.
   * "notify_examiner" — the examiner is informed and can adapt.
   * "pause" — the session is paused.
   * "skip" — the node is skipped.
   */
  readonly handling: "inject_response" | "notify_examiner" | "pause" | "skip";

  /**
   * For "inject_response" handling: the response template.
   * MAY contain {{turnText}} to reference the last examiner turn.
   */
  readonly responseTemplate?: string;
}

/**
 * A command that is explicitly forbidden at this node.
 * If detected, the runtime controller emits a policy_violation event.
 */
interface ForbiddenAction {
  /** The forbidden command type. */
  readonly command: CandidateCommandType;

  /** Reason for forbidding (for audit log). */
  readonly reason: string;

  /** What happens if the candidate attempts this command. */
  readonly onViolation: "ignore" | "inform" | "warn";
}

type CandidateCommandType =
  | "repeat"
  | "clarification"
  /** Candidate asks examiner to rephrase (distinct from repeat — signals active engagement). */
  | "request_rephrase"
  | "pause"
  | "raise_hand"
  | "skip"
  | "volume_up"
  | "volume_down"
  | "language_switch"
  /** Candidate signals they need a moment to think. Assessment-significant. */
  | "thinking_aloud";
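
The precedence between the two lists is not spelled out above. The sketch below shows one plausible resolution order, and it is only a sketch: `resolveCommand` and `CommandResolution` are hypothetical names, the caller is assumed to track per-node usage counts (`usesSoFar`), and both the forbidden-wins rule and the fallback to regular speech once `maxUses` is exhausted are assumptions.

```typescript
// Local copies of the §13 types so this sketch compiles on its own.
type CandidateCommandType =
  | "repeat" | "clarification" | "request_rephrase" | "pause" | "raise_hand"
  | "skip" | "volume_up" | "volume_down" | "language_switch" | "thinking_aloud";

interface AllowedAction {
  readonly command: CandidateCommandType;
  readonly maxUses?: number;
  readonly handling: "inject_response" | "notify_examiner" | "pause" | "skip";
  readonly responseTemplate?: string;
}

interface ForbiddenAction {
  readonly command: CandidateCommandType;
  readonly reason: string;
  readonly onViolation: "ignore" | "inform" | "warn";
}

interface CandidateCommandPolicy {
  readonly allowed: readonly AllowedAction[];
  readonly forbidden?: readonly ForbiddenAction[];
}

// Hypothetical outcome type for a detected command.
type CommandResolution =
  | { readonly kind: "violation"; readonly rule: ForbiddenAction }
  | { readonly kind: "handle"; readonly rule: AllowedAction }
  | { readonly kind: "regular_speech" };

/**
 * Resolves a detected command against one node's policy.
 * Assumptions: `forbidden` wins if a command is listed in both arrays, and an
 * allowed command whose maxUses budget is spent falls back to regular speech.
 */
function resolveCommand(
  policy: CandidateCommandPolicy,
  command: CandidateCommandType,
  usesSoFar: number,
): CommandResolution {
  const forbidden = policy.forbidden?.find((f) => f.command === command);
  if (forbidden) {
    // The controller would emit a policy_violation event at this point.
    return { kind: "violation", rule: forbidden };
  }
  const allowed = policy.allowed.find((a) => a.command === command);
  if (allowed && (allowed.maxUses === undefined || usesSoFar < allowed.maxUses)) {
    return { kind: "handle", rule: allowed };
  }
  // Unlisted, or over budget: the utterance is treated as ordinary speech.
  return { kind: "regular_speech" };
}
```

Returning a small tagged result instead of performing side effects keeps the lookup pure, so the controller can decide separately whether to emit `policy_violation` or `candidate_command_processed`.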

14. RuntimeEvent

An immutable record of a significant state change.

/**
 * An immutable record of a significant state change during a session.
 * Events are the audit trail. Every event has a type, timestamp, and payload.
 * Events MUST be emitted by the runtime controller — they are not optional telemetry.
 */
interface RuntimeEvent {
  /** Unique event ID. */
  readonly eventId: string;

  /** The session this event belongs to. */
  readonly sessionId: string;

  /** Event type discriminator. */
  readonly type: RuntimeEventType;

  /** Unix epoch milliseconds when the event occurred. */
  readonly timestampMs: number;

  /** The node active when this event occurred, if applicable. */
  readonly nodeId?: string;

  /** The turn index associated with this event, if applicable. */
  readonly turnIndex?: number;

  /** Event-specific payload. */
  readonly payload: RuntimeEventPayload;
}

type RuntimeEventType =
  // Lifecycle
  | "session_started"
  | "session_paused"
  | "session_resumed"
  | "session_completed"
  | "session_terminated"
  // Node
  | "node_entered"
  | "node_exited"
  | "node_timeout"
  // Turn
  | "examiner_turn"
  | "candidate_turn"
  | "turn_completed"
  // Evidence
  | "evidence_signal_emitted"
  | "evidence_target_satisfied"
  | "evidence_target_missed"
  // Command
  | "candidate_command_received"
  | "candidate_command_processed"
  // Policy
  | "follow_up_limit_reached"
  | "time_budget_warning"
  | "time_budget_exceeded"
  | "transition_forced"
  | "recovery_triggered"
  | "recovery_succeeded"
  | "recovery_failed"
  | "policy_violation"
  // Agent
  | "agent_action_allowed"
  | "agent_action_blocked"
  // Assessment-significant moments (Fenton 2025)
  | "hesitation_detected"
  | "self_correction_detected";

/**
 * Event-specific payloads. The payload types carry no discriminant of their own:
 * the enclosing RuntimeEvent.type determines which variant is present.
 */
type RuntimeEventPayload =
  | SessionLifecyclePayload
  | NodeEventPayload
  | TurnEventPayload
  | EvidenceEventPayload
  | CommandEventPayload
  | PolicyEventPayload
  | AgentEventPayload;

interface SessionLifecyclePayload {
  readonly reason?: string;
  readonly totalTurns?: number;
  readonly totalElapsedMs?: number;
}

interface NodeEventPayload {
  readonly nodeId: string;
  readonly nodeKind: ExamRuntimeNodeKind;
  readonly fromNodeId?: string;
  readonly transitionCondition?: string;
}

interface TurnEventPayload {
  readonly role: "examiner" | "candidate";
  readonly text: string;
  readonly isFollowUp: boolean;
  readonly followUpIndex?: number;
  readonly durationMs?: number;
  readonly sttConfidence?: number;
  /**
   * Optional rapport-building move by the examiner.
   * Logged for quality assurance; does NOT count against follow-up limits.
   * @see Akimov & Malin (2020): "the examiner attempted to make students feel
   *   comfortable and at ease."
   */
  readonly rapportMove?: "encouragement" | "acknowledgement" | "reassurance" | "humor" | "none";
  /**
   * Scaffolding intensity of this turn (0–3).
   * 0 = no scaffolding (independent answer). 3 = heavy scaffolding.
   * Tracks the degree of examiner support provided, which is itself evidence.
   * @see Fenton (2025) on scaffolding during oral assessment.
   */
  readonly scaffoldingIntensity?: number;
}

interface EvidenceEventPayload {
  readonly signal?: EvidenceSignal;
  readonly targetId?: string;
  readonly confidence?: number;
}

/**
 * Payload for hesitation_detected events.
 * @see Fenton (2025): hesitation patterns reveal reasoning processes.
 */
interface HesitationDetectedPayload {
  readonly nodeId: string;
  readonly turnId: number;
  readonly durationMs: number;
  readonly context: "after_question" | "mid_response" | "before_conclusion";
}

/**
 * Payload for self_correction_detected events.
 * @see Fenton (2025): "students can reflect on their choices and have the
 *   chance to self-correct."
 */
interface SelfCorrectionDetectedPayload {
  readonly nodeId: string;
  readonly turnIds: readonly number[];
  readonly originalClaim: string;
  readonly correctedClaim: string;
  readonly confidence: number;
}

interface CommandEventPayload {
  readonly command: CandidateCommandType;
  readonly handled: boolean;
  readonly response?: string;
}

interface PolicyEventPayload {
  readonly policyType: string;
  readonly limit?: number;
  readonly current?: number;
  readonly action: string;
  readonly details?: string;
  /**
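   * Detection details for assessment-significant moments: a HesitationDetectedPayload
   * (including the pause duration in ms) for hesitation_detected events, or a
   * SelfCorrectionDetectedPayload (including the original and corrected claims)
   * for self_correction_detected events.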
   */
  readonly assessmentContext?: HesitationDetectedPayload | SelfCorrectionDetectedPayload;
}

interface AgentEventPayload {
  readonly actionType: string;
  readonly allowed: boolean;
  readonly reason?: string;
  readonly violation?: string;
}
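
Since the payload variants carry no `kind` field, a consumer has to rely on `RuntimeEvent.type` to know which payload it holds. One way to let the compiler check that pairing is a lookup type from event type to payload type. The sketch covers only four event types; `PayloadByType`, `TypedRuntimeEvent` and `makeEvent` are hypothetical, and routing `hesitation_detected` through `PolicyEventPayload.assessmentContext` is inferred from the field comments rather than stated outright.

```typescript
// Local, trimmed copies of three §14 payload types.
interface SessionLifecyclePayload {
  readonly reason?: string;
  readonly totalTurns?: number;
  readonly totalElapsedMs?: number;
}

interface HesitationDetectedPayload {
  readonly nodeId: string;
  readonly turnId: number;
  readonly durationMs: number;
  readonly context: "after_question" | "mid_response" | "before_conclusion";
}

interface PolicyEventPayload {
  readonly policyType: string;
  readonly limit?: number;
  readonly current?: number;
  readonly action: string;
  readonly details?: string;
  readonly assessmentContext?: HesitationDetectedPayload; // trimmed to the hesitation case
}

// Hypothetical event-type-to-payload table (assumed pairings, not normative).
interface PayloadByType {
  session_started: SessionLifecyclePayload;
  session_completed: SessionLifecyclePayload;
  policy_violation: PolicyEventPayload;
  hesitation_detected: PolicyEventPayload;
}

type EventType = keyof PayloadByType;

interface TypedRuntimeEvent<T extends EventType> {
  readonly eventId: string;
  readonly sessionId: string;
  readonly type: T;
  readonly timestampMs: number;
  readonly payload: PayloadByType[T];
}

// T is inferred from `type`; the compiler then checks `payload` against the table.
function makeEvent<T extends EventType>(
  eventId: string,
  sessionId: string,
  type: T,
  payload: PayloadByType[T],
  timestampMs: number = Date.now(),
): TypedRuntimeEvent<T> {
  return { eventId, sessionId, type, timestampMs, payload };
}
```

With this table, `makeEvent(id, sid, "session_completed", { policyType: "x", action: "y" })` fails to compile, because the object literal has properties that `SessionLifecyclePayload` does not declare.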

15. TranscriptTurn

A single attributed utterance in the conversation.

/**
 * A single attributed utterance in the exam conversation.
 * Richer than raw STT output — carries node context, timing, and semantic metadata.
 * Transcript turns are the raw material; evidence signals are the structured interpretation.
 */
interface TranscriptTurn {
  /** Sequential index within the session. 0-based. */
  readonly turnIndex: number;

  /** Who produced this utterance. */
  readonly role: "examiner" | "candidate" | "system";

  /** The transcribed or generated text. */
  readonly text: string;

  /** Which node this turn occurred in. */
  readonly nodeId: string;

  /** Unix epoch milliseconds when the turn started. */
  readonly timestampMs: number;

  /** Duration of the turn in milliseconds. */
  readonly durationMs: number;

  /** Whether this is an examiner follow-up turn. */
  readonly isFollowUp: boolean;

  /** For follow-up turns, the 0-based follow-up index. */
  readonly followUpIndex?: number;

  /** The candidate command detected in this turn, if any. */
  readonly candidateCommandDetected?: CandidateCommandType;

  /** STT confidence score (for candidate turns). Range: 0.0 to 1.0. */
  readonly sttConfidence?: number;

  /** The recovery scenario handled during this turn, if any. */
  readonly recoveryAction?: RecoveryScenario;
}

/**
 * Records an evidence gap: a mandatory target that was not sufficiently evidenced.
 */
interface EvidenceGap {
  /** The target that was underserved. */
  readonly targetId: string;

  /** The node where evidence was expected. */
  readonly nodeId: string;

  /** Number of positive signals collected. */
  readonly positiveSignalsCollected: number;

  /** Minimum required. */
  readonly minPositiveSignalsRequired: number;

  /** How the gap was detected. */
  readonly detectedBy: "runtime_check" | "marking_pipeline" | "manual_review";

  /** Whether the gap was addressed via follow-up. */
  readonly addressedByFollowUp: boolean;

  /** Whether a recovery was attempted. */
  readonly addressedByRecovery: boolean;
}
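
A `runtime_check` for gaps amounts to comparing each mandatory target's positive-signal count with its minimum. The sketch assumes a hypothetical `TargetTally` input that has already been aggregated from the evidence ledger (whose real shape is defined elsewhere), and it treats any follow-up spent on a target as addressing it, which is an assumption.

```typescript
// Local copy of EvidenceGap (§15).
interface EvidenceGap {
  readonly targetId: string;
  readonly nodeId: string;
  readonly positiveSignalsCollected: number;
  readonly minPositiveSignalsRequired: number;
  readonly detectedBy: "runtime_check" | "marking_pipeline" | "manual_review";
  readonly addressedByFollowUp: boolean;
  readonly addressedByRecovery: boolean;
}

// Hypothetical per-target tally; the real evidence ledger shape is defined elsewhere.
interface TargetTally {
  readonly targetId: string;
  readonly nodeId: string;
  readonly mandatory: boolean;
  readonly positiveSignals: number;
  readonly minPositiveSignals: number;
  readonly followUpsUsed: number;
  readonly recoveryAttempted: boolean;
}

/** Emits one EvidenceGap per mandatory target that fell short of its minimum. */
function findEvidenceGaps(tallies: readonly TargetTally[]): EvidenceGap[] {
  return tallies
    .filter((t) => t.mandatory && t.positiveSignals < t.minPositiveSignals)
    .map((t): EvidenceGap => ({
      targetId: t.targetId,
      nodeId: t.nodeId,
      positiveSignalsCollected: t.positiveSignals,
      minPositiveSignalsRequired: t.minPositiveSignals,
      detectedBy: "runtime_check",
      // Assumption: any follow-up spent on the target counts as addressing it.
      addressedByFollowUp: t.followUpsUsed > 0,
      addressedByRecovery: t.recoveryAttempted,
    }));
}
```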

16. TelemetryPolicy

Rules for operational data emission.

/**
 * Rules governing what operational data is emitted and where.
 * Controls the granularity and destinations of runtime telemetry.
 */
interface TelemetryPolicy {
  /** Emit events for every turn. Default: true. */
  readonly emitTurnEvents?: boolean;

  /** Emit events for evidence signals. Default: true. SHOULD always be true. */
  readonly emitEvidenceEvents?: boolean;

  /** Emit events for state transitions. Default: true. */
  readonly emitStateTransitions?: boolean;

  /**
   * Emit events for policy violations.
   * MUST always be true. This field exists for documentation, not configuration.
   */
  readonly emitPolicyViolations: true;

  /** Sampling rate for high-frequency events (0.0 to 1.0). Default: 1.0. */
  readonly samplingRate?: number;

  /** Where events are delivered. */
  readonly destinations?: readonly TelemetryDestination[];
}

type TelemetryDestination =
  | "event_store"
  | "analytics"
  | "debug_console"
  | "livekit_data_channel";
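
One reading of the defaults above as an emission filter, under stated assumptions: `shouldEmit` and `EventCategory` are hypothetical, sampling is applied only to turn events (treated here as the high-frequency category, which the spec does not name), and policy violations bypass both the flags and sampling because they MUST always be emitted. The random source is injected so tests can be deterministic.

```typescript
// Local copy of the TelemetryPolicy flags (destinations omitted).
interface TelemetryPolicy {
  readonly emitTurnEvents?: boolean;
  readonly emitEvidenceEvents?: boolean;
  readonly emitStateTransitions?: boolean;
  readonly emitPolicyViolations: true;
  readonly samplingRate?: number;
}

// Hypothetical coarse grouping of RuntimeEventType values.
type EventCategory = "turn" | "evidence" | "state_transition" | "policy_violation";

function shouldEmit(
  policy: TelemetryPolicy,
  category: EventCategory,
  random: () => number = Math.random,
): boolean {
  switch (category) {
    case "policy_violation":
      return true; // never filtered or sampled
    case "evidence":
      return policy.emitEvidenceEvents ?? true;
    case "state_transition":
      return policy.emitStateTransitions ?? true;
    case "turn": {
      if (!(policy.emitTurnEvents ?? true)) return false;
      // Clamp to [0, 1]; random() is in [0, 1), so a rate of 1 keeps every event.
      const rate = Math.min(1, Math.max(0, policy.samplingRate ?? 1));
      return random() < rate;
    }
  }
}
```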

17. ContextPolicy

Rules for what context the AI examiner can access.

/**
 * Rules governing what exam context is injected into the AI examiner's prompt.
 * This is a critical agent boundary mechanism: what the examiner doesn't see,
 * it can't leak or misuse.
 */
interface ContextPolicy {
  /** Whether the examiner can see rubric criteria. Default: false. */
  readonly includeRubric?: boolean;

  /** Whether the examiner can see transcript from prior nodes. Default: false. */
  readonly includePreviousNodes?: boolean;

  /**
   * Whether the examiner can see current evidence coverage.
   * Useful for adaptive follow-up. Default: false.
   */
  readonly includeEvidenceStatus?: boolean;

  /** Whether the examiner can see prior session data for this candidate. Default: false. */
  readonly includeCandidateHistory?: boolean;

  /** Maximum tokens for context injection. */
  readonly maxContextTokens?: number;

  /** Fields that MUST NOT appear in the examiner's context. */
  readonly redactedFields?: readonly string[];
}
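
A minimal sketch of a prompt builder that honors these flags. `ExamContextSource`, `buildExaminerContext` and the 4-characters-per-token estimate are illustrative assumptions, as are the choice to match `redactedFields` against top-level field names and the rule that, when over budget, optional sections are dropped from the end first while the node prompt is always kept.

```typescript
// Local copy of ContextPolicy (§17).
interface ContextPolicy {
  readonly includeRubric?: boolean;
  readonly includePreviousNodes?: boolean;
  readonly includeEvidenceStatus?: boolean;
  readonly includeCandidateHistory?: boolean;
  readonly maxContextTokens?: number;
  readonly redactedFields?: readonly string[];
}

// Hypothetical pre-rendered context sections; names are illustrative.
interface ExamContextSource {
  readonly nodePrompt: string;
  readonly rubric?: string;
  readonly previousNodesTranscript?: string;
  readonly evidenceStatus?: string;
  readonly candidateHistory?: string;
  /** Named fields (e.g., "candidateName") that are subject to redaction. */
  readonly fields: Readonly<Record<string, string>>;
}

// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function buildExaminerContext(
  policy: ContextPolicy,
  src: ExamContextSource,
  countTokens: (s: string) => number = estimateTokens,
): string[] {
  const redacted = new Set(policy.redactedFields ?? []);
  const sections: string[] = [src.nodePrompt]; // always included
  for (const key of Object.keys(src.fields)) {
    if (!redacted.has(key)) sections.push(`${key}: ${src.fields[key]}`);
  }
  // Every optional section stays out unless its flag is true ("Default: false").
  if (policy.includeRubric && src.rubric) sections.push(src.rubric);
  if (policy.includePreviousNodes && src.previousNodesTranscript) sections.push(src.previousNodesTranscript);
  if (policy.includeEvidenceStatus && src.evidenceStatus) sections.push(src.evidenceStatus);
  if (policy.includeCandidateHistory && src.candidateHistory) sections.push(src.candidateHistory);

  // Enforce the token budget by dropping the last-added sections first.
  const budget = policy.maxContextTokens;
  if (budget !== undefined) {
    while (sections.length > 1 && countTokens(sections.join("\n")) > budget) {
      sections.pop();
    }
  }
  return sections;
}
```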

18. GlobalRuntimePolicies

Policies that apply across the entire exam unless overridden at node level.

/**
 * Global policies for the entire exam.
 * These are defaults that apply to all nodes unless a node provides
 * a local override.
 */
interface GlobalRuntimePolicies {
  /** Default completion policy for nodes without a local override. */
  readonly defaultCompletion?: CompletionPolicy;

  /** Default follow-up policy for nodes without a local override. */
  readonly defaultFollowUp?: FollowUpPolicy;

  /** Default recovery policies for the entire exam. */
  readonly recoveryPolicies?: readonly RecoveryPolicy[];

  /** Default transition policy (fallback). */
  readonly defaultTransition?: TransitionPolicy;

  /** Telemetry configuration. */
  readonly telemetry: TelemetryPolicy;

  /** Context policy for the AI examiner (global). */
  readonly context: ContextPolicy;

  /**
   * Global agent boundary: actions the examiner is NEVER allowed to perform,
   * regardless of node-level policy.
   */
  readonly forbiddenActions: readonly ForbiddenAction[];

  /**
   * Global time budget for the entire exam in milliseconds.
   * When exceeded, the session MUST end as specified by globalTimeoutBehavior.
   */
  readonly globalTimeBudgetMs: number;

  /**
   * What happens when the global time budget is exceeded.
   */
  readonly globalTimeoutBehavior: "force_complete" | "terminate";

  /**
   * Whether communication style (accent, fluency, verbal confidence) is a
   * declared learning outcome. When false, the LLM MUST NOT penalise or
   * comment on communication style.
   * @see Fenton (2025): equity requires not penalising communication style
   *   unless it is a specific learning outcome.
   */
  readonly communicationStyleIsLearningOutcome?: boolean;

  /**
   * Silence detection threshold in milliseconds.
   * When no candidate speech is detected within this duration, the runtime
   * triggers a silence recovery prompt.
   */
  readonly silenceTimeoutMs?: number;

  /**
   * Maximum number of silence prompts before the node is marked as best-effort.
   * Default: 2.
   */
  readonly maxSilencePrompts?: number;

  /**
   * Maximum candidate input length in characters.
   * Longer inputs are truncated by dropping leading characters, so the most recent text is kept.
   */
  readonly maxCandidateInputLength?: number;

  /**
   * Whether welfare checks are enabled for candidate distress detection.
   * When true, the runtime MAY offer welfare pauses when distress is detected.
   */
  readonly welfareCheckEnabled?: boolean;

  /**
   * Global conversational style policy for the AI examiner.
   * Controls tone, warmth, rapport-building, and conversational pacing.
   * @see Fenton (2015): oral assessment as "conversation rather than interrogatory."
   */
  readonly conversationalStylePolicy?: ConversationalStylePolicy;

  /**
   * Formative feedback policy for the AI examiner.
   * Controls what learning-oriented feedback the examiner can give during formative assessments.
   * Only applicable when assessmentPurpose is "formative".
   * @see Fenton (2025): oral assessment as "enhancer of student learning."
   */
  readonly formativeFeedbackPolicy?: FormativeFeedbackPolicy;

  /**
   * Time extension in milliseconds granted when candidate anxiety is detected.
   */
  readonly anxietyTimeExtensionMs?: number;

  /**
   * Maximum time in milliseconds to wait for the candidate to reconnect before aborting the session.
   */
  readonly reconnectTimeoutMs?: number;
}
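
The timing fields above could be applied roughly as follows. All three helpers are hypothetical; reading "truncated from the beginning" as dropping the leading characters (keeping the most recent text) is an interpretation, and lengths are counted in UTF-16 code units, as JavaScript strings do.

```typescript
// Local subset of GlobalRuntimePolicies (§18).
interface GlobalTimingPolicies {
  readonly globalTimeBudgetMs: number;
  readonly globalTimeoutBehavior: "force_complete" | "terminate";
  readonly maxSilencePrompts?: number;
  readonly maxCandidateInputLength?: number;
}

type BudgetOutcome =
  | { readonly kind: "within_budget"; readonly remainingMs: number }
  | { readonly kind: "force_complete" | "terminate" };

/** Within budget until elapsed time strictly exceeds globalTimeBudgetMs. */
function checkGlobalBudget(p: GlobalTimingPolicies, elapsedMs: number): BudgetOutcome {
  if (elapsedMs <= p.globalTimeBudgetMs) {
    return { kind: "within_budget", remainingMs: p.globalTimeBudgetMs - elapsedMs };
  }
  return { kind: p.globalTimeoutBehavior };
}

/** Prompt again while prompts remain (default limit: 2), then give up on the node. */
function onSilenceTimeout(p: GlobalTimingPolicies, promptsSoFar: number): "prompt" | "mark_best_effort" {
  return promptsSoFar < (p.maxSilencePrompts ?? 2) ? "prompt" : "mark_best_effort";
}

/** Keeps the last maxCandidateInputLength UTF-16 code units of an over-long input. */
function truncateCandidateInput(p: GlobalTimingPolicies, input: string): string {
  const max = p.maxCandidateInputLength;
  if (max === undefined || input.length <= max) return input;
  return max > 0 ? input.slice(input.length - max) : "";
}
```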

19. PipecatAdapterConfig

Configuration hints for the Pipecat adapter layer.

/**
 * Configuration hints for the Pipecat adapter.
 * These do NOT change the IR's semantics — they guide how the adapter
 * compiles the IR into Pipecat-specific configuration.
 */
interface PipecatAdapterConfig {
  /** Target Pipecat version this config is compatible with. */
  readonly targetPipecatVersion?: string;

  /** Default STT configuration. */
  readonly sttConfig?: {
    readonly provider: string;
    readonly language: string;
    readonly model?: string;
  };

  /** Default LLM configuration for the examiner. */
  readonly llmConfig?: {
    readonly provider: string;
    readonly model: string;
    readonly temperature?: number;
    readonly maxTokens?: number;
  };

  /** Default TTS configuration. */
  readonly ttsConfig?: {
    readonly provider: string;
    readonly voice: string;
    readonly language?: string;
  };

  /** LiveKit room configuration hints. */
  readonly livekitConfig?: {
    readonly roomPrefix?: string;
    readonly dataChannelName?: string;
  };

  /**
   * System prompt template for the AI examiner.
   * MAY contain template variables resolved at runtime:
   *   {{nodePromptSeed}}, {{candidateName}}, {{evidenceTargets}},
   *   {{completionPolicy}}, {{followUpPolicy}}, etc.
   */
  readonly systemPromptTemplate?: string;

  /**
   * Per-node prompt template overrides.
   * Key is nodeId, value is the prompt template for that node.
   */
  readonly nodePromptOverrides?: Readonly<Record<string, string>>;
}
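
The spec leaves the template engine open. This sketch assumes plain `{{name}}` substitution, lets a per-node override replace the global template entirely, and leaves unknown variables in place so they stand out in review; `renderTemplate` and `resolvePrompt` are hypothetical names.

```typescript
// Local subset of PipecatAdapterConfig (§19).
interface PromptConfig {
  readonly systemPromptTemplate?: string;
  readonly nodePromptOverrides?: Readonly<Record<string, string>>;
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

/** Replaces {{name}} (inner whitespace allowed); unknown names are left untouched. */
function renderTemplate(template: string, vars: Readonly<Record<string, string>>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    hasOwn(vars, name) ? (vars[name] ?? match) : match,
  );
}

/** Picks the node override if present, else the global template, then renders it. */
function resolvePrompt(
  config: PromptConfig,
  nodeId: string,
  vars: Readonly<Record<string, string>>,
): string | undefined {
  const overrides: Readonly<Record<string, string>> = config.nodePromptOverrides ?? {};
  const template = hasOwn(overrides, nodeId) ? overrides[nodeId] : config.systemPromptTemplate;
  return template === undefined ? undefined : renderTemplate(template, vars);
}
```

The `hasOwnProperty` checks keep keys such as `constructor` from resolving to inherited `Object.prototype` members.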

20. Utility Types

/**
 * A runtime package annotated with session-specific state.
 * Used internally by the runtime controller — NOT part of the canonical IR.
 */
interface AnnotatedRuntimePackage {
  readonly package: ExamRuntimePackage;
  readonly state: RuntimeStateSchema;
  readonly ledger: EvidenceLedger;
  readonly transcript: readonly TranscriptTurn[];
  readonly events: readonly RuntimeEvent[];
}

/**
 * Result of evaluating a completion policy against current state.
 */
interface CompletionEvaluation {
  readonly isComplete: boolean;
  readonly reason: string;
  readonly conditionsMet: readonly string[];
  readonly conditionsUnmet: readonly string[];
}

/**
 * Result of evaluating a transition condition.
 */
interface TransitionEvaluation {
  readonly isEligible: boolean;
  readonly targetNodeId: string;
  readonly conditionType: string;
  readonly details?: string;
}

/**
 * A candidate command that has been detected and needs processing.
 */
interface PendingCandidateCommand {
  readonly command: CandidateCommandType;
  readonly turnIndex: number;
  readonly rawText: string;
  readonly confidence: number;
}

21. AssessmentProfile

Assessment-theoretic profile grounded in Joughin’s (1998) six dimensions of oral assessment. Captures the design parameters that determine what the exam measures, how it is structured, and what validity/reliability claims it supports.

/**
 * Assessment-theoretic profile for the exam.
 * Encodes Joughin's (1998) six dimensions of oral assessment as design parameters.
 * When present, constrains runtime behavior and enables validity/reliability auditing.
 * When absent, defaults are inferred from node-level policies.
 *
 * @see Joughin, G. (1998). Dimensions of Oral Assessment. Assessment & Evaluation
 *   in Higher Education, 23(4), 367-378.
 */
interface AssessmentProfile {
  /**
   * What the exam primarily assesses (Joughin Dimension 1).
   * An exam may cover multiple content types.
   * Determines what counts as valid evidence and how signals should be interpreted.
   */
  readonly contentTypes: readonly ContentType[];

  /**
   * Where on the presentation–dialogue continuum (Joughin Dimension 2).
   * Affects reliability: dialogue tends toward lower reliability but higher validity.
   * @see Joughin (1998, p. 376): "reliability is threatened when interaction tends
   *   toward the dialogue pole."
   */
  readonly interactionMode: InteractionMode;

  /**
   * Target professional context and fidelity level (Joughin Dimension 3).
   * Relates to face validity and construct validity.
   * @see Akimov & Malin (2020): authenticity relates to face and construct validity.
   */
  readonly authenticityProfile?: AuthenticityProfile;

  /**
   * Degree of structural openness (Joughin Dimension 4).
   * Closed structure improves reliability; open structure improves validity for
   * probing understanding.
   * @see Joughin (1998, p. 376): "reliability is threatened when the 'structure'
   *   dimension tends towards the 'open' pole."
   */
  readonly structureProfile?: StructureProfile;

  /**
   * Examiner configuration (Joughin Dimension 5).
   * Supports AI solo, human solo, panel, and AI-with-moderator models.
   * @see Joughin (1998): self-assessment, peer assessment, authority-based.
   * @see Akimov & Malin (2020): moderation for intra-rater reliability.
   */
  readonly examinerConfig?: ExaminerConfiguration;

  /**
   * Role of the oral component (Joughin Dimension 6).
   * Purely oral vs. oral supplementing written work.
   * @see Joughin (1998): "the student's oral response may be combined with,
   *   or supplementary to, other forms of response such as a written paper."
   */
  readonly oralityProfile?: OralityProfile;

  /**
   * Validity and reliability claims the exam makes.
   * Optional structured declaration of how the exam addresses assessment quality.
   * @see Akimov & Malin (2020): validity/reliability/fairness matrix.
   */
  readonly validityClaims?: readonly ValidityClaim[];

  /**
   * Moderation policy for AI-generated evidence signals.
   * Enables human review of a sample of sessions for quality assurance.
   * @see Akimov & Malin (2020): "all online oral examinations were recorded
   *   and moderated by another finance academic."
   */
  readonly moderationPolicy?: ModerationPolicy;

  /**
   * Calibration profile for the AI examiner.
   * References calibration exercises and accuracy metrics.
   * @see Fenton (2025): "with larger cohorts, have a calibration process
   *   to reduce differences."
   */
  readonly calibrationProfile?: CalibrationProfile;
}

/** Joughin's four primary content categories (Dimension 1). */
type ContentType =
  /** Recall of facts, comprehension of meaning (Bloom's knowledge/understanding). */
  | "knowledge_understanding"
  /** "Think on one's feet," clinical reasoning, critical thinking. */
  | "applied_problem_solving"
  /** Communication skills exhibited in context, not abstract skills. */
  | "interpersonal_competence"
  /** Confidence, self-awareness, reactions to stress, personality. */
  | "intrapersonal_qualities";

/** Joughin's interaction continuum (Dimension 2). */
type InteractionMode =
  /** One-way: candidate presents, no follow-up (e.g., oral presentation). */
  | "presentation"
  /** Predetermined questions with limited, structured follow-up. */
  | "structured_dialogue"
  /** Open conversation with adaptive follow-up. */
  | "free_dialogue";

/** Joughin's authenticity continuum (Dimension 3). */
interface AuthenticityProfile {
  /** Target professional context (e.g., "clinical consultation", "job interview"). */
  readonly targetContext: string;
  /** Fidelity level: how closely the exam replicates professional practice. */
  readonly fidelityLevel: "abstract" | "simulated" | "authentic";
  /** Elements being simulated (e.g., ["patient history", "time pressure"]). */
  readonly simulationElements?: readonly string[];
}

/** Joughin's structure continuum (Dimension 4). */
interface StructureProfile {
  /**
   * Degree of openness: 0.0 = fully closed (set questions, fixed order, no deviation),
   * 1.0 = fully open (loosely structured agenda, examiner follows responses).
   */
  readonly opennessScore: number;
  /** Whether questions are known in advance by candidates. */
  readonly questionsDisclosed: boolean;
  /** Whether question order is fixed or adaptive. */
  readonly orderFixed: boolean;
}

/** Joughin's examiner dimension (Dimension 5). */
interface ExaminerConfiguration {
  /** Type of examiner administering the exam. */
  readonly examinerType: "ai_solo" | "human_solo" | "panel" | "ai_with_human_moderator";
  /** Number of examiners (for panels). */
  readonly panelSize?: number;
  /** Whether moderation of AI-generated signals is enabled. */
  readonly moderationEnabled: boolean;
  /** Sampling rate for moderation review (0.0 to 1.0). */
  readonly moderationSampleRate?: number;
}
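
The numeric fields in these profiles have implied ranges that a publish-time validator could check. The sketch is illustrative: `validateProfile` is hypothetical, and requiring at least one content type and an integer `panelSize` of at least 2 for panels are assumptions, not rules stated above.

```typescript
// Local copies of the §21 types used here.
type ContentType =
  | "knowledge_understanding" | "applied_problem_solving"
  | "interpersonal_competence" | "intrapersonal_qualities";

interface StructureProfile {
  readonly opennessScore: number;
  readonly questionsDisclosed: boolean;
  readonly orderFixed: boolean;
}

interface ExaminerConfiguration {
  readonly examinerType: "ai_solo" | "human_solo" | "panel" | "ai_with_human_moderator";
  readonly panelSize?: number;
  readonly moderationEnabled: boolean;
  readonly moderationSampleRate?: number;
}

interface ProfileSubset {
  readonly contentTypes: readonly ContentType[];
  readonly structureProfile?: StructureProfile;
  readonly examinerConfig?: ExaminerConfiguration;
}

const inUnitRange = (x: number): boolean => Number.isFinite(x) && x >= 0 && x <= 1;

/** Returns human-readable problems; an empty array means the subset looks valid. */
function validateProfile(p: ProfileSubset): string[] {
  const errors: string[] = [];
  if (p.contentTypes.length === 0) {
    errors.push("contentTypes should name at least one content type");
  }
  if (p.structureProfile && !inUnitRange(p.structureProfile.opennessScore)) {
    errors.push("structureProfile.opennessScore must be in [0, 1]");
  }
  const e = p.examinerConfig;
  if (e) {
    if (e.moderationSampleRate !== undefined && !inUnitRange(e.moderationSampleRate)) {
      errors.push("examinerConfig.moderationSampleRate must be in [0, 1]");
    }
    if (e.examinerType === "panel" &&
        (e.panelSize === undefined || !Number.isInteger(e.panelSize) || e.panelSize < 2)) {
      errors.push("panel examiners need an integer panelSize of at least 2");
    }
  }
  return errors;
}
```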

/** Joughin's orality continuum (Dimension 6). */
interface OralityProfile {
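  /**
   * Role of the oral component: the whole exam ("purely_oral"), the main
   * component alongside other work ("oral_primary"), or supplementary to
   * another component such as a written paper ("oral_secondary").
   */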
  readonly mode: "purely_oral" | "oral_primary" | "oral_secondary";
  /** Supplementary materials the candidate must submit before the exam. */
  readonly requiredSubmissions?: readonly CandidateArtifact[];
}

/** A candidate-submitted artifact that the oral exam may defend or discuss. */
interface CandidateArtifact {
  readonly artifactId: string;
  readonly type: "written_paper" | "code" | "design" | "report" | "portfolio";
  readonly title: string;
  readonly submittedAt?: string; // ISO 8601
}

/** A validity/reliability claim the exam makes. */
interface ValidityClaim {
  readonly type: "face" | "content" | "construct" | "concurrent" | "inter_rater" | "inter_case" | "fairness" | "inter_item_consistency" | "intra_rater_reliability";
  readonly description: string;
  /** Supporting evidence or reference (e.g., validation study ID, survey results). */
  readonly supportingEvidence?: string;
}

/** Moderation policy for AI-generated evidence signals. */
interface ModerationPolicy {
  /** Whether moderation is enabled. */
  readonly enabled: boolean;
  /** Sampling strategy for selecting sessions for review. */
  readonly samplingStrategy: "random" | "stratified" | "all_fails" | "all";
  /** Sample rate (0.0 to 1.0). Only used for "random" strategy. */
  readonly sampleRate?: number;
  /** What the moderator reviews. */
  readonly reviewScope: readonly ("evidence_signals" | "examiner_behavior" | "fairness" | "transcript")[];
  /** What happens on disagreement between moderator and AI. */
  readonly disagreementAction: "flag_for_review" | "override_ai" | "escalate_to_panel";
}
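
Selecting sessions for moderation might look like this. It is a sketch: `SessionSummary` and its `failed` flag are hypothetical, the random source is injected for reproducibility, `"random"` without a `sampleRate` is treated as a configuration error, and `"stratified"` is not implemented because the spec does not define the strata.

```typescript
// Trimmed local copy of ModerationPolicy (review scope and disagreement handling omitted).
interface ModerationPolicy {
  readonly enabled: boolean;
  readonly samplingStrategy: "random" | "stratified" | "all_fails" | "all";
  readonly sampleRate?: number;
}

// Hypothetical per-session summary supplied by the marking pipeline.
interface SessionSummary {
  readonly sessionId: string;
  readonly failed: boolean;
}

function selectForModeration(
  policy: ModerationPolicy,
  sessions: readonly SessionSummary[],
  random: () => number = Math.random,
): string[] {
  if (!policy.enabled) return [];
  switch (policy.samplingStrategy) {
    case "all":
      return sessions.map((s) => s.sessionId);
    case "all_fails":
      return sessions.filter((s) => s.failed).map((s) => s.sessionId);
    case "random": {
      if (policy.sampleRate === undefined) {
        throw new Error('"random" sampling requires sampleRate');
      }
      const rate = policy.sampleRate;
      // Independent Bernoulli draw per session, so the sample size varies around rate * n.
      return sessions.filter(() => random() < rate).map((s) => s.sessionId);
    }
    case "stratified":
      throw new Error("stratified sampling needs strata, which the spec does not define");
  }
}
```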

/** Calibration profile for the AI examiner. */
interface CalibrationProfile {
  /** Whether calibration has been performed. */
  readonly calibrated: boolean;
  /** References to calibration exercise exam IDs. */
  readonly calibrationExamIds?: readonly string[];
  /** Measured accuracy against ground truth (0.0 to 1.0). */
  readonly accuracyAgainstGroundTruth?: number;
  /** Inter-rater reliability with human markers (Cohen's kappa). */
  readonly interRaterKappa?: number;
  /** ISO 8601 timestamp of last calibration. */
  readonly lastCalibratedAt?: string;

  /**
   * Human moderator training requirements.
   * @see Fenton (2025) Recommendation 2: "Have clear guidelines for both
   *   academic staff and students."
   * @see Fenton (2025) Recommendation 7: "Consider a training or shadowing
   *   program with experienced instructors."
   * @see Akimov & Malin (2020): "None of the 30 examiners surveyed had
   *   any training on how to conduct oral examinations."
   */
  readonly moderatorTraining?: {
    /** Whether moderator training is required before reviewing sessions. */
    readonly trainingRequired: boolean;
    /** References to training materials or documentation. */
    readonly trainingMaterials?: readonly ResourceReference[];
    /** Whether shadowing experienced moderators is required. */
    readonly shadowingRequired?: boolean;
    /** References to calibration exercises for moderators. */
    readonly calibrationExerciseIds?: readonly string[];
  };
}
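
For reference, Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two raters labeling the same items with categorical labels (for example, AI and moderator verdicts); `cohensKappa` is a hypothetical helper, not part of the schema.

```typescript
/** Cohen's kappa for two raters; returns NaN when chance agreement is total (p_e = 1). */
function cohensKappa(a: readonly string[], b: readonly string[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("need two equal-length, non-empty rating lists");
  }
  const n = a.length;
  let agree = 0;
  const countA = new Map<string, number>();
  const countB = new Map<string, number>();
  for (let i = 0; i < n; i++) {
    const x = a[i] as string;
    const y = b[i] as string;
    if (x === y) agree++;
    countA.set(x, (countA.get(x) ?? 0) + 1);
    countB.set(y, (countB.get(y) ?? 0) + 1);
  }
  const po = agree / n; // observed agreement
  let pe = 0; // chance agreement: sum over labels of P_a(label) * P_b(label)
  countA.forEach((ca, label) => {
    pe += (ca / n) * ((countB.get(label) ?? 0) / n);
  });
  if (pe === 1) return NaN;
  return (po - pe) / (1 - pe);
}
```

For ratings pass/pass/fail/fail against pass/fail/fail/fail, p_o = 0.75 and p_e = 0.5, giving kappa = 0.5.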

22. QuestionPool

Question pools for randomized question delivery. Enables inter-case reliability by drawing equivalent questions for different candidates.

/**
 * A pool of equivalent question variants for randomized delivery.
 * Addresses inter-case reliability: different candidates receive different
 * questions of equivalent difficulty.
 *
 * @see Akimov & Malin (2020): bank of 69 questions from which students draw randomly.
 * @see Bayley et al. (2024): "each instructor developed a unique set of four questions."
 */
interface QuestionPool {
  /** Unique pool identifier within the package. */
  readonly poolId: string;

  /** Human-readable label (e.g., "Photosynthesis questions — set A"). */
  readonly label: string;

  /**
   * Question variants in this pool. All variants are considered equivalent
   * for reliability purposes.
   */
  readonly variants: readonly QuestionVariant[];

  /** How many variants to draw per session. Default: 1. */
  readonly drawCount?: number;

  /**
   * Whether the same variant can appear in concurrent sessions.
   * Set to false to mitigate question-sharing (Bayley et al., 2024, p. 165).
   */
  readonly allowReuseAcrossConcurrentSessions: boolean;
}

/** A single question variant within a pool. */
interface QuestionVariant {
  /** Unique variant identifier within the pool. */
  readonly variantId: string;

  /** The prompt seed for this variant. */
  readonly promptSeed: string;

  /**
   * Estimated difficulty for equivalence checking.
   * Range: 0.0 (easiest) to 1.0 (hardest). Optional — used for calibration.
   */
  readonly difficultyEstimate?: number;

  /** Evidence targets this variant assesses. */
  readonly evidenceTargetIds: readonly string[];
}
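
Drawing `drawCount` distinct variants is a natural fit for a partial Fisher-Yates shuffle. The sketch assumes the runtime can supply the set of variant IDs currently in use by concurrent sessions (`inUse` is hypothetical), injects the random source, and throws when too few variants remain; a real controller might instead fall back to reuse.

```typescript
// Local copies of the §22 types (drawCount treated as optional, defaulting to 1).
interface QuestionVariant {
  readonly variantId: string;
  readonly promptSeed: string;
  readonly difficultyEstimate?: number;
  readonly evidenceTargetIds: readonly string[];
}

interface QuestionPool {
  readonly poolId: string;
  readonly label: string;
  readonly variants: readonly QuestionVariant[];
  readonly drawCount?: number;
  readonly allowReuseAcrossConcurrentSessions: boolean;
}

function drawVariants(
  pool: QuestionPool,
  inUse: ReadonlySet<string>,
  random: () => number = Math.random,
): QuestionVariant[] {
  // Copy so the shuffle never mutates the package's readonly array.
  const candidates: QuestionVariant[] = pool.allowReuseAcrossConcurrentSessions
    ? pool.variants.slice()
    : pool.variants.filter((v) => !inUse.has(v.variantId));
  const count = pool.drawCount ?? 1;
  if (count > candidates.length) {
    throw new Error(`pool ${pool.poolId}: cannot draw ${count} of ${candidates.length} variants`);
  }
  // Partial Fisher-Yates: after i iterations, slots 0..i-1 hold a uniform random sample.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (candidates.length - i));
    const tmp = candidates[i] as QuestionVariant;
    candidates[i] = candidates[j] as QuestionVariant;
    candidates[j] = tmp;
  }
  return candidates.slice(0, count);
}
```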

23. ModerationRecord

Record of human moderator review of AI-generated evidence.

/**
 * A record of human moderator review of an AI-examined session.
 * Supports inter-rater reliability tracking and evidence quality assurance.
 *
 * @see Akimov & Malin (2020): "all online oral examinations were recorded
 *   and moderated by another finance academic."
 */
interface ModerationRecord {
  readonly recordId: string;
  readonly sessionId: string;
  readonly moderatorId: string;
  /** The original AI-generated evidence signals. */
  readonly originalSignals: readonly EvidenceSignal[];
  /** The moderator's adjusted signals (if any). */
  readonly adjustedSignals?: readonly EvidenceSignal[];
  /** Whether the moderator agreed with the AI assessment. */
  readonly agreed: boolean;
  /** Notes from the moderator. */
  readonly notes?: string;
  readonly timestampMs: number;
}

24. FairnessAudit

Structured fairness audit results for an exam or cohort.

/**
 * A structured fairness audit for an exam or cohort.
 * Enables detection of systematic disparities across demographic dimensions.
 *
 * @see Akimov & Malin (2020): fairness "does discriminate against students
 *   with poorer command of English."
 * @see Fenton (2025): "careful preparation is recommended to avoid any bias."
 */
interface FairnessAudit {
  readonly auditId: string;
  readonly examId: string;
  readonly cohortId?: string;
  readonly conductedAt: string; // ISO 8601
  /** Demographic dimensions analyzed (e.g., ["language_background", "gender"]). */
  readonly dimensionsAnalyzed: readonly string[];
  /** Results per dimension. */
  readonly results: readonly FairnessDimensionResult[];
  /** Whether the audit found significant disparities. */
  readonly disparitiesFound: boolean;
  /** Recommended actions if disparities found. */
  readonly recommendations?: readonly string[];
}

/** Result for a single demographic dimension. */
interface FairnessDimensionResult {
  readonly dimension: string; // e.g., "language_background", "gender"
  readonly metric: string; // e.g., "mean_evidence_confidence", "mean_followup_count"
  readonly groupValues: Readonly<Record<string, number>>;
  readonly disparitySignificant: boolean;
  readonly effectSize?: number;
}

25. SessionRecording

Session recording metadata for moderation and audit.

/**
 * Metadata for a session recording (audio/video).
 * Enables moderation review and appeal processes.
 *
 * @see Akimov & Malin (2020): "to reduce the potential problem of intra-rater
 *   reliability, all online oral examinations were recorded and moderated."
 */
interface SessionRecording {
  readonly sessionId: string;
  /** Audio recording reference. */
  readonly audioRef?: string;
  /** Video recording reference (if available). */
  readonly videoRef?: string;
  /** Whether the recording is available for moderation review. */
  readonly availableForModeration: boolean;
  /** Whether the candidate consented to recording. */
  readonly candidateConsented: boolean;
  /** Retention policy. */
  readonly retentionPolicy: {
    readonly retainUntilMs: number;
    readonly deleteAfterReview: boolean;
  };
}
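
A minimal moderator access check, assuming that media must exist, consent and `availableForModeration` must both hold, and the retention deadline must not have passed; `canModeratorAccess` is hypothetical, and how `deleteAfterReview` interacts with repeat reviews is left open.

```typescript
// Local copy of SessionRecording (§25).
interface SessionRecording {
  readonly sessionId: string;
  readonly audioRef?: string;
  readonly videoRef?: string;
  readonly availableForModeration: boolean;
  readonly candidateConsented: boolean;
  readonly retentionPolicy: {
    readonly retainUntilMs: number;
    readonly deleteAfterReview: boolean;
  };
}

/** True if a moderator may open the recording at time nowMs (Unix epoch ms). */
function canModeratorAccess(rec: SessionRecording, nowMs: number): boolean {
  const hasMedia = rec.audioRef !== undefined || rec.videoRef !== undefined;
  return (
    hasMedia &&
    rec.candidateConsented &&
    rec.availableForModeration &&
    nowMs < rec.retentionPolicy.retainUntilMs
  );
}
```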

26. CandidateCommand Extensions

Additional candidate commands grounded in the oral assessment literature.

/**
 * Extended candidate command type with assessment-significant additions.
 * The base CandidateCommandType (§13) covers operational commands.
 * These additions capture dialogic interaction patterns that the literature
 * identifies as assessment-significant.
 *
 * @see Joughin (1998): dialogue pole — candidates may redirect conversation.
 * @see Fenton (2025): "students can reflect on their choices and self-correct."
 */
type ExtendedCandidateCommandType =
  | CandidateCommandType
  /** Candidate pushes back on a premise or framing. Demonstrates critical thinking. */
  | "challenge_premise"
  /** Candidate wants to revisit and revise an earlier answer. */
  | "revise_earlier_answer";
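
Code that receives a raw string from command detection can narrow it to ExtendedCandidateCommandType with a type guard. In this sketch, `isExtendedCommand` and `isAssessmentSignificant` are hypothetical; the `Record` keyed by the union makes the compiler reject the table if a command is added to the union but not to the table. The assessment-significant set follows the comments in §13 and above.

```typescript
// Local copies of the §13 and §26 unions.
type CandidateCommandType =
  | "repeat" | "clarification" | "request_rephrase" | "pause" | "raise_hand"
  | "skip" | "volume_up" | "volume_down" | "language_switch" | "thinking_aloud";

type ExtendedCandidateCommandType =
  | CandidateCommandType
  | "challenge_premise"
  | "revise_earlier_answer";

// A missing or misspelled key here is a compile-time error.
const KNOWN_COMMANDS: Record<ExtendedCandidateCommandType, true> = {
  repeat: true, clarification: true, request_rephrase: true, pause: true,
  raise_hand: true, skip: true, volume_up: true, volume_down: true,
  language_switch: true, thinking_aloud: true,
  challenge_premise: true, revise_earlier_answer: true,
};

function isExtendedCommand(value: string): value is ExtendedCandidateCommandType {
  // Own-property check so inherited keys such as "toString" are rejected.
  return Object.prototype.hasOwnProperty.call(KNOWN_COMMANDS, value);
}

function isAssessmentSignificant(c: ExtendedCandidateCommandType): boolean {
  return c === "challenge_premise" || c === "revise_earlier_answer" || c === "thinking_aloud";
}
```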

27. BloomLevel

Bloom’s Taxonomy cognitive levels for classifying evidence targets.

/**
 * Bloom's Taxonomy cognitive levels, using the level names of the revised
 * taxonomy (Anderson & Krathwohl, 2001), which builds on Bloom (1956).
 * Classifies what cognitive depth an evidence target assesses.
 * Used for validation, follow-up escalation strategy, and marking rubric alignment.
 *
 * @see Bloom, B.S. (1956). Taxonomy of Educational Objectives.
 * @see Fenton (2025): "Generative AI tools have been found to perform well
 *   at the lower levels of Bloom's taxonomy but struggle at the create level
 *   and making arguments built on theoretical frameworks."
 */
type BloomLevel =
  /** Recall facts and basic concepts. */
  | "remember"
  /** Explain ideas or concepts. */
  | "understand"
  /** Use information in new situations. */
  | "apply"
  /** Draw connections among ideas. */
  | "analyze"
  /** Justify a stand or a decision. */
  | "evaluate"
  /** Produce new or original work. */
  | "create";
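
BloomLevel is an unordered string union, but follow-up escalation needs an ordering. The sketch assumes the declaration order (remember through create) is ascending cognitive demand and that evidence at a higher level also satisfies a lower-level target; `BLOOM_ORDER`, `escalate` and `meetsLevel` are hypothetical.

```typescript
type BloomLevel = "remember" | "understand" | "apply" | "analyze" | "evaluate" | "create";

// Ascending cognitive demand (assumed to match the declaration order above).
const BLOOM_ORDER: readonly BloomLevel[] = [
  "remember", "understand", "apply", "analyze", "evaluate", "create",
];

function bloomRank(level: BloomLevel): number {
  return BLOOM_ORDER.indexOf(level);
}

/** Next level up for a follow-up probe, or undefined at the top ("create"). */
function escalate(level: BloomLevel): BloomLevel | undefined {
  return BLOOM_ORDER[bloomRank(level) + 1];
}

/** Assumes a cumulative hierarchy: higher-level evidence meets lower-level targets. */
function meetsLevel(observed: BloomLevel, required: BloomLevel): boolean {
  return bloomRank(observed) >= bloomRank(required);
}
```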

28. CandidateBriefing

Candidate-facing exam information for transparency and preparation.

/**
 * Candidate-facing briefing information about the exam.
 * Provides transparency about exam format, expectations, and available commands.
 * Addresses Joughin's (1998) concern that "students need to know in advance
 * what to expect of the shape of the assessment in order to prepare adequately."
 *
 * @see Joughin (1998) Dimension 4: Structure.
 * @see Fenton (2025) Recommendation 1: pre-exam information provision.
 * @see Akimov & Malin (2020): pre-exam survey findings on student preparation.
 */
interface CandidateBriefing {
  /** Human-readable description of the exam format. */
  readonly formatDescription: string;

  /** Estimated total duration in human-readable form (e.g., "20 minutes"). */
  readonly estimatedDuration?: string;

  /** Number of sections or question groups. */
  readonly sectionCount?: number;

  /** Whether a practice/mock session is available before the exam. */
  readonly practiceSessionAvailable?: boolean;

  /** Commands the candidate can use during the exam (e.g., repeat, clarification). */
  readonly availableCommands?: readonly string[];

  /** Whether the exam is open-book, closed-book, or restricted. */
  readonly bookPolicy?: "open" | "closed" | "restricted";

  /** Any materials the candidate should prepare in advance. */
  readonly requiredPreparation?: readonly string[];

  /** Criteria by which the candidate will be assessed. */
  readonly assessmentCriteria?: readonly string[];

  /** Whether the exam will be recorded. */
  readonly recordingDisclosure?: boolean;
}
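A briefing is ultimately shown or read to the candidate. The sketch below renders one into plain-text lines for a pre-exam screen, assuming a hypothetical `renderBriefing` function and sample values; the wording of each line is illustrative only. Omitted optional fields simply produce no line.

```typescript
interface CandidateBriefing {
  readonly formatDescription: string;
  readonly estimatedDuration?: string;
  readonly sectionCount?: number;
  readonly practiceSessionAvailable?: boolean;
  readonly availableCommands?: readonly string[];
  readonly bookPolicy?: "open" | "closed" | "restricted";
  readonly requiredPreparation?: readonly string[];
  readonly assessmentCriteria?: readonly string[];
  readonly recordingDisclosure?: boolean;
}

/** Hypothetical renderer: one display line per populated field. */
function renderBriefing(b: CandidateBriefing): string[] {
  const lines: string[] = [b.formatDescription];
  if (b.estimatedDuration !== undefined) lines.push(`Duration: ${b.estimatedDuration}`);
  if (b.sectionCount !== undefined) lines.push(`Sections: ${b.sectionCount}`);
  if (b.bookPolicy !== undefined) lines.push(`Book policy: ${b.bookPolicy}`);
  if (b.availableCommands?.length) {
    lines.push(`You can say: ${b.availableCommands.join(", ")}`);
  }
  if (b.requiredPreparation?.length) {
    lines.push(`Prepare: ${b.requiredPreparation.join("; ")}`);
  }
  if (b.assessmentCriteria?.length) {
    lines.push(`Assessed on: ${b.assessmentCriteria.join("; ")}`);
  }
  if (b.practiceSessionAvailable) lines.push("A practice session is available.");
  if (b.recordingDisclosure !== undefined) {
    lines.push(b.recordingDisclosure ? "This exam will be recorded." : "This exam will not be recorded.");
  }
  return lines;
}

// Sample values for illustration only.
const briefing: CandidateBriefing = {
  formatDescription: "An oral exam in three sections.",
  estimatedDuration: "20 minutes",
  sectionCount: 3,
  availableCommands: ["repeat", "clarification"],
  bookPolicy: "closed",
  recordingDisclosure: true,
};

console.log(renderBriefing(briefing).join("\n"));
```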

29. ConversationalStylePolicy

Controls the AI examiner’s conversational tone and rapport-building.

/**
 * Policy controlling the AI examiner's conversational style.
 * Keeps the exchange conversational rather than interrogatory. This matters
 * for AI-conducted assessment because an AI examiner cannot fall back on
 * implicit social skills, so the desired style must be specified explicitly.
 *
 * @see Fenton (2015): oral assessment as "conversation rather than interrogatory."
 * @see Fenton (2025): rapport-building and candidate comfort as assessment concerns.
 */
interface ConversationalStylePolicy {
  /** Overall tone of the examiner. */
  readonly tone?: "formal" | "semi_formal" | "warm" | "neutral";

  /** Level of warmth and rapport-building. */
  readonly warmth?: "low" | "medium" | "high";

  /** Whether the examiner should use the candidate's name. */
  readonly useCandidateName?: boolean;

  /** Whether the examiner should acknowledge good responses before probing further. */
  readonly acknowledgeGoodResponses?: boolean;

  /** Whether the examiner should apologize for necessary clarifications. */
  readonly apologizeForClarifications?: boolean;

  /** Maximum number of consecutive questions before the examiner inserts a conversational pause. */
  readonly maxConsecutiveQuestions?: number;

  /** Whether the examiner should use conversational fillers (e.g., "I see", "Interesting"). */
  readonly useConversationalFillers?: boolean;
}
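`maxConsecutiveQuestions` implies the runtime tracks how many questions have been asked since the last pause. The sketch below assumes a hypothetical `QuestionPacer` class and treats an omitted limit as "no forced pauses"; both are illustrative assumptions, not behavior defined by this schema. What counts as a pause (an acknowledgment, a transition) is left to the caller.

```typescript
interface ConversationalStylePolicy {
  readonly tone?: "formal" | "semi_formal" | "warm" | "neutral";
  readonly warmth?: "low" | "medium" | "high";
  readonly useCandidateName?: boolean;
  readonly acknowledgeGoodResponses?: boolean;
  readonly apologizeForClarifications?: boolean;
  readonly maxConsecutiveQuestions?: number;
  readonly useConversationalFillers?: boolean;
}

/** Hypothetical tracker deciding when a conversational pause is due. */
class QuestionPacer {
  private readonly policy: ConversationalStylePolicy;
  private consecutive = 0;

  constructor(policy: ConversationalStylePolicy) {
    this.policy = policy;
  }

  /** True if the limit has been reached; undefined limit means never. */
  needsPauseBeforeNextQuestion(): boolean {
    const max = this.policy.maxConsecutiveQuestions;
    return max !== undefined && this.consecutive >= max;
  }

  recordQuestion(): void {
    this.consecutive += 1;
  }

  /** A pause resets the run of consecutive questions. */
  recordPause(): void {
    this.consecutive = 0;
  }
}

const pacer = new QuestionPacer({ maxConsecutiveQuestions: 2 });
const log: string[] = [];
for (let i = 0; i < 5; i++) {
  if (pacer.needsPauseBeforeNextQuestion()) {
    log.push("pause");
    pacer.recordPause();
  }
  log.push("question");
  pacer.recordQuestion();
}
console.log(log.join(" "));
// question question pause question question pause question
```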

30. FormativeFeedbackPolicy

Controls what feedback the examiner can give during formative assessments.

/**
 * Policy controlling examiner feedback in formative assessment mode.
 * Distinguishes between evidence-relevant signals (written to the ledger)
 * and learning-oriented feedback (delivered to the candidate but not recorded as evidence).
 *
 * In formative mode, the examiner can provide real-time feedback to enhance learning.
 * In summative mode, feedback is suppressed to avoid biasing evidence.
 *
 * @see Fenton (2025): oral assessment as "enhancer of student learning."
 * @see Fenton (2025): formative vs summative distinction.
 */
interface FormativeFeedbackPolicy {
  /** Whether formative feedback is enabled. */
  readonly enabled: boolean;

  /** When feedback is delivered: immediately after a response, after the current node, or after the exam. */
  readonly feedbackTiming?: "immediate" | "after_node" | "after_exam";

  /** Types of feedback the examiner may provide. */
  readonly allowedFeedbackTypes?: readonly (
    /** Acknowledge correct/strong responses. */
    | "positive_acknowledgment"
    /** Gently indicate areas for improvement. */
    | "constructive_nudge"
    /** Provide a hint or scaffold toward the correct answer. */
    | "scaffolding_hint"
    /** Summarize what the candidate has demonstrated so far. */
    | "progress_summary"
  )[];

  /** Whether feedback is recorded in the evidence ledger. */
  readonly recordFeedback?: boolean;

  /** Whether feedback can reference specific rubric criteria. */
  readonly allowRubricReference?: boolean;
}
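The policy above implies a gate the examiner consults before voicing any feedback. The sketch below assumes a hypothetical `AssessmentMode` value (the schema's own formative/summative field may be named differently) and a hypothetical `mayGiveFeedback` function. It also assumes that an omitted `allowedFeedbackTypes` permits no feedback, a conservative default chosen for illustration only.

```typescript
type FeedbackType =
  | "positive_acknowledgment"
  | "constructive_nudge"
  | "scaffolding_hint"
  | "progress_summary";

interface FormativeFeedbackPolicy {
  readonly enabled: boolean;
  readonly feedbackTiming?: "immediate" | "after_node" | "after_exam";
  readonly allowedFeedbackTypes?: readonly FeedbackType[];
  readonly recordFeedback?: boolean;
  readonly allowRubricReference?: boolean;
}

/** Hypothetical assessment purpose; the schema's own field may differ. */
type AssessmentMode = "formative" | "summative";

/** Hypothetical gate: summative mode suppresses all feedback. */
function mayGiveFeedback(
  policy: FormativeFeedbackPolicy,
  mode: AssessmentMode,
  type: FeedbackType,
): boolean {
  if (mode === "summative" || !policy.enabled) return false;
  const allowed = policy.allowedFeedbackTypes;
  // Assumption: an omitted list permits no feedback types.
  return allowed !== undefined && allowed.indexOf(type) !== -1;
}

const policy: FormativeFeedbackPolicy = {
  enabled: true,
  feedbackTiming: "after_node",
  allowedFeedbackTypes: ["positive_acknowledgment", "scaffolding_hint"],
};

console.log(mayGiveFeedback(policy, "formative", "scaffolding_hint")); // true
console.log(mayGiveFeedback(policy, "formative", "progress_summary")); // false
console.log(mayGiveFeedback(policy, "summative", "scaffolding_hint")); // false
```

Checking the mode first means a summative exam suppresses feedback even if a formative policy object is accidentally attached to it.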

31. ResourceReference

A reference to an external resource (training material, document, URL).

/** A reference to an external resource. */
interface ResourceReference {
  /** Human-readable label for the resource. */
  readonly label: string;

  /** URL or path to the resource. */
  readonly url?: string;

  /** Type of resource. */
  readonly type?: "document" | "video" | "exercise" | "rubric" | "guideline";
}
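Since packages are typically loaded from serialized data, a runtime check is useful before trusting a value as a `ResourceReference`. The type guard below is a hypothetical sketch, not a schema-defined validator; requiring a non-empty `label` is an extra assumption beyond the declared `string` type.

```typescript
interface ResourceReference {
  readonly label: string;
  readonly url?: string;
  readonly type?: "document" | "video" | "exercise" | "rubric" | "guideline";
}

/** Allowed values of ResourceReference.type, mirrored for runtime checks. */
const RESOURCE_TYPES: readonly string[] = ["document", "video", "exercise", "rubric", "guideline"];

/** Hypothetical type guard for untrusted input (e.g., parsed JSON). */
function isResourceReference(value: unknown): value is ResourceReference {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;

  const label = v["label"];
  // Assumption: an empty label is rejected.
  if (typeof label !== "string" || label.length === 0) return false;

  const url = v["url"];
  if (url !== undefined && typeof url !== "string") return false;

  const type = v["type"];
  if (type !== undefined && (typeof type !== "string" || RESOURCE_TYPES.indexOf(type) === -1)) {
    return false;
  }
  return true;
}

const parsed: unknown = JSON.parse('{"label":"Marking rubric","type":"rubric"}');
console.log(isResourceReference(parsed)); // true
console.log(isResourceReference({ label: "Slides", type: "slides" })); // false
```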

Revision History

Version	Date	Changes
v0.2.0	2026-06-30	Added BloomLevel, cognitiveLevel, integrated_practice, CandidateBriefing, ConversationalStylePolicy, FormativeFeedbackPolicy, promptingPrinciples, cognitiveEscalationStrategy, bookPolicy, inter_item_consistency, intra_rater_reliability, moderatorTraining, isPractice, anxietyMitigation, ResourceReference. Updated terminology from ‘IR’ to ‘specification’.
v0.1.0	2026-05-06	Initial release. 26 sections covering all core objects.