Worked Examples

Status

Draft · v0.2.0 · 2026-06-30

This chapter presents a complete, minimal but realistic example that exercises every major feature of the IOA-ORM. It is designed to be self-contained: a reader who has only skimmed the preceding chapters should be able to follow it end-to-end.

10.1 Scenario

Dimension	Value
Course	CS301 — Operating Systems (Year 3 undergraduate)
Duration	10 minutes
Language	English
Assessment type	Interactive oral — 2 structured questions with follow-ups
Examiner persona	Supportive, encouraging; asks for clarification when answers are vague
Max follow-ups per question	2
Candidate commands available	`repeat`, `clarification`, `raise_hand`

Flow Shape

┌─────────┐   ┌──────────────┐   ┌──────────────┐   ┌─────────┐   ┌─────┐
│ OPENING │──▶│  QUESTION_1  │──▶│  QUESTION_2  │──▶│ CLOSING │──▶│ END │
└─────────┘   └──────────────┘   └──────────────┘   └─────────┘   └─────┘
               (≤2 follow-ups)    (≤2 follow-ups)

Learning Outcomes Assessed

LO-1: Explain the role of process scheduling in an OS (Knowledge)
LO-2: Compare scheduling algorithms and justify a choice (Analysis / Evaluation)

10.2 Authoring-Level Description

Lecturer’s view: what the lecturer writes in the Assessment Studio.

Opening (60 s)

“Welcome to your oral assessment for CS301. I’ll ask you two questions about operating systems. Feel free to ask me to repeat or clarify at any time. You may raise your hand if you need a moment. Let’s begin.”

Question 1 — Process Scheduling (target 3 min, max 4 min)

Stem: “Can you explain what process scheduling means in the context of an operating system, and why it matters?”

Rubric signal: LO-1 — explanation of scheduling concept, mention of preemptive vs cooperative, context switch cost.

Follow-up 1 (if answer is vague or misses preemptive/cooperative): “You mentioned scheduling. Can you elaborate on the difference between preemptive and cooperative scheduling?”

Follow-up 2 (if still incomplete on context switch): “How does context switching fit into this picture?”

Transition condition: the candidate has addressed the scheduling concept and at least one of {preemptive/cooperative, context switch}, or the maximum number of follow-ups is exhausted, or the time budget is exceeded.

Question 2 — Scheduling Algorithm Comparison (target 4 min, max 5 min)

Stem: “Suppose you’re designing a scheduler for an interactive desktop OS. Would you choose Round Robin or Shortest Job First? Why?”

Rubric signal: LO-2 — comparison of algorithms, awareness of starvation risk, response-time argument.

Follow-up 1 (if candidate doesn’t address starvation): “What problem could arise with the algorithm you didn’t choose?”

Follow-up 2 (if candidate still hasn’t mentioned response time): “From the user’s perspective, how would response time be affected?”

Transition condition: the candidate has compared at least one trade-off, or the maximum number of follow-ups is exhausted, or the time budget is exceeded.

Closing (60 s)

“Thank you. That concludes your oral assessment. Your responses will be reviewed and you’ll receive your results within 5 working days.”

10.3 Compiled Domain Specification (ExamRuntimeIR)

Note on schema conformance: The examples in this section use a simplified authoring-level representation for readability. The canonical TypeScript schema is defined in 02-schema.md. Key field name mappings:

Example field	Canonical schema field	Section
`type` (on nodes)	`kind`	§3 `ExamRuntimeNodeKind`
`questionStem`	`promptSeed`	§4 `ExamRuntimeNode`
`timeBudgetSeconds`	`timeBudgetMs` (milliseconds)	§4
`followUps` (inline array)	`followUpPolicy` + `candidateCommands`	§10, §13
`guardrails.forbidden`	`candidateCommands.forbidden`	§13
`transitionPolicy.allowedTargets`	`transitions` array	§11
`irVersion: "1.0.0"`	`"exam-runtime-ir/0.1"`	§09

The event stream (§10.5) and evidence ledger (§10.6) use canonical field names.
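
The simpler renames in this note can be sketched in TypeScript. This is a minimal illustration, not the canonical converter: the helper name `toCanonicalFields` and both interfaces are hypothetical, and only the three one-to-one mappings (`type`, `questionStem`, `timeBudgetSeconds`) are covered, since the canonical shapes of `followUpPolicy`, `candidateCommands` and `transitions` are defined in 02-schema.md and not shown here.

```typescript
// Simplified node shape used in this chapter's examples (subset only).
interface SimplifiedNode {
  nodeId: string;
  type: string;
  questionStem?: string;
  timeBudgetSeconds?: number;
}

// Canonical field names taken from the mapping table above.
// Assumption: the real schema has more fields than this subset.
interface CanonicalNodeFields {
  nodeId: string;
  kind: string;
  promptSeed?: string;
  timeBudgetMs?: number;
}

// Hypothetical helper: renames the simple fields and converts seconds
// to milliseconds. Optional fields are copied only when present.
function toCanonicalFields(node: SimplifiedNode): CanonicalNodeFields {
  const out: CanonicalNodeFields = { nodeId: node.nodeId, kind: node.type };
  if (node.questionStem !== undefined) {
    out.promptSeed = node.questionStem;
  }
  if (node.timeBudgetSeconds !== undefined) {
    out.timeBudgetMs = node.timeBudgetSeconds * 1000;
  }
  return out;
}

// Usage with the q1 values from the example below.
const q1Canonical = toCanonicalFields({
  nodeId: "q1",
  type: "question",
  questionStem:
    "Can you explain what process scheduling means in the context of an operating system, and why it matters?",
  timeBudgetSeconds: 240,
});
// q1Canonical.kind is "question"; q1Canonical.timeBudgetMs is 240000.
```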

{
  "irVersion": "exam-runtime-ir/0.1",
  "examId": "cs301-oral-2026s1-001",
  "version": "3",
  "metadata": {
    "courseCode": "CS301",
    "courseName": "Operating Systems",
    "assessmentType": "interactive_oral",
    "durationMinutes": 10,
    "language": "en-GB",
    "examinerPersona": {
      "tone": "supportive_encouraging",
      "style": "asks_for_clarification_when_vague"
    }
  },
  "timeBudget": {
    "totalSeconds": 600,
    "nodeBudgets": {
      "opening": 60,
      "q1": 180,
      "q2": 240,
      "closing": 60,
      "end": 0
    },
    "overrunPolicy": "warn_at_80pct_hard_at_100pct"
  },
  "nodes": [
    {
      "nodeId": "opening",
      "type": "opening",
      "prompt": "Welcome to your oral assessment for CS301. I'll ask you two questions about operating systems. Feel free to ask me to repeat or clarify at any time. You may raise your hand if you need a moment. Let's begin.",
      "transitions": [
        {
          "target": "q1",
          "condition": "always",
          "trigger": "examiner_action"
        }
      ]
    },
    {
      "nodeId": "q1",
      "type": "question",
      "questionStem": "Can you explain what process scheduling means in the context of an operating system, and why it matters?",
      "maxFollowUps": 2,
      "timeBudgetSeconds": 240,
      "learningOutcomes": ["LO-1"],
      "evidenceTargets": [
        {
          "id": "ev-q1-scheduling-concept",
          "description": "Explains what process scheduling is",
          "rubric": "Mentions CPU allocation, multiprogramming context",
          "level": "required"
        },
        {
          "id": "ev-q1-preemptive-cooperative",
          "description": "Distinguishes preemptive vs cooperative scheduling",
          "rubric": "Defines both, gives example or explains trade-off",
          "level": "expected"
        },
        {
          "id": "ev-q1-context-switch",
          "description": "Mentions context switch cost or mechanism",
          "rubric": "Explains save/restore state, overhead awareness",
          "level": "expected"
        }
      ],
      "followUps": [
        {
          "followUpId": "q1-fu1",
          "ordinal": 1,
          "prompt": "You mentioned scheduling. Can you elaborate on the difference between preemptive and cooperative scheduling?",
          "triggerCondition": "answer_is_vague OR missing_preemptive_cooperative",
          "evidenceTargets": ["ev-q1-preemptive-cooperative"]
        },
        {
          "followUpId": "q1-fu2",
          "ordinal": 2,
          "prompt": "How does context switching fit into this picture?",
          "triggerCondition": "missing_context_switch",
          "evidenceTargets": ["ev-q1-context-switch"]
        }
      ],
      "transitionPolicy": {
        "allowedTargets": ["q2"],
        "decisionMode": "runtime_approval",
        "conditions": [
          {
            "id": "q1-sufficient",
            "description": "Sufficient evidence collected OR follow-ups exhausted OR time budget exceeded",
            "expression": "(evidence_covered(['ev-q1-scheduling-concept']) AND evidence_covered(['ev-q1-preemptive-cooperative','ev-q1-context-switch'])) OR follow_up_count >= maxFollowUps OR time_budget_exceeded"
          }
        ]
      },
      "guardrails": {
        "forbidden": ["reveal_rubric", "reveal_score", "suggest_answer"],
        "forbidden_topics": ["exam_format_policy", "grading_threshold"],
        "maxCandidateSilenceSeconds": 15,
        "silenceAction": "gentle_prompt"
      }
    },
    {
      "nodeId": "q2",
      "type": "question",
      "questionStem": "Suppose you're designing a scheduler for an interactive desktop OS. Would you choose Round Robin or Shortest Job First? Why?",
      "maxFollowUps": 2,
      "timeBudgetSeconds": 300,
      "learningOutcomes": ["LO-2"],
      "evidenceTargets": [
        {
          "id": "ev-q2-algorithm-choice",
          "description": "Chooses an algorithm and provides a rationale",
          "rubric": "Names algorithm, links to interactive desktop context",
          "level": "required"
        },
        {
          "id": "ev-q2-starvation",
          "description": "Addresses starvation risk of the other algorithm",
          "rubric": "Explains SJF starvation scenario or RR fairness trade-off",
          "level": "expected"
        },
        {
          "id": "ev-q2-response-time",
          "description": "Considers response time from user perspective",
          "rubric": "Mentions interactive responsiveness, jitter, or latency",
          "level": "expected"
        }
      ],
      "followUps": [
        {
          "followUpId": "q2-fu1",
          "ordinal": 1,
          "prompt": "What problem could arise with the algorithm you didn't choose?",
          "triggerCondition": "missing_starvation",
          "evidenceTargets": ["ev-q2-starvation"]
        },
        {
          "followUpId": "q2-fu2",
          "ordinal": 2,
          "prompt": "From the user's perspective, how would response time be affected?",
          "triggerCondition": "missing_response_time",
          "evidenceTargets": ["ev-q2-response-time"]
        }
      ],
      "transitionPolicy": {
        "allowedTargets": ["closing"],
        "decisionMode": "runtime_approval",
        "conditions": [
          {
            "id": "q2-sufficient",
            "description": "Sufficient evidence collected OR follow-ups exhausted OR time budget exceeded",
            "expression": "(evidence_covered(['ev-q2-algorithm-choice']) AND evidence_covered(['ev-q2-starvation','ev-q2-response-time'])) OR follow_up_count >= maxFollowUps OR time_budget_exceeded"
          }
        ]
      },
      "guardrails": {
        "forbidden": ["reveal_rubric", "reveal_score", "suggest_answer"],
        "forbidden_topics": ["exam_format_policy", "grading_threshold"],
        "maxCandidateSilenceSeconds": 15,
        "silenceAction": "gentle_prompt"
      }
    },
    {
      "nodeId": "closing",
      "type": "closing",
      "prompt": "Thank you. That concludes your oral assessment. Your responses will be reviewed and you'll receive your results within 5 working days.",
      "transitions": [
        {
          "target": "end",
          "condition": "always",
          "trigger": "examiner_action"
        }
      ]
    },
    {
      "nodeId": "end",
      "type": "end",
      "postExamAction": "trigger_marking_runtime"
    }
  ],
  "candidateCommands": {
    "repeat": {
      "description": "Candidate asks examiner to repeat the current question or last statement",
      "runtimeAction": "re_prompt_current",
      "costsFollowUp": false,
      "maxPerNode": 3
    },
    "clarification": {
      "description": "Candidate asks for clarification of a term or concept",
      "runtimeAction": "llm_provides_clarification_within_guardrails",
      "costsFollowUp": false,
      "maxPerNode": 2,
      "guardrail": "must_not_reveal_answer_or_rubric"
    },
    "raise_hand": {
      "description": "Candidate signals they need a moment (pause timer)",
      "runtimeAction": "pause_time_budget",
      "costsFollowUp": false,
      "maxPerNode": 2,
      "pauseDurationSeconds": 10
    }
  }
}
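
The `q1-sufficient` transition condition can be read as a small predicate over the node's runtime state. The sketch below follows the condition's `description` field (sufficient evidence OR follow-ups exhausted OR time budget exceeded). Two points are assumptions rather than part of the specification: `evidence_covered([...])` is taken to mean that every listed target is covered, and the predicate is assumed to be evaluated after a candidate turn. The names `NodeRuntimeState` and `q1Sufficient` are hypothetical.

```typescript
// Hypothetical runtime state for one question node.
interface NodeRuntimeState {
  evidenceCovered: string[];
  followUpCount: number;
  maxFollowUps: number;
  timeBudgetExceeded: boolean;
}

// Assumption: true only when every listed evidence target is covered.
function evidenceCovered(state: NodeRuntimeState, ids: string[]): boolean {
  return ids.every((id) => state.evidenceCovered.indexOf(id) !== -1);
}

// Sufficient evidence OR follow-ups exhausted OR time budget exceeded.
function q1Sufficient(state: NodeRuntimeState): boolean {
  const sufficientEvidence =
    evidenceCovered(state, ["ev-q1-scheduling-concept"]) &&
    evidenceCovered(state, [
      "ev-q1-preemptive-cooperative",
      "ev-q1-context-switch",
    ]);
  return (
    sufficientEvidence ||
    state.followUpCount >= state.maxFollowUps ||
    state.timeBudgetExceeded
  );
}

// State after the candidate answers follow-up 1: one target still missing,
// one follow-up left, so the node should not transition yet.
const afterFirstFollowUp: NodeRuntimeState = {
  evidenceCovered: ["ev-q1-scheduling-concept", "ev-q1-preemptive-cooperative"],
  followUpCount: 1,
  maxFollowUps: 2,
  timeBudgetExceeded: false,
};

// State after the candidate answers follow-up 2: all targets covered.
const afterSecondFollowUp: NodeRuntimeState = {
  evidenceCovered: [
    "ev-q1-scheduling-concept",
    "ev-q1-preemptive-cooperative",
    "ev-q1-context-switch",
  ],
  followUpCount: 2,
  maxFollowUps: 2,
  timeBudgetExceeded: false,
};
```

Keeping the two escape hatches outside the evidence check means a node can always be left once follow-ups or time run out, even if the required target was never covered.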

10.4 Simplified Pipecat Adapter Output

The adapter compiles the domain specification into a Pipecat FlowManager-compatible configuration. The runtime controller layer sits between the domain specification and Pipecat, so this configuration is an intermediate representation, not the source of truth.

{
  "flow": {
    "initial_node": "opening",
    "nodes": {
      "opening": {
        "task_messages": [
          {
            "role": "system",
            "content": "You are an oral exam examiner. Be supportive and encouraging. ..."
          },
          {
            "role": "assistant",
            "content": "Welcome to your oral assessment for CS301. I'll ask you two questions about operating systems. Feel free to ask me to repeat or clarify at any time. You may raise your hand if you need a moment. Let's begin."
          }
        ],
        "pre_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_entered", "nodeId": "opening" }
          }
        ],
        "post_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_exited", "nodeId": "opening" }
          }
        ],
        "edges": [
          {
            "target": "q1",
            "transition_to": "q1",
            "interruptible": false
          }
        ]
      },
      "q1": {
        "task_messages": [
          {
            "role": "system",
            "content": "You are an oral exam examiner assessing LO-1: Process scheduling. Ask the stem question, then listen carefully. You may ask up to 2 follow-ups if the answer is vague or incomplete. Never reveal the rubric or suggest answers. If the candidate says 'repeat', re-ask the question. If they say 'clarification', explain a term without giving the answer. If they say 'raise_hand', wait 10 seconds. ..."
          },
          {
            "role": "assistant",
            "content": "Can you explain what process scheduling means in the context of an operating system, and why it matters?"
          }
        ],
        "pre_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_entered", "nodeId": "q1" }
          },
          {
            "type": "set_runtime_state",
            "key": "currentNode.followUpCount",
            "value": 0
          }
        ],
        "post_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_exited", "nodeId": "q1" }
          }
        ],
        "edges": [
          {
            "target": "q2",
            "transition_to": "q2",
            "interruptible": false,
            "condition": "runtime_approves_transition"
          }
        ],
        "runtime_config": {
          "maxFollowUps": 2,
          "timeBudgetSeconds": 240,
          "evidenceTargets": [
            "ev-q1-scheduling-concept",
            "ev-q1-preemptive-cooperative",
            "ev-q1-context-switch"
          ]
        }
      },
      "q2": {
        "task_messages": [
          {
            "role": "system",
            "content": "You are an oral exam examiner assessing LO-2: Scheduling algorithm comparison. Ask the stem question. You may ask up to 2 follow-ups. ..."
          },
          {
            "role": "assistant",
            "content": "Suppose you're designing a scheduler for an interactive desktop OS. Would you choose Round Robin or Shortest Job First? Why?"
          }
        ],
        "pre_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_entered", "nodeId": "q2" }
          },
          {
            "type": "set_runtime_state",
            "key": "currentNode.followUpCount",
            "value": 0
          }
        ],
        "post_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_exited", "nodeId": "q2" }
          }
        ],
        "edges": [
          {
            "target": "closing",
            "transition_to": "closing",
            "interruptible": false,
            "condition": "runtime_approves_transition"
          }
        ],
        "runtime_config": {
          "maxFollowUps": 2,
          "timeBudgetSeconds": 300,
          "evidenceTargets": [
            "ev-q2-algorithm-choice",
            "ev-q2-starvation",
            "ev-q2-response-time"
          ]
        }
      },
      "closing": {
        "task_messages": [
          {
            "role": "system",
            "content": "You are concluding the oral exam. Deliver the closing statement. Do not discuss performance."
          },
          {
            "role": "assistant",
            "content": "Thank you. That concludes your oral assessment. Your responses will be reviewed and you'll receive your results within 5 working days."
          }
        ],
        "pre_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_entered", "nodeId": "closing" }
          }
        ],
        "post_actions": [
          {
            "type": "emit_event",
            "event": { "type": "node_exited", "nodeId": "closing" }
          }
        ],
        "edges": [
          {
            "target": "end",
            "transition_to": "end",
            "interruptible": false
          }
        ]
      },
      "end": {
        "task_messages": [],
        "post_actions": [
          {
            "type": "emit_event",
            "event": { "type": "exam_completed" }
          },
          {
            "type": "trigger_marking_runtime",
            "payload": { "source": "exam_runtime" }
          }
        ]
      }
    }
  }
}
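
How a question node might be compiled into the shape above can be sketched as a pure function. This is an illustration under stated assumptions, not the actual adapter: `compileQuestionNode` and both interfaces are hypothetical, the output mirrors this chapter's simplified configuration rather than Pipecat's own API, and the system prompt is passed in as a parameter because its full text is elided in the example.

```typescript
// Subset of the simplified domain node from §10.3 that the sketch needs.
interface DomainQuestionNode {
  nodeId: string;
  questionStem: string;
  maxFollowUps: number;
  timeBudgetSeconds: number;
  evidenceTargets: { id: string }[];
  transitionPolicy: { allowedTargets: string[] };
}

// Shape of one node in the simplified adapter output shown above.
interface AdapterNode {
  task_messages: { role: string; content: string }[];
  pre_actions: object[];
  post_actions: object[];
  edges: {
    target: string;
    transition_to: string;
    interruptible: boolean;
    condition: string;
  }[];
  runtime_config: {
    maxFollowUps: number;
    timeBudgetSeconds: number;
    evidenceTargets: string[];
  };
}

// Hypothetical compile step for a single question node.
function compileQuestionNode(
  node: DomainQuestionNode,
  systemPrompt: string
): AdapterNode {
  return {
    task_messages: [
      { role: "system", content: systemPrompt },
      { role: "assistant", content: node.questionStem },
    ],
    pre_actions: [
      { type: "emit_event", event: { type: "node_entered", nodeId: node.nodeId } },
      // Each question node starts with a fresh follow-up counter.
      { type: "set_runtime_state", key: "currentNode.followUpCount", value: 0 },
    ],
    post_actions: [
      { type: "emit_event", event: { type: "node_exited", nodeId: node.nodeId } },
    ],
    // One edge per allowed target; the runtime, not the LLM, approves it.
    edges: node.transitionPolicy.allowedTargets.map((target) => ({
      target: target,
      transition_to: target,
      interruptible: false,
      condition: "runtime_approves_transition",
    })),
    runtime_config: {
      maxFollowUps: node.maxFollowUps,
      timeBudgetSeconds: node.timeBudgetSeconds,
      evidenceTargets: node.evidenceTargets.map((t) => t.id),
    },
  };
}

const q2Adapter = compileQuestionNode(
  {
    nodeId: "q2",
    questionStem:
      "Suppose you're designing a scheduler for an interactive desktop OS. Would you choose Round Robin or Shortest Job First? Why?",
    maxFollowUps: 2,
    timeBudgetSeconds: 300,
    evidenceTargets: [
      { id: "ev-q2-algorithm-choice" },
      { id: "ev-q2-starvation" },
      { id: "ev-q2-response-time" },
    ],
    transitionPolicy: { allowedTargets: ["closing"] },
  },
  "(system prompt elided)"
);
```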

10.5 Runtime Event Stream

The following is the chronological event stream for a realistic exam session. Timestamps are relative to exam start (T=0). Some events are omitted for brevity; a … indicates a gap.

T+0.000s  bot_ready
            { "examId": "cs301-oral-2026s1-001", "sessionId": "sess-7a3f" }

T+0.120s  node_entered
            { "nodeId": "opening", "nodeType": "opening" }

T+0.200s  transcript_delta
            { "nodeId": "opening", "speaker": "examiner",
              "text": "Welcome to your oral assessment for CS301.", "isFinal": false }

T+2.400s  transcript_final
            { "nodeId": "opening", "speaker": "examiner",
              "text": "Welcome to your oral assessment for CS301. I'll ask you two questions about operating systems. Feel free to ask me to repeat or clarify at any time. You may raise your hand if you need a moment. Let's begin.",
              "spanId": "sp-001" }

T+2.401s  node_exited
            { "nodeId": "opening" }

T+2.402s  node_entered
            { "nodeId": "q1", "nodeType": "question" }

T+2.500s  node_progress
            { "nodeId": "q1", "followUpCount": 0, "maxFollowUps": 2,
              "evidenceCovered": [], "timeBudgetRemainingSeconds": 240 }

T+2.600s  transcript_delta
            { "nodeId": "q1", "speaker": "examiner",
              "text": "Can you explain what process scheduling means", "isFinal": false }

T+5.800s  transcript_final
            { "nodeId": "q1", "speaker": "examiner",
              "text": "Can you explain what process scheduling means in the context of an operating system, and why it matters?",
              "spanId": "sp-002" }

T+8.200s  transcript_delta
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Process scheduling is when the OS decides", "isFinal": false }

T+14.600s transcript_final
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Process scheduling is when the operating system decides which process gets to use the CPU at any given time. It's important because there are usually more processes than CPUs, so the OS has to manage sharing. It also helps with making sure important tasks get done first.",
              "spanId": "sp-003" }

T+14.700s evidence_signal
            { "nodeId": "q1", "evidenceTargetId": "ev-q1-scheduling-concept",
              "transcriptSpanId": "sp-003",
              "signal": "covered",
              "confidence": 0.92,
              "rationale": "Candidate described CPU allocation and multiprogramming context." }

T+14.701s node_progress
            { "nodeId": "q1", "followUpCount": 0, "maxFollowUps": 2,
              "evidenceCovered": ["ev-q1-scheduling-concept"],
              "timeBudgetRemainingSeconds": 225 }

T+14.800s transition_decision
            { "nodeId": "q1", "decision": "follow_up",
              "reason": "Evidence for ev-q1-preemptive-cooperative and ev-q1-context-switch not yet covered. Follow-up 1 trigger condition 'missing_preemptive_cooperative' met.",
              "followUpOrdinal": 1 }

T+15.000s transcript_delta
            { "nodeId": "q1", "speaker": "examiner",
              "text": "You mentioned scheduling. Can you elaborate", "isFinal": false }

T+17.200s transcript_final
            { "nodeId": "q1", "speaker": "examiner",
              "text": "You mentioned scheduling. Can you elaborate on the difference between preemptive and cooperative scheduling?",
              "spanId": "sp-004" }

T+17.300s node_progress
            { "nodeId": "q1", "followUpCount": 1, "maxFollowUps": 2,
              "evidenceCovered": ["ev-q1-scheduling-concept"],
              "timeBudgetRemainingSeconds": 223 }

T+19.800s transcript_delta
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Sorry, can you repeat the question?", "isFinal": false }

T+21.400s transcript_final
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Sorry, can you repeat the question?",
              "spanId": "sp-005" }

T+21.500s candidate_command
            { "nodeId": "q1", "command": "repeat",
              "triggeredBy": "candidate_utterance",
              "rawText": "Sorry, can you repeat the question?",
              "costsFollowUp": false,
              "followUpCountAfter": 1 }

T+21.600s transcript_delta
            { "nodeId": "q1", "speaker": "examiner",
              "text": "Of course. I'm asking about the difference", "isFinal": false }

T+23.800s transcript_final
            { "nodeId": "q1", "speaker": "examiner",
              "text": "Of course. I'm asking about the difference between preemptive and cooperative scheduling. Can you tell me how the two differ?",
              "spanId": "sp-006" }

T+23.900s node_progress
            { "nodeId": "q1", "followUpCount": 1, "maxFollowUps": 2,
              "evidenceCovered": ["ev-q1-scheduling-concept"],
              "timeBudgetRemainingSeconds": 216 }

T+27.100s transcript_delta
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Right, so in preemptive the OS can stop", "isFinal": false }

T+36.500s transcript_final
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Right, so in preemptive scheduling the OS can stop a process at any time and switch to another one. This is what most modern OSes like Linux and Windows use. Cooperative scheduling is where the process has to give up control itself, like in older versions of Windows where if a program froze, the whole system could hang.",
              "spanId": "sp-007" }

T+36.600s evidence_signal
            { "nodeId": "q1", "evidenceTargetId": "ev-q1-preemptive-cooperative",
              "transcriptSpanId": "sp-007",
              "signal": "covered",
              "confidence": 0.95,
              "rationale": "Defined both types with real-world OS examples." }

T+36.601s node_progress
            { "nodeId": "q1", "followUpCount": 1, "maxFollowUps": 2,
              "evidenceCovered": ["ev-q1-scheduling-concept", "ev-q1-preemptive-cooperative"],
              "timeBudgetRemainingSeconds": 203 }

T+36.700s transition_decision
            { "nodeId": "q1", "decision": "follow_up",
              "reason": "ev-q1-context-switch still not covered. Follow-up 2 trigger condition 'missing_context_switch' met.",
              "followUpOrdinal": 2 }

T+37.000s transcript_final
            { "nodeId": "q1", "speaker": "examiner",
              "text": "Great examples. How does context switching fit into this picture?",
              "spanId": "sp-008" }

T+37.100s node_progress
            { "nodeId": "q1", "followUpCount": 2, "maxFollowUps": 2,
              "evidenceCovered": ["ev-q1-scheduling-concept", "ev-q1-preemptive-cooperative"],
              "timeBudgetRemainingSeconds": 203 }

T+42.300s transcript_final
            { "nodeId": "q1", "speaker": "candidate",
              "text": "Context switching is when the OS saves the state of the current process and loads the state of the next one. It's the mechanism that makes preemptive scheduling possible. But it has overhead — saving registers, updating memory maps — so you don't want to do it too frequently.",
              "spanId": "sp-009" }

T+42.400s evidence_signal
            { "nodeId": "q1", "evidenceTargetId": "ev-q1-context-switch",
              "transcriptSpanId": "sp-009",
              "signal": "covered",
              "confidence": 0.90,
              "rationale": "Described save/restore mechanism and overhead awareness." }

T+42.401s node_progress
            { "nodeId": "q1", "followUpCount": 2, "maxFollowUps": 2,
              "evidenceCovered": ["ev-q1-scheduling-concept", "ev-q1-preemptive-cooperative", "ev-q1-context-switch"],
              "timeBudgetRemainingSeconds": 197 }

T+42.500s transition_decision
            { "nodeId": "q1", "decision": "move_to_next_node",
              "reason": "All expected evidence covered. Transition condition 'q1-sufficient' satisfied.",
              "targetNodeId": "q2" }

T+42.600s node_exited
            { "nodeId": "q1" }

T+42.700s node_entered
            { "nodeId": "q2", "nodeType": "question" }

T+42.800s node_progress
            { "nodeId": "q2", "followUpCount": 0, "maxFollowUps": 2,
              "evidenceCovered": [], "timeBudgetRemainingSeconds": 240 }

T+43.000s transcript_final
            { "nodeId": "q2", "speaker": "examiner",
              "text": "Suppose you're designing a scheduler for an interactive desktop OS. Would you choose Round Robin or Shortest Job First? Why?",
              "spanId": "sp-010" }

…

T+120.400s candidate_command
            { "nodeId": "q2", "command": "raise_hand",
              "triggeredBy": "candidate_utterance",
              "rawText": "Can I have a moment to think?",
              "costsFollowUp": false,
              "followUpCountAfter": 0,
              "pauseDurationSeconds": 10 }

T+120.500s time_budget_paused
            { "nodeId": "q2", "pauseUntil": "T+130.400" }

T+130.400s time_budget_resumed
            { "nodeId": "q2", "timeBudgetRemainingSeconds": 170 }

…

T+240.000s time_budget_exceeded
            { "nodeId": "q2" }

T+240.100s transition_decision
            { "nodeId": "q2", "decision": "move_to_next_node",
              "reason": "Time budget exceeded. Evidence collected: ev-q2-algorithm-choice (covered), ev-q2-starvation (covered), ev-q2-response-time (not covered). Hard move enforced by overrun policy.",
              "targetNodeId": "closing" }

T+240.200s node_exited
            { "nodeId": "q2" }

T+240.300s node_entered
            { "nodeId": "closing", "nodeType": "closing" }

T+241.000s transcript_final
            { "nodeId": "closing", "speaker": "examiner",
              "text": "Thank you. That concludes your oral assessment. Your responses will be reviewed and you'll receive your results within 5 working days.",
              "spanId": "sp-020" }

T+243.000s node_exited
            { "nodeId": "closing" }

T+243.100s node_entered
            { "nodeId": "end", "nodeType": "end" }

T+243.200s exam_completed
            { "examId": "cs301-oral-2026s1-001", "sessionId": "sess-7a3f",
              "totalDurationSeconds": 243.2,
              "nodesVisited": ["opening", "q1", "q2", "closing", "end"],
              "totalFollowUpsUsed": 3 }

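The `node_progress` values in the stream above can be reproduced by folding over the other events. The sketch below is one way to do that, under assumptions: events are modelled as flat objects with a `type` field (as in the `guardrail_violation` example in §10.8), each `follow_up` decision counts as one follow-up, and a command with `costsFollowUp: true` would also count. `RuntimeEvent`, `NodeProgress` and `foldNodeProgress` are hypothetical names.

```typescript
// Hypothetical flat event shapes; only the fields the fold needs.
type RuntimeEvent =
  | { type: "evidence_signal"; nodeId: string; evidenceTargetId: string; signal: string }
  | { type: "transition_decision"; nodeId: string; decision: string }
  | { type: "candidate_command"; nodeId: string; command: string; costsFollowUp: boolean };

interface NodeProgress {
  followUpCount: number;
  evidenceCovered: string[];
}

// Replays the events for one node and derives its progress counters.
function foldNodeProgress(events: RuntimeEvent[], nodeId: string): NodeProgress {
  const progress: NodeProgress = { followUpCount: 0, evidenceCovered: [] };
  for (const ev of events) {
    if (ev.nodeId !== nodeId) continue;
    switch (ev.type) {
      case "evidence_signal":
        // Record each covered target once.
        if (
          ev.signal === "covered" &&
          progress.evidenceCovered.indexOf(ev.evidenceTargetId) === -1
        ) {
          progress.evidenceCovered.push(ev.evidenceTargetId);
        }
        break;
      case "transition_decision":
        if (ev.decision === "follow_up") progress.followUpCount += 1;
        break;
      case "candidate_command":
        // repeat, clarification and raise_hand all have costsFollowUp: false.
        if (ev.costsFollowUp) progress.followUpCount += 1;
        break;
    }
  }
  return progress;
}

// The q1 events from the stream above, in order (transcripts omitted).
const q1Events: RuntimeEvent[] = [
  { type: "evidence_signal", nodeId: "q1", evidenceTargetId: "ev-q1-scheduling-concept", signal: "covered" },
  { type: "transition_decision", nodeId: "q1", decision: "follow_up" },
  { type: "candidate_command", nodeId: "q1", command: "repeat", costsFollowUp: false },
  { type: "evidence_signal", nodeId: "q1", evidenceTargetId: "ev-q1-preemptive-cooperative", signal: "covered" },
  { type: "transition_decision", nodeId: "q1", decision: "follow_up" },
  { type: "evidence_signal", nodeId: "q1", evidenceTargetId: "ev-q1-context-switch", signal: "covered" },
  { type: "transition_decision", nodeId: "q1", decision: "move_to_next_node" },
];

const q1Progress = foldNodeProgress(q1Events, "q1");
// q1Progress.followUpCount is 2 and three targets are covered,
// matching the node_progress event at T+42.401s.
```
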
10.6 Evidence Ledger

The evidence ledger is populated from evidence_signal events and persisted by the event store. It is the primary input to the marking runtime.

{
  "examId": "cs301-oral-2026s1-001",
  "sessionId": "sess-7a3f",
  "candidateId": "stu-202400042",
  "generatedAt": "2026-05-06T02:07:43.200Z",
  "entries": [
    {
      "evidenceTargetId": "ev-q1-scheduling-concept",
      "nodeId": "q1",
      "learningOutcome": "LO-1",
      "signal": "covered",
      "confidence": 0.92,
      "transcriptSpanIds": ["sp-003"],
      "transcriptExcerpt": "Process scheduling is when the operating system decides which process gets to use the CPU at any given time. It's important because there are usually more processes than CPUs, so the OS has to manage sharing.",
      "rationale": "Candidate described CPU allocation and multiprogramming context.",
      "timestamp": "T+14.700s"
    },
    {
      "evidenceTargetId": "ev-q1-preemptive-cooperative",
      "nodeId": "q1",
      "learningOutcome": "LO-1",
      "signal": "covered",
      "confidence": 0.95,
      "transcriptSpanIds": ["sp-007"],
      "transcriptExcerpt": "In preemptive scheduling the OS can stop a process at any time and switch to another one. This is what most modern OSes like Linux and Windows use. Cooperative scheduling is where the process has to give up control itself.",
      "rationale": "Defined both types with real-world OS examples.",
      "timestamp": "T+36.600s"
    },
    {
      "evidenceTargetId": "ev-q1-context-switch",
      "nodeId": "q1",
      "learningOutcome": "LO-1",
      "signal": "covered",
      "confidence": 0.90,
      "transcriptSpanIds": ["sp-009"],
      "transcriptExcerpt": "Context switching is when the OS saves the state of the current process and loads the state of the next one. It's the mechanism that makes preemptive scheduling possible. But it has overhead.",
      "rationale": "Described save/restore mechanism and overhead awareness.",
      "timestamp": "T+42.400s"
    },
    {
      "evidenceTargetId": "ev-q2-algorithm-choice",
      "nodeId": "q2",
      "learningOutcome": "LO-2",
      "signal": "covered",
      "confidence": 0.88,
      "transcriptSpanIds": ["sp-012"],
      "transcriptExcerpt": "I would choose Round Robin because it gives each process a fair time slice, which means the system stays responsive to user input even under load.",
      "rationale": "Named algorithm, linked to interactive context.",
      "timestamp": "T+62.300s"
    },
    {
      "evidenceTargetId": "ev-q2-starvation",
      "nodeId": "q2",
      "learningOutcome": "LO-2",
      "signal": "covered",
      "confidence": 0.85,
      "transcriptSpanIds": ["sp-015"],
      "transcriptExcerpt": "SJF can lead to starvation if short jobs keep arriving — a long job might never get scheduled. That's why it's not great for interactive systems where any task could need attention.",
      "rationale": "Explained starvation scenario linked to interactive context.",
      "timestamp": "T+98.700s"
    },
    {
      "evidenceTargetId": "ev-q2-response-time",
      "nodeId": "q2",
      "learningOutcome": "LO-2",
      "signal": "not_covered",
      "confidence": null,
      "transcriptSpanIds": [],
      "transcriptExcerpt": null,
      "rationale": "Time budget exhausted before evidence could be collected.",
      "timestamp": "T+240.100s"
    }
  ],
  "summary": {
    "totalTargets": 6,
    "covered": 5,
    "notCovered": 1,
    "coverageRate": 0.833
  }
}
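
The `summary` block can be derived from the `entries` array. The sketch below shows one way to compute it. The names `LedgerEntry`, `LedgerSummary` and `summariseLedger` are hypothetical, and rounding the rate to three decimal places is an assumption inferred from the `0.833` value above.

```typescript
// Only the fields needed to compute the summary.
interface LedgerEntry {
  evidenceTargetId: string;
  signal: "covered" | "not_covered";
}

interface LedgerSummary {
  totalTargets: number;
  covered: number;
  notCovered: number;
  coverageRate: number;
}

// Counts covered entries and rounds the rate to three decimals (assumed).
function summariseLedger(entries: LedgerEntry[]): LedgerSummary {
  const total = entries.length;
  const covered = entries.filter((e) => e.signal === "covered").length;
  const rate = total === 0 ? 0 : Math.round((covered / total) * 1000) / 1000;
  return {
    totalTargets: total,
    covered: covered,
    notCovered: total - covered,
    coverageRate: rate,
  };
}

// The six entries from the ledger above, reduced to id and signal.
const ledgerEntries: LedgerEntry[] = [
  { evidenceTargetId: "ev-q1-scheduling-concept", signal: "covered" },
  { evidenceTargetId: "ev-q1-preemptive-cooperative", signal: "covered" },
  { evidenceTargetId: "ev-q1-context-switch", signal: "covered" },
  { evidenceTargetId: "ev-q2-algorithm-choice", signal: "covered" },
  { evidenceTargetId: "ev-q2-starvation", signal: "covered" },
  { evidenceTargetId: "ev-q2-response-time", signal: "not_covered" },
];

const ledgerSummary = summariseLedger(ledgerEntries);
// ledgerSummary: totalTargets 6, covered 5, notCovered 1, coverageRate 0.833
```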

10.7 Candidate Command Example

Scenario: Candidate asks to repeat the question

The candidate says: “Sorry, can you repeat the question?”

Runtime behaviour:

STT produces transcript_final with the candidate’s utterance (span sp-005).
The runtime controller’s command classifier detects the repeat intent.
A candidate_command event is emitted (see §10.5, T+21.500s).
The runtime does NOT increment followUpCount: this is a repeat request, not a substantive answer attempt.
The LLM re-asks the current follow-up prompt with slight rephrasing.
A new transcript_final is emitted for the examiner’s re-prompt (span sp-006).

Key invariant: followUpCount remains at 1 after the repeat. The candidate’s ability to answer is not penalised by a repeat request.

Scenario: Candidate asks for clarification of a term

The candidate says: “What do you mean by starvation?”

Runtime behaviour:

Command classifier detects clarification intent.
candidate_command event emitted with command: "clarification".
LLM provides a brief, rubric-safe explanation — e.g., “Starvation means a process waits indefinitely because shorter jobs keep getting priority.”
The LLM must NOT say “That’s exactly what I’m looking for” or hint at the rubric (guardrail enforced).
followUpCount is NOT incremented.
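Both scenarios rest on the same invariant: a candidate command is logged but never consumes a follow-up. The sketch below illustrates that invariant under stated assumptions. `NodeState`, `handle_candidate_command`, and `FREE_COMMANDS` are hypothetical names, and only the two commands discussed in this section are modelled.

```python
from dataclasses import dataclass, field

# Commands this section describes as not costing a follow-up.
FREE_COMMANDS = {"repeat", "clarification"}


@dataclass
class NodeState:
    """Minimal per-node runtime state (illustrative, not the real runtime model)."""
    node_id: str
    max_follow_ups: int
    follow_up_count: int = 0
    events: list = field(default_factory=list)


def handle_candidate_command(state, command, timestamp):
    """Emit a candidate_command event and leave follow_up_count untouched."""
    if command not in FREE_COMMANDS:
        raise ValueError(f"command not modelled in this sketch: {command}")
    event = {
        "type": "candidate_command",
        "nodeId": state.node_id,
        "command": command,
        "costsFollowUp": False,
        "timestamp": timestamp,
    }
    state.events.append(event)
    return event


# Q1 after the first follow-up: the repeat request must not change the count.
state = NodeState(node_id="q1", max_follow_ups=2, follow_up_count=1)
handle_candidate_command(state, "repeat", "T+21.500s")
```

Keeping the counter update out of the command handler entirely, rather than incrementing and then compensating, makes the invariant hard to break by accident.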

10.8 Guardrail Example

Scenario: LLM attempts to reveal the rubric

During Q1 follow-up, the LLM’s generated response includes:

“That’s a good point! You’ve actually covered the preemptive vs cooperative distinction, which is one of the key rubric items for this question.”

Runtime controller response:

The LLM output is intercepted by the guardrail layer before TTS delivery.
The guardrail checks against forbidden: ["reveal_rubric"].
The response is blocked — TTS does not play it to the candidate.
A guardrail_violation event is emitted:

{
  "type": "guardrail_violation",
  "nodeId": "q1",
  "rule": "reveal_rubric",
  "severity": "blocked",
  "originalText": "That's a good point! You've actually covered the preemptive vs cooperative distinction, which is one of the key rubric items for this question.",
  "replacementAction": "regenerate_response",
  "timestamp": "T+38.200s"
}

The LLM is prompted to regenerate without rubric references.
The regenerated response plays to the candidate.
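The interception step can be sketched as a function that inspects LLM output before it reaches TTS. This section does not say how the guardrail layer detects a rubric reveal. The regular expression below is a stand-in assumption, and `FORBIDDEN_PATTERNS` and `check_guardrails` are hypothetical names.

```python
import re

# Hypothetical detection patterns; a real guardrail layer may use a classifier instead.
FORBIDDEN_PATTERNS = {
    "reveal_rubric": re.compile(r"\b(rubric|marking criteria)\b", re.IGNORECASE),
}


def check_guardrails(text, node_id, forbidden, timestamp):
    """Return a guardrail_violation event if text breaks a forbidden rule, else None."""
    for rule in forbidden:
        pattern = FORBIDDEN_PATTERNS.get(rule)
        if pattern and pattern.search(text):
            return {
                "type": "guardrail_violation",
                "nodeId": node_id,
                "rule": rule,
                "severity": "blocked",
                "originalText": text,
                "replacementAction": "regenerate_response",
                "timestamp": timestamp,
            }
    return None


leaky = (
    "That's a good point! You've actually covered the preemptive vs cooperative "
    "distinction, which is one of the key rubric items for this question."
)
violation = check_guardrails(leaky, "q1", ["reveal_rubric"], "T+38.200s")
clean = check_guardrails(
    "Can you say more about context switching?", "q1", ["reveal_rubric"], "T+39.000s"
)
```

A caller would play the text only when the function returns `None`, and otherwise log the event and ask the LLM to regenerate.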

Scenario: LLM attempts to transition to an unauthorised node

The LLM, after Q1, attempts to generate text that would skip Q2 and go to closing:

“Excellent work! Let’s wrap up the assessment.”

Runtime controller response:

The runtime detects the LLM is attempting to exit Q1.
Q1’s transitionPolicy.allowedTargets is ["q2"].
The LLM does not have authority to decide the transition — only the runtime can approve move_to_next_node.
The runtime blocks the premature closing, injects the Q2 stem, and emits a guardrail_violation event:

{
  "type": "guardrail_violation",
  "nodeId": "q1",
  "rule": "unauthorized_transition",
  "severity": "blocked",
  "originalIntent": "move_to_closing",
  "allowedTargets": ["q2"],
  "replacementAction": "inject_next_question",
  "timestamp": "T+42.800s"
}
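The authority check reduces to a membership test against `allowedTargets`. A minimal sketch, assuming the runtime has already inferred the requested target from the LLM output (that inference is out of scope here). The function name `authorise_transition` and the `target` field on the approval event are assumptions for illustration.

```python
def authorise_transition(node_id, requested_target, allowed_targets, timestamp):
    """Approve a transition only if the target is listed in allowedTargets."""
    if requested_target in allowed_targets:
        return {
            "type": "transition_decision",
            "nodeId": node_id,
            "decision": "move_to_next_node",
            "target": requested_target,  # assumed field, not shown in the spec
            "timestamp": timestamp,
        }
    # Anything else is blocked and replaced by the next authorised question.
    return {
        "type": "guardrail_violation",
        "nodeId": node_id,
        "rule": "unauthorized_transition",
        "severity": "blocked",
        "originalIntent": f"move_to_{requested_target}",
        "allowedTargets": list(allowed_targets),
        "replacementAction": "inject_next_question",
        "timestamp": timestamp,
    }


blocked = authorise_transition("q1", "closing", ["q2"], "T+42.800s")
approved = authorise_transition("q1", "q2", ["q2"], "T+42.500s")
```

The blocked case reproduces the event above; the approved case shows that the same check is what lets the legitimate Q1 to Q2 transition through.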

10.9 markRuntime Input Excerpt

After exam_completed fires, the marking runtime receives a structured input package. Below is a simplified excerpt.

{
  "inputVersion": "1.0.0",
  "examId": "cs301-oral-2026s1-001",
  "sessionId": "sess-7a3f",
  "candidateId": "stu-202400042",
  "examRuntimeVersion": "1.0.0",
  "evidenceLedger": {
    // ... full ledger as shown in §10.6 ...
  },
  "transcript": {
    "totalSpans": 20,
    "fullText": "…", // concatenated transcript with span IDs
    "spans": [
      // Each span: { spanId, nodeId, speaker, text, startTime, endTime }
    ]
  },
  "runtimeAudit": {
    "nodesVisited": ["opening", "q1", "q2", "closing", "end"],
    "totalDurationSeconds": 243.2,
    "followUpsUsed": {
      "q1": 2,
      "q2": 1
    },
    "transitionDecisions": [
      {
        "nodeId": "q1",
        "decision": "move_to_next_node",
        "reason": "All expected evidence covered.",
        "timestamp": "T+42.500s"
      },
      {
        "nodeId": "q2",
        "decision": "move_to_next_node",
        "reason": "Time budget exceeded.",
        "timestamp": "T+240.100s"
      }
    ],
    "candidateCommandsUsed": [
      { "nodeId": "q1", "command": "repeat", "timestamp": "T+21.500s" },
      { "nodeId": "q2", "command": "raise_hand", "timestamp": "T+120.400s" }
    ],
    "guardrailViolations": []
  },
  "irSnapshot": {
    // Frozen copy of the ExamRuntimeIR used for this session
    // Enables marking to reference the exact rubric/evidence targets
    // that were active during the exam
  }
}

Key properties of the marking input:

Evidence ledger is pre-populated with LLM confidence scores and transcript excerpts. The marking runtime may confirm, override, or supplement these.
Transcript spans are linked to evidence targets, enabling human markers to verify the AI’s evidence assessment.
Runtime audit provides context: why transitions happened, which commands the candidate used, and whether any guardrail violations occurred.
IR snapshot freezes the assessment definition so that marking always references the exact rubric that was in effect during the exam.
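Because human markers rely on the link between evidence targets and transcript spans, a marking runtime might check that every referenced span actually exists before marking begins. The sketch below shows one way to do that, assuming ledger entries and spans are plain dictionaries using the field names shown above. `find_dangling_span_ids` is a hypothetical helper.

```python
def find_dangling_span_ids(ledger_entries, transcript_spans):
    """Map each evidence target to any span IDs it cites that are not in the transcript."""
    known = {span["spanId"] for span in transcript_spans}
    dangling = {}
    for entry in ledger_entries:
        missing = [sid for sid in entry.get("transcriptSpanIds", []) if sid not in known]
        if missing:
            dangling[entry["evidenceTargetId"]] = missing
    return dangling


entries = [
    {"evidenceTargetId": "ev-q2-starvation", "transcriptSpanIds": ["sp-015"]},
    {"evidenceTargetId": "ev-q2-response-time", "transcriptSpanIds": []},
]
# Transcript that contains the cited span: nothing is dangling.
ok = find_dangling_span_ids(entries, [{"spanId": "sp-015"}])
# Transcript that lacks it: the starvation target is reported.
broken = find_dangling_span_ids(entries, [{"spanId": "sp-014"}])
```

An uncovered target with an empty `transcriptSpanIds` list is not treated as an error, since it has nothing to verify.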

10.10 Summary of Demonstrated Features

Feature	Where Demonstrated
Node progress updates	`node_progress` events throughout §10.5
Follow-up count runtime control	Q1 follow-ups 1→2, never exceeding `maxFollowUps: 2`
Candidate repeat ≠ follow-up	T+21.500s `candidate_command` with `costsFollowUp: false`
Evidence signal → transcript span	`evidence_signal` with `transcriptSpanId` references
Transition decision with reason	`transition_decision` events at T+14.800s, T+36.700s, T+42.500s, T+240.100s
`exam_completed` → marking runtime	§10.9 structured input package
Guardrail blocking rubric reveal	§10.8 first scenario
Guardrail blocking unauthorised transition	§10.8 second scenario
Time budget enforcement	Q2 time budget exceeded at T+240.000s
`raise_hand` pauses timer	T+120.400s pause, T+130.400s resume

10.11 INFOSYS110 — Full IOA Example (University of Auckland)

Based on the INFOSYS110 Interactive Oral Assessment Handbook. This example demonstrates a multi-segment, scenario-based IOA for a first-year Business Information Systems course. It exercises persona consistency, rubric-level nudging, transversal skills, scaffolding, and the report_observation protocol.

10.11.1 Scenario

Dimension	Value
Course	INFOSYS 110 — Business Information Systems (Stage I)
Duration	20 minutes (4 segments × ~5 min each)
Language	English
Assessment type	Interactive oral — scenario-based conversation across 4 segments
Examiner persona	Hotel General Manager (professionally-focused scenario)
Core case	A hotel chain considering digital transformation of its operations
Max follow-ups per segment	3
Candidate commands	`repeat`, `clarify`, `slow_down`, `pause`, `help`
Scaffolding	Enabled — 2-minute practice conversation before exam starts
Transversal skills	`critical_thinking`, `professional_communication`, `problem_solving`

10.11.2 Flow Shape

┌────────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌──────────┐   ┌─────┐
│ SCAFFOLDING│──▶│ SEGMENT_1 │──▶│ SEGMENT_2 │──▶│ SEGMENT_3 │──▶│ SEGMENT_4 │──▶│ CLOSING  │──▶│ END │
│ (practice) │   │ Digital   │   │ IS Roles  │   │ Data      │   │ Change &  │   │          │   │     │
│            │   │ Foundat.  │   │ & BI      │   │ Govern.   │   │ Loyalty   │   │          │   │     │
└────────────┘   └───────────┘   └───────────┘   └───────────┘   └───────────┘   └──────────┘   └─────┘
                   │ ≤3 FU         │ ≤3 FU         │ ≤3 FU         │ ≤3 FU

10.11.3 Compiled Specification — Key Nodes

{
  "irVersion": "exam-runtime-ir/0.1",
  "examId": "infosys110-ioa-2026s1-001",
  "examVersion": 1,
  "metadata": {
    "courseCode": "INFOSYS110",
    "courseName": "Business Information Systems",
    "institution": "University of Auckland",
    "assessmentType": "interactive_oral",
    "durationMinutes": 20,
    "language": "en-NZ",
    "communicationStyleIsLearningOutcome": false,
    "equivalentWrittenWordCount": 3000
  },
  "transversalSkills": [
    {
      "skillId": "critical_thinking",
      "description": "Analyses information, evaluates alternatives, forms reasoned judgements"
    },
    {
      "skillId": "professional_communication",
      "description": "Communicates ideas clearly in a professional context"
    },
    {
      "skillId": "problem_solving",
      "description": "Identifies problems and proposes practical solutions"
    }
  ],
  "scaffolding": {
    "enabled": true,
    "scenario": "A brief practice conversation to familiarise you with the assessment format. This does NOT count toward your score.",
    "conversationPrompt": "Let's do a quick practice. Imagine you're telling a friend about a new app you've been using. What does it do and why do you like it?",
    "maxDurationSeconds": 120,
    "feedbackEnabled": true
  },
  "nodes": [
    {
      "nodeId": "segment_1_digital_foundations",
      "type": "scenario_segment",
      "persona": "You are the General Manager of a mid-range hotel chain in New Zealand. You are meeting with a junior team member to discuss the hotel's digital transformation strategy.",
      "scenario": "The hotel chain is considering upgrading its booking system, implementing a mobile check-in app, and introducing IoT sensors for room management. You want to understand the candidate's grasp of digital foundations and operational trade-offs.",
      "conversationPrompt": "Thanks for coming in. We're looking at some big technology changes for the chain. Can you walk me through what you think are the key digital foundations we need to get right before we invest in new systems?",
      "evidenceSignals": [
        {
          "signalId": "ev-digital-transformation-understanding",
          "description": "Demonstrates understanding of what digital transformation means in a business context",
          "levels": ["basic_awareness", "applied_understanding", "strategic_insight"],
          "weight": 0.25
        },
        {
          "signalId": "ev-operational-trade-offs",
          "description": "Identifies trade-offs in technology adoption (cost vs benefit, disruption vs efficiency)",
          "levels": ["lists_factors", "analyses_trade_offs", "evaluates_with_evidence"],
          "weight": 0.25
        },
        {
          "signalId": "ev-infrastructure-awareness",
          "description": "Recognises infrastructure prerequisites (network, integration, training)",
          "levels": ["mentions_awareness", "explains_dependencies", "proposes_implementation_plan"],
          "weight": 0.25
        },
        {
          "signalId": "ev-customer-impact",
          "description": "Considers how technology changes affect the customer experience",
          "levels": ["acknowledges_impact", "analyses_customer_journey", "proposes_customer_centric_approach"],
          "weight": 0.25
        }
      ],
      "maxFollowUps": 3,
      "timeBudgetSeconds": 300,
      "followUpBank": [
        {
          "followUpId": "s1-fu1",
          "type": "nudge",
          "prompt": "That's a good overview. Can you tell me more about why you think those specific foundations matter — what could go wrong if we skip them?",
          "triggerCondition": "basic_awareness_level AND missing_trade_offs"
        },
        {
          "followUpId": "s1-fu2",
          "type": "probe",
          "prompt": "How would you prioritise these? If we could only do one thing first, what would it be and why?",
          "triggerCondition": "lists_factors BUT no prioritisation"
        },
        {
          "followUpId": "s1-fu3",
          "type": "challenge",
          "prompt": "Some staff might resist these changes. How does that factor into your thinking?",
          "triggerCondition": "missing_change_management_awareness"
        }
      ],
      "transitionConditions": [
        {
          "id": "s1-sufficient",
          "expression": "signal_count >= 3 AND any_signal_level >= 'applied_understanding'"
        },
        {
          "id": "s1-time-exhausted",
          "expression": "time_budget_exceeded"
        },
        {
          "id": "s1-followups-exhausted",
          "expression": "follow_up_count >= 3"
        }
      ],
      "guardrails": {
        "forbidden": ["reveal_rubric", "reveal_score", "suggest_answer", "mention_other_segments"],
        "personaBreakPatterns": ["As your examiner", "In this assessment", "Let me ask you another question"]
      }
    },
    {
      "nodeId": "segment_2_is_roles_bi",
      "type": "scenario_segment",
      "persona": "You are the General Manager continuing the meeting. Now focusing on information systems roles and how business intelligence can support decision-making.",
      "scenario": "After discussing digital foundations, you want to explore how different IS roles (database admin, systems analyst, CIO) contribute to the hotel's success, and how business intelligence dashboards could help manage operations.",
      "conversationPrompt": "Good. Now, we're also thinking about building a business intelligence dashboard for our regional managers. Can you explain what roles in an IS team would be involved in making that happen, and what kind of insights the dashboard should provide?",
      "evidenceSignals": [
        {
          "signalId": "ev-is-roles-knowledge",
          "description": "Identifies and explains key IS roles (DBA, systems analyst, CIO, etc.)",
          "levels": ["names_roles", "explains_responsibilities", "maps_roles_to_outcomes"],
          "weight": 0.3
        },
        {
          "signalId": "ev-bi-understanding",
          "description": "Demonstrates understanding of business intelligence concepts",
          "levels": ["defines_bi", "explains_bi_value", "proposes_bi_use_case"],
          "weight": 0.35
        },
        {
          "signalId": "ev-data-driven-decision",
          "description": "Connects data/information to business decision-making",
          "levels": ["mentions_data", "explains_decision_process", "proposes_metrics_framework"],
          "weight": 0.35
        }
      ],
      "maxFollowUps": 3,
      "timeBudgetSeconds": 300,
      "followUpBank": [
        {
          "followUpId": "s2-fu1",
          "type": "nudge",
          "prompt": "You've mentioned the roles. How would these people work together day-to-day on the dashboard project?",
          "triggerCondition": "names_roles BUT no collaboration_explanation"
        },
        {
          "followUpId": "s2-fu2",
          "type": "probe",
          "prompt": "What specific metrics would a regional manager find most useful on that dashboard?",
          "triggerCondition": "defines_bi BUT no specific_metrics"
        },
        {
          "followUpId": "s2-fu3",
          "type": "challenge",
          "prompt": "What if the data in the dashboard is wrong or outdated? How does that affect decision-making?",
          "triggerCondition": "missing_data_quality_awareness"
        }
      ],
      "transitionConditions": [
        {
          "id": "s2-sufficient",
          "expression": "signal_count >= 2 AND has_signal('ev-bi-understanding') AND any_signal_level >= 'explains_bi_value'"
        }
      ]
    },
    {
      "nodeId": "segment_3_data_governance",
      "type": "scenario_segment",
      "persona": "You are the General Manager. The conversation has turned to the responsibilities that come with collecting and analysing guest data.",
      "scenario": "The hotel collects guest preferences, booking patterns, and feedback data. You need to understand the candidate's awareness of data governance, privacy, and responsible analytics practices.",
      "conversationPrompt": "One more thing — we collect a lot of guest data for the loyalty programme and the BI dashboard. What should we be thinking about in terms of data governance and responsible use of that data?",
      "evidenceSignals": [
        {
          "signalId": "ev-data-governance-awareness",
          "description": "Demonstrates understanding of data governance principles",
          "levels": ["mentions_governance", "explains_frameworks", "proposes_governance_policy"],
          "weight": 0.3
        },
        {
          "signalId": "ev-privacy-ethics",
          "description": "Considers privacy, consent, and ethical use of data",
          "levels": ["mentions_privacy", "explains_consent_model", "proposes_ethical_framework"],
          "weight": 0.35
        },
        {
          "signalId": "ev-responsible-analytics",
          "description": "Understands responsible analytics practices (bias, transparency, accountability)",
          "levels": ["mentions_awareness", "explains_risks", "proposes_mitigation_strategy"],
          "weight": 0.35
        }
      ],
      "maxFollowUps": 3,
      "timeBudgetSeconds": 300,
      "followUpBank": [
        {
          "followUpId": "s3-fu1",
          "type": "nudge",
          "prompt": "That's an important point. Can you tell me more about how we'd actually implement that in practice?",
          "triggerCondition": "mentions_privacy BUT no implementation_detail"
        },
        {
          "followUpId": "s3-fu2",
          "type": "probe",
          "prompt": "What about the ethical side — are there things we could do with the data that we probably shouldn't?",
          "triggerCondition": "missing_ethical_consideration"
        },
        {
          "followUpId": "s3-fu3",
          "type": "challenge",
          "prompt": "If a guest asked to see all the data we hold about them, could we do it? What would that involve?",
          "triggerCondition": "missing_data_subject_rights"
        }
      ]
    },
    {
      "nodeId": "segment_4_change_loyalty",
      "type": "scenario_segment",
      "persona": "You are the General Manager. Final segment covering project change management and the loyalty programme logic.",
      "scenario": "The hotel is rolling out a new loyalty programme alongside the technology changes. You want to assess the candidate's understanding of change management, human factors, and how loyalty programme logic works.",
      "conversationPrompt": "Last topic. We're launching a new loyalty programme alongside all these tech changes. How would you approach the change management side of this, and can you walk me through how you think the loyalty programme logic should work?",
      "evidenceSignals": [
        {
          "signalId": "ev-change-management",
          "description": "Demonstrates understanding of change management principles in IT projects",
          "levels": ["mentions_change", "explains_approach", "proposes_change_plan"],
          "weight": 0.25
        },
        {
          "signalId": "ev-human-factors",
          "description": "Considers human factors (training, resistance, user adoption)",
          "levels": ["mentions_people", "analyses_barriers", "proposes_adoption_strategy"],
          "weight": 0.25
        },
        {
          "signalId": "ev-loyalty-programme-logic",
          "description": "Explains loyalty programme mechanics (points, tiers, rewards, data capture)",
          "levels": ["describes_basics", "explains_logic", "proposes_optimisation"],
          "weight": 0.25
        },
        {
          "signalId": "ev-integration-thinking",
          "description": "Connects loyalty programme to broader digital strategy",
          "levels": ["mentions_connection", "explains_integration", "proposes_synergies"],
          "weight": 0.25
        }
      ],
      "maxFollowUps": 3,
      "timeBudgetSeconds": 300,
      "followUpBank": [
        {
          "followUpId": "s4-fu1",
          "type": "nudge",
          "prompt": "Good thinking. How would you handle staff who are resistant to the new system?",
          "triggerCondition": "mentions_change BUT no resistance_handling"
        },
        {
          "followUpId": "s4-fu2",
          "type": "probe",
          "prompt": "Walk me through how a guest would earn and redeem points in this programme.",
          "triggerCondition": "describes_basics BUT no detailed_logic"
        },
        {
          "followUpId": "s4-fu3",
          "type": "challenge",
          "prompt": "How does the loyalty programme data feed back into the BI dashboard we discussed earlier?",
          "triggerCondition": "missing_integration_with_segment_2"
        }
      ]
    },
    {
      "nodeId": "closing",
      "type": "closing",
      "persona": "You are the General Manager wrapping up the meeting.",
      "conversationPrompt": "That's great, thank you for your time today. You've given me a lot to think about. We'll be in touch with feedback soon.",
      "timeBudgetSeconds": 30,
      "transitions": [{ "target": "end", "condition": "always" }]
    },
    {
      "nodeId": "end",
      "type": "end"
    }
  ],
  "transitions": [
    { "from": "scaffolding", "to": "segment_1_digital_foundations", "condition": "always" },
    { "from": "segment_1_digital_foundations", "to": "segment_2_is_roles_bi", "condition": "node_complete" },
    { "from": "segment_2_is_roles_bi", "to": "segment_3_data_governance", "condition": "node_complete" },
    { "from": "segment_3_data_governance", "to": "segment_4_change_loyalty", "condition": "node_complete" },
    { "from": "segment_4_change_loyalty", "to": "closing", "condition": "node_complete" },
    { "from": "closing", "to": "end", "condition": "always" }
  ]
}
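In each of the four segments above, the evidence signal weights sum to 1.0. Whether the specification requires this is not stated here, so the check below treats it as an assumed authoring convention. `check_signal_weights` is a hypothetical helper that flags nodes whose weights do not sum to 1.0.

```python
import math


def check_signal_weights(nodes, tolerance=1e-9):
    """Return {nodeId: weight_sum} for nodes whose signal weights do not sum to 1.0."""
    problems = {}
    for node in nodes:
        signals = node.get("evidenceSignals")
        if not signals:
            continue  # closing and end nodes carry no evidence signals
        total = sum(signal["weight"] for signal in signals)
        # isclose absorbs floating-point error from sums such as 0.3 + 0.35 + 0.35.
        if not math.isclose(total, 1.0, abs_tol=tolerance):
            problems[node["nodeId"]] = total
    return problems


nodes = [
    {"nodeId": "segment_2_is_roles_bi",
     "evidenceSignals": [{"weight": 0.3}, {"weight": 0.35}, {"weight": 0.35}]},
    {"nodeId": "closing"},
    {"nodeId": "badly_weighted_example",
     "evidenceSignals": [{"weight": 0.25}, {"weight": 0.25}]},
]
problems = check_signal_weights(nodes)
```

The third node is deliberately wrong to show what a failure looks like; it is not part of the INFOSYS110 specification.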

10.11.4 Compiled Pipecat NodeConfig (Segment 1)

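# Compiled from specification node: segment_1_digital_foundations
# report_observation, emit_node_entered and finalize_node are assumed to be defined elsewhere.
from pipecat_flows import ContextStrategy, ContextStrategyConfig
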
segment_1_config = {
    "name": "segment_1_digital_foundations",

    "role_message": (
        "You are the General Manager of a mid-range hotel chain in New Zealand. "
        "You are meeting with a junior team member to discuss the hotel's digital "
        "transformation strategy. Be professional but approachable. Ask follow-up "
        "questions naturally, as a manager would in a real meeting."
    ),

    "task_messages": [
        {
            "role": "developer",
            "content": (
                "SCENARIO: The hotel chain is considering upgrading its booking system, "
                "implementing a mobile check-in app, and introducing IoT sensors for room "
                "management. You want to understand the candidate's grasp of digital "
                "foundations and operational trade-offs.\n\n"
                "OPENING: Thanks for coming in. We're looking at some big technology "
                "changes for the chain. Can you walk me through what you think are the "
                "key digital foundations we need to get right before we invest in new systems?\n\n"
                "EVIDENCE TO LISTEN FOR:\n"
                "- digital_transformation_understanding: Demonstrates understanding of what "
                "digital transformation means in a business context (levels: basic_awareness, "
                "applied_understanding, strategic_insight)\n"
                "- operational_trade_offs: Identifies trade-offs in technology adoption "
                "(levels: lists_factors, analyses_trade_offs, evaluates_with_evidence)\n"
                "- infrastructure_awareness: Recognises infrastructure prerequisites "
                "(levels: mentions_awareness, explains_dependencies, proposes_implementation_plan)\n"
                "- customer_impact: Considers how technology changes affect the customer "
                "experience (levels: acknowledges_impact, analyses_customer_journey, "
                "proposes_customer_centric_approach)\n\n"
                "TRANSVERSAL SKILLS TO OBSERVE:\n"
                "- critical_thinking: Does the candidate analyse information and form reasoned judgements?\n"
                "- professional_communication: Does the candidate communicate clearly in a professional context?\n"
                "- problem_solving: Does the candidate identify problems and propose practical solutions?\n\n"
                "CONSTRAINTS:\n"
                "- Maximum 3 follow-up questions\n"
                "- Time budget: 300 seconds\n"
                "- NEVER: reveal rubric, reveal score, suggest answer, mention other segments\n"
                "- If the candidate demonstrates basic_awareness, nudge toward applied_understanding\n\n"
                "After every candidate response, call report_observation with your "
                "assessment of the response, any evidence signals you detected, "
                "and what you want to say next."
            )
        }
    ],

    "functions": [report_observation],  # The ONE function

    "pre_actions": [
        {"type": "function", "handler": emit_node_entered},
        {"type": "tts_say", "text": None}  # Opening is in task_messages
    ],

    "post_actions": [
        {"type": "function", "handler": finalize_node}
    ],

    "context_strategy": ContextStrategyConfig(strategy=ContextStrategy.RESET),
}
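The "EVIDENCE TO LISTEN FOR" portion of the task message follows closely from the `evidenceSignals` array in the specification node. The sketch below shows how a compiler might render it. The helper name and the naming rule (drop the `ev-` prefix, replace hyphens with underscores) are assumptions inferred from the listing, and the real compiled text abbreviates some descriptions, so the output is an approximation.

```python
def render_evidence_section(evidence_signals):
    """Render evidence signals as the prompt section used in task_messages."""
    lines = ["EVIDENCE TO LISTEN FOR:"]
    for signal in evidence_signals:
        name = signal["signalId"]
        # Assumed naming rule: "ev-customer-impact" becomes "customer_impact".
        if name.startswith("ev-"):
            name = name[len("ev-"):]
        name = name.replace("-", "_")
        levels = ", ".join(signal["levels"])
        lines.append(f"- {name}: {signal['description']} (levels: {levels})")
    return "\n".join(lines)


section = render_evidence_section([
    {
        "signalId": "ev-customer-impact",
        "description": "Considers how technology changes affect the customer experience",
        "levels": [
            "acknowledges_impact",
            "analyses_customer_journey",
            "proposes_customer_centric_approach",
        ],
    }
])
```

Generating this section from the specification, rather than writing it by hand, keeps the prompt and the rubric levels from drifting apart.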

10.11.5 Runtime Execution Trace (Segment 1, excerpt)

T+00.000s  exam_state → in_progress
T+00.100s  node_entered: segment_1_digital_foundations
T+00.200s  LLM generates opening via report_observation(spokenText="Thanks for coming in...")
T+00.300s  TTS speaks opening
T+15.000s  Candidate responds: "I think the main thing is getting the network infrastructure
           right, and making sure the booking system can talk to the other systems..."
T+15.100s  STT transcript segment → Runtime Controller
T+16.000s  LLM calls report_observation({
             signals: [
               {signalType: "digital_transformation_understanding", rubricLevel: "basic_awareness",
                excerpt: "getting the network infrastructure right", confidence: 0.7},
               {signalType: "infrastructure_awareness", rubricLevel: "mentions_awareness",
                excerpt: "booking system can talk to the other systems", confidence: 0.8}
             ],
             answerQuality: "substantive",
             needsFollowUp: true, followUpType: "nudge",
             evidenceSufficient: false,
             anxietyDetected: false,
             spokenText: "That's a good starting point. Can you tell me more about why
                         those specific foundations matter — what could go wrong if we skip them?"
           })
T+16.100s  Runtime Controller:
           - Validates spokenText → PASS (no rubric leak, in persona, on topic)
           - Writes 2 evidence signals to ledger
           - Checks guardrails: followUpCount=0 < maxFollowUps=3 → ALLOW follow-up
           - followUpCount → 1
           - Returns {status: "followup_injected"}
T+16.200s  TTS speaks follow-up
T+35.000s  Candidate responds: "If we skip the infrastructure, the systems won't integrate
           properly. And from the customer side, if the app is slow or crashes during
           check-in, that's a terrible first impression..."
T+36.000s  LLM calls report_observation({
             signals: [
               {signalType: "operational_trade_offs", rubricLevel: "analyses_trade_offs",
                excerpt: "systems won't integrate properly", confidence: 0.85},
               {signalType: "customer_impact", rubricLevel: "analyses_customer_journey",
                excerpt: "app is slow or crashes during check-in", confidence: 0.9}
             ],
             answerQuality: "substantive",
             needsFollowUp: false,
             evidenceSufficient: true,
             anxietyDetected: false,
             spokenText: "Excellent point about the customer experience. That's exactly
                         the kind of thinking we need. Let's move on to the next topic."
           })
T+36.100s  Runtime Controller:
           - Validates spokenText → PASS
           - Writes 2 more evidence signals (total: 4)
           - Checks: evidenceSufficient=true AND signalTypes cover 4/4 targets → TRANSITION
           - Builds segment_2 config → flow_manager.set_node_from_config(segment_2_config)
T+36.200s  node_exit: segment_1_digital_foundations (reason: evidence_sufficient)
T+36.300s  node_entered: segment_2_is_roles_bi
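The controller's handling of each `report_observation` call in the trace can be sketched as a small decision function. This is a simplified illustration, not the runtime's implementation. It omits `spokenText` validation and the time budget, it does not consult `needsFollowUp`, and `SegmentState`, `handle_report_observation`, and the `followups_exhausted` reason label are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentState:
    """Per-segment state tracked by the runtime controller (illustrative)."""
    node_id: str
    max_follow_ups: int
    expected_signal_types: frozenset
    follow_up_count: int = 0
    ledger: list = field(default_factory=list)


def handle_report_observation(state, observation):
    """Apply one report_observation call and return the controller's decision."""
    # Evidence is written first, whatever happens next.
    state.ledger.extend(observation["signals"])
    seen = {signal["signalType"] for signal in state.ledger}

    # The LLM's claim of sufficiency is only honoured if every target was seen.
    if observation["evidenceSufficient"] and seen >= state.expected_signal_types:
        return {"status": "transition", "reason": "evidence_sufficient"}
    if state.follow_up_count >= state.max_follow_ups:
        return {"status": "transition", "reason": "followups_exhausted"}
    state.follow_up_count += 1
    return {"status": "followup_injected"}


state = SegmentState(
    node_id="segment_1_digital_foundations",
    max_follow_ups=3,
    expected_signal_types=frozenset({
        "digital_transformation_understanding", "infrastructure_awareness",
        "operational_trade_offs", "customer_impact",
    }),
)
first = handle_report_observation(state, {
    "signals": [{"signalType": "digital_transformation_understanding"},
                {"signalType": "infrastructure_awareness"}],
    "evidenceSufficient": False,
})
second = handle_report_observation(state, {
    "signals": [{"signalType": "operational_trade_offs"},
                {"signalType": "customer_impact"}],
    "evidenceSufficient": True,
})
```

Replaying the two observations from the trace yields one injected follow-up and then a transition, with four signals in the ledger.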

10.11.6 Evidence Ledger Output (Segment 1)

{
  "nodeId": "segment_1_digital_foundations",
  "completionStatus": "completed",
  "durationSeconds": 36,
  "followUpsUsed": 1,
  "signals": [
    {
      "signalId": "sig-001",
      "signalType": "digital_transformation_understanding",
      "rubricLevel": "basic_awareness",
      "excerpt": "getting the network infrastructure right",
      "confidence": 0.7,
      "turnId": "turn-002",
      "timestamp": "2026-05-06T02:10:15Z",
      "transversalSkills": ["critical_thinking"]
    },
    {
      "signalId": "sig-002",
      "signalType": "infrastructure_awareness",
      "rubricLevel": "mentions_awareness",
      "excerpt": "booking system can talk to the other systems",
      "confidence": 0.8,
      "turnId": "turn-002",
      "timestamp": "2026-05-06T02:10:15Z",
      "transversalSkills": ["problem_solving"]
    },
    {
      "signalId": "sig-003",
      "signalType": "operational_trade_offs",
      "rubricLevel": "analyses_trade_offs",
      "excerpt": "systems won't integrate properly",
      "confidence": 0.85,
      "turnId": "turn-004",
      "timestamp": "2026-05-06T02:10:35Z",
      "transversalSkills": ["critical_thinking"]
    },
    {
      "signalId": "sig-004",
      "signalType": "customer_impact",
      "rubricLevel": "analyses_customer_journey",
      "excerpt": "app is slow or crashes during check-in",
      "confidence": 0.9,
      "turnId": "turn-004",
      "timestamp": "2026-05-06T02:10:35Z",
      "transversalSkills": ["critical_thinking", "problem_solving"]
    }
  ]
}
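Because each signal carries a `transversalSkills` list, a downstream consumer can tally how often each skill was observed in a segment. A minimal sketch, assuming signals are plain dictionaries as in the ledger above. `tally_transversal_skills` is a hypothetical helper, and a raw count is only one possible aggregation.

```python
from collections import Counter


def tally_transversal_skills(signals):
    """Count how many signals were tagged with each transversal skill."""
    counts = Counter()
    for signal in signals:
        counts.update(signal.get("transversalSkills", []))
    return dict(counts)


# Skill tags copied from sig-001 to sig-004 in the ledger above.
segment_1_signals = [
    {"signalId": "sig-001", "transversalSkills": ["critical_thinking"]},
    {"signalId": "sig-002", "transversalSkills": ["problem_solving"]},
    {"signalId": "sig-003", "transversalSkills": ["critical_thinking"]},
    {"signalId": "sig-004", "transversalSkills": ["critical_thinking", "problem_solving"]},
]
skill_counts = tally_transversal_skills(segment_1_signals)
```

For Segment 1 this gives three observations of `critical_thinking` and two of `problem_solving`. `professional_communication` does not appear because no signal in this segment was tagged with it.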

10.11.7 Key Differences from CS301 Example

Dimension	CS301 (§10.1–10.10)	INFOSYS110 (§10.11)
Format	2 questions with follow-ups	4 scenario segments in one conversation
Persona	Supportive examiner (no role-play)	Hotel General Manager (role-play)
Scenario	No scenario	Full professional scenario with context
Rubric levels	Binary (present/absent)	Multi-level (basic → applied → strategic)
Transversal skills	Not tracked	Tracked across all segments
Scaffolding	Not included	2-minute practice conversation
Follow-up strategy	Probe/redirect	Probe/redirect/scaffold/challenge/nudge
Conversation style	Q&A	Free-flowing professional dialogue
Equity	Not addressed	`communicationStyleIsLearningOutcome: false`
Context strategy	Not specified	RESET between segments
Time budget	Per question	Per segment (5 min each)

10.11.8 Scaffolding Trace

T-120.000s  exam_state → scaffolding
T-120.100s  node_entered: scaffolding
T-120.200s  LLM: "Let's do a quick practice. Imagine you're telling a friend about
            a new app you've been using. What does it do and why do you like it?"
T-100.000s  Candidate: "Well, I use this app called Notion for organising my notes..."
T-080.000s  LLM: "Great, that's perfect. You're speaking clearly and giving good
            detail. In the real assessment, just keep doing what you're doing.
            We'll start now."
T-080.100s  exam_state → in_progress
T-080.200s  node_entered: segment_1_digital_foundations

Note: Scaffolding transcript is NOT included in the MarkingPackage. It exists only for candidate familiarisation and QA purposes.

10.12 Joughin Dimension Coverage Map

Joughin (1998) identifies six dimensions of oral assessment. The following table maps the worked examples against these dimensions to demonstrate that the specification can represent the full oral assessment design space.

Dimension	Range	CS301 (§10.1)	INFOSYS110 (§10.11)	Viva Voce (§10.13)	OSCE (§10.14)	ConVOE (§10.15)
1. Content Type	Knowledge; Applied Problem Solving; Interpersonal; Intrapersonal	Knowledge	Knowledge + Applied	Applied + Knowledge	Interpersonal + Applied	Knowledge
2. Interaction	Presentation ↔ Dialogue	Structured Q&A	Scenario dialogue	Defence dialogue	Station-based dialogue	Presentation (recorded)
3. Authenticity	Contextualised ↔ Decontextualised	Decontextualised	Semi-contextualised	Semi-contextualised	Authentic (clinical)	Decontextualised
4. Structure	Closed ↔ Open	Closed	Moderately closed	Moderately open	Closed (timed stations)	Fully closed
5. Examiners	Self; Peer; Authority	Single authority	Single authority (role-play)	Single authority	Authority + SP	AI examiner (automated)
6. Orality	Purely oral ↔ Secondary	Purely oral	Purely oral	Oral secondary (defends written work)	Oral + physical demo	Purely oral (recorded)

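This coverage map demonstrates that the specification’s node-graph architecture, policy system, and evidence model can represent assessments along all six of Joughin’s dimensions. The worked examples do not exercise every value in each range: none targets intrapersonal content, and none uses self- or peer-examiners.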

10.13 Viva Voce Example — Oral Defence of Written Work

Modality: Oral defence of a prior written submission. Joughin dimension coverage: “Orality as secondary” — the oral component supplements a written artifact. This is the modality Akimov & Malin (2020) implemented: students defended a written bond analysis project orally. Also exercises the “applied problem solving” content type and “moderately open” structure.

10.13.1 Scenario

Dimension	Value
Course	RES501 — Research Methods (Postgraduate)
Duration	20 minutes
Language	English
Assessment type	Viva voce — oral defence of a written research proposal
Prior work	Candidate submits a 3,000-word research proposal 1 week before the exam
Examiner persona	Academic supervisor — supportive but rigorous
Max follow-ups per section	3
Orality role	Secondary — oral component supplements the written proposal

10.13.2 Flow Shape

┌──────────┐   ┌───────────────┐   ┌───────────────┐   ┌───────────────┐   ┌──────────┐   ┌─────┐
│ OPENING  │──▶│ METHODOLOGY   │──▶│ LIT REVIEW    │──▶│ FEASIBILITY   │──▶│ CLOSING  │──▶│ END │
│          │   │ DEFENCE       │   │ DEFENCE       │   │ & ETHICS      │   │          │   │     │
└──────────┘   └───────────────┘   └───────────────┘   └───────────────┘   └──────────┘   └─────┘
                 │ ≤3 FU              │ ≤3 FU              │ ≤3 FU
                 │ references         │ references         │ references
                 │ prior_work         │ prior_work         │ prior_work

10.13.3 Key Specification Features

{
  "irVersion": "exam-runtime-ir/0.1",
  "examId": "res501-viva-2026s1-001",
  "metadata": {
    "courseCode": "RES501",
    "assessmentType": "viva_voce",
    "durationMinutes": 20,
    "oralityRole": "secondary",
    "priorWorkRequired": true,
    "assessmentProfile": {
      "interactionMode": "structured_dialogue",
      "contentTypes": ["applied_problem_solving", "knowledge_understanding"],
      "structureLevel": "semi-structured",
      "authenticityLevel": "simulated",
      "assessmentPurpose": "summative"
    }
  },
  "priorWork": {
    "artifactId": "research-proposal-2026",
    "type": "written_paper",
    "title": "Research Proposal: Impact of AI on Assessment Design",
    "submissionDeadline": "2026-05-20T23:59:00Z",
    "maxWordCount": 3000,
    "availableToExaminer": true
  },
  "nodes": [
    {
      "nodeId": "methodology_defence",
      "type": "question",
      "questionStem": "Your proposal uses a mixed-methods design. Can you walk me through why you chose this approach over a purely quantitative or purely qualitative study?",
      "maxFollowUps": 3,
      "timeBudgetSeconds": 420,
      "evidenceTargets": [
        {
          "id": "ev-methodology-rationale",
          "description": "Articulates a clear rationale for mixed-methods design",
          "rubric": "References research questions, discusses complementarity of methods",
          "level": "required"
        },
        {
          "id": "ev-methodology-alternatives",
          "description": "Demonstrates awareness of alternative methodological approaches",
          "rubric": "Names at least one alternative and explains why it was not chosen",
          "level": "expected"
        },
        {
          "id": "ev-methodology-limitations",
          "description": "Acknowledges limitations of chosen approach",
          "rubric": "Identifies at least one limitation and discusses mitigation",
          "level": "expected"
        }
      ],
      "followUpBank": [
        {
          "followUpId": "meth-fu1",
          "type": "probe",
          "prompt": "In your proposal, you mention using thematic analysis for the qualitative data. Can you explain why thematic analysis rather than, say, grounded theory?",
          "referencesPriorWork": true
        },
        {
          "followUpId": "meth-fu2",
          "type": "challenge",
          "prompt": "A reviewer might argue that your sample size of 15 interviews is too small for meaningful qualitative analysis. How would you respond?",
          "referencesPriorWork": true
        }
      ],
      "guardrails": {
        "forbidden": ["reveal_rubric", "suggest_answer"],
        "mustReferencePriorWork": true
      }
    }
    // ... additional nodes for lit review defence, feasibility & ethics ...
  ]
}
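The guardrail above can be checked mechanically before an exam is published. The following is a minimal sketch, assuming that `mustReferencePriorWork` means every entry in `followUpBank` should carry `referencesPriorWork: true`; the specification excerpt does not state this rule explicitly. The function name and the third follow-up (`meth-fu3`) are hypothetical and exist only for illustration.

```python
from typing import Any


def follow_ups_missing_prior_work(node: dict[str, Any]) -> list[str]:
    """Return followUpIds that break the node's mustReferencePriorWork guardrail.

    Assumption: the guardrail applies to every follow-up in the bank.
    """
    guardrails = node.get("guardrails", {})
    if not guardrails.get("mustReferencePriorWork", False):
        return []  # guardrail not set, nothing to enforce
    return [
        fu["followUpId"]
        for fu in node.get("followUpBank", [])
        if not fu.get("referencesPriorWork", False)
    ]


# Trimmed copy of the methodology_defence node, plus one hypothetical
# follow-up (meth-fu3) that does not cite the written proposal.
node = {
    "nodeId": "methodology_defence",
    "followUpBank": [
        {"followUpId": "meth-fu1", "referencesPriorWork": True},
        {"followUpId": "meth-fu2", "referencesPriorWork": True},
        {"followUpId": "meth-fu3", "referencesPriorWork": False},
    ],
    "guardrails": {"mustReferencePriorWork": True},
}

print(follow_ups_missing_prior_work(node))  # ['meth-fu3']
```

A check of this kind would most naturally run at authoring time, so that a lecturer sees the offending follow-up before the exam is compiled.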

10.13.4 Key Differences from CS301/INFOSYS110

Feature	CS301 / INFOSYS110	Viva Voce
Prior work	None	Written proposal submitted before exam
Orality role	Purely oral	Oral secondary (defends written work)
Follow-up references	Based on candidate’s spoken answer	References specific sections of the written proposal
Evidence targets	Assessed from speech alone	Assessed from speech + written work alignment
Structure	Closed / moderately closed	Moderately open (examiner probes reasoning behind choices)
Content type	Knowledge + applied	Applied problem solving (research design justification)

10.13.5 Evidence Ledger Differences

The evidence ledger for a viva voce includes an additional priorWorkReference field linking evidence signals to specific sections of the written submission:

{
  "evidenceTargetId": "ev-methodology-rationale",
  "nodeId": "methodology_defence",
  "signal": "covered",
  "confidence": 0.88,
  "transcriptSpanIds": ["sp-015"],
  "priorWorkReference": {
    "section": "3.2 Research Design",
    "excerpt": "A mixed-methods approach is adopted to triangulate findings...",
    "alignment": "candidate's oral explanation consistent with written rationale"
  },
  "rationale": "Candidate articulated rationale that aligns with §3.2 of their proposal."
}
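A consumer of the ledger may want to confirm that each viva entry actually carries a usable reference. The sketch below assumes that all three sub-fields shown above (`section`, `excerpt`, `alignment`) are required; the specification excerpt does not say which are mandatory, so treat that as an illustrative choice. The function name is hypothetical.

```python
from typing import Any

# Assumption: all three sub-fields are mandatory for viva voce entries.
REQUIRED_REFERENCE_KEYS = ("section", "excerpt", "alignment")


def prior_work_reference_problems(entry: dict[str, Any]) -> list[str]:
    """List what is missing from an entry's priorWorkReference, if anything."""
    ref = entry.get("priorWorkReference")
    if ref is None:
        return ["priorWorkReference is missing"]
    return [
        f"priorWorkReference.{key} is missing or empty"
        for key in REQUIRED_REFERENCE_KEYS
        if not ref.get(key)
    ]


# The ledger entry from the example above, trimmed to the relevant fields.
entry = {
    "evidenceTargetId": "ev-methodology-rationale",
    "nodeId": "methodology_defence",
    "priorWorkReference": {
        "section": "3.2 Research Design",
        "excerpt": "A mixed-methods approach is adopted to triangulate findings...",
        "alignment": "candidate's oral explanation consistent with written rationale",
    },
}

print(prior_work_reference_problems(entry))  # []
```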

10.14 OSCE Station Example — Clinical Assessment

Modality: Objective Structured Clinical Examination station. Joughin dimension coverage: “Authenticity” at the contextualised (authentic) pole, “interpersonal competence” content type, “orality as secondary” (oral + physical demonstration). The versioning document (§09) references OSCE packages; this example demonstrates the full worked instantiation.

10.14.1 Scenario

Dimension	Value
Course	MED302 — Clinical Skills (Year 3 Medicine)
Duration	8 minutes per station
Language	English
Assessment type	OSCE station — patient history-taking + clinical reasoning
Examiner persona	Standardised Patient (SP) playing a 45-year-old with chest pain
Max follow-ups	2 (time-constrained station)
Orality role	Secondary — oral interaction + physical examination demonstration
Professional body	Mapped to AMC (Australian Medical Council) clinical competencies

10.14.2 Flow Shape

┌───────────┐   ┌───────────────────┐   ┌───────────────────┐   ┌──────────┐   ┌─────┐
│ STATION   │──▶│ HISTORY-TAKING    │──▶│ CLINICAL          │──▶│ CLOSING  │──▶│ END │
│ BRIEFING  │   │ (5 min)           │   │ REASONING (3 min) │   │          │   │     │
└───────────┘   └───────────────────┘   └───────────────────┘   └──────────┘   └─────┘
                   │ ≤2 FU                │ ≤2 FU

10.14.3 Key Specification Features

{
  "irVersion": "exam-runtime-ir/0.1",
  "examId": "med302-osce-station2-2026s1-001",
  "metadata": {
    "courseCode": "MED302",
    "assessmentType": "osce_station",
    "durationMinutes": 8,
    "stationNumber": 2,
    "clinicalDomain": "cardiology",
    "accreditationMapping": ["AMC-12.1", "AMC-12.3", "AMC-14.2"],
    "assessmentProfile": {
      "interactionMode": "structured_dialogue",
      "contentTypes": ["interpersonal_competence", "applied_problem_solving"],
      "structureLevel": "closed",
      "authenticityLevel": "authentic",
      "assessmentPurpose": "summative"
    }
  },
  "standardisedPatient": {
    "persona": "You are a 45-year-old office worker presenting to the emergency department with chest pain that started 2 hours ago. Describe your pain as dull, central, radiating to your left arm. You are anxious but cooperative. Answer questions accurately but do not volunteer information unless asked.",
    "trainingLevel": "certified_SP",
    "consistencyScript": true
  },
  "nodes": [
    {
      "nodeId": "history_taking",
      "type": "scenario_segment",
      "persona": "[Standardised Patient persona from above]",
      "conversationPrompt": "Good morning. Can you tell me what's brought you in today?",
      "maxFollowUps": 2,
      "timeBudgetSeconds": 300,
      "evidenceTargets": [
        {
          "id": "ev-history-chief-complaint",
          "description": "Elicits the chief complaint and characterises the pain (OPQRST)",
          "rubric": "Asks about onset, provocation, quality, radiation, severity, timing",
          "level": "required",
          "competencyMapping": "AMC-12.1"
        },
        {
          "id": "ev-history-risk-factors",
          "description": "Assesses cardiovascular risk factors",
          "rubric": "Asks about smoking, family history, hypertension, diabetes, cholesterol",
          "level": "required",
          "competencyMapping": "AMC-12.1"
        },
        {
          "id": "ev-history-differential",
          "description": "Considers differential diagnoses during history-taking",
          "rubric": "Asks questions that help distinguish cardiac from non-cardiac causes",
          "level": "expected",
          "competencyMapping": "AMC-12.3"
        },
        {
          "id": "ev-communication-compassion",
          "description": "Demonstrates compassionate, patient-centred communication",
          "rubric": "Uses open questions, active listening, acknowledges patient concerns, explains next steps",
          "level": "required",
          "competencyMapping": "AMC-14.2",
          "evidenceDimension": "interpersonal_competence"
        }
      ],
      "followUpBank": [
        {
          "followUpId": "hist-fu1",
          "type": "probe",
          "prompt": "You haven't asked about my family history yet. Is there anything else you'd like to know?",
          "triggerCondition": "missing_risk_factors"
        }
      ]
    },
    {
      "nodeId": "clinical_reasoning",
      "type": "question",
      "questionStem": "Based on the history you've taken, what are your top three differential diagnoses and how would you prioritise your investigations?",
      "maxFollowUps": 2,
      "timeBudgetSeconds": 180,
      "evidenceTargets": [
        {
          "id": "ev-differential-diagnosis",
          "description": "Generates appropriate differential diagnoses",
          "rubric": "Includes acute coronary syndrome, considers PE, aortic dissection, musculoskeletal",
          "level": "required",
          "competencyMapping": "AMC-12.3"
        },
        {
          "id": "ev-investigation-plan",
          "description": "Proposes a rational investigation plan",
          "rubric": "ECG, troponin, CXR as first-line; considers risk stratification",
          "level": "required",
          "competencyMapping": "AMC-12.3"
        }
      ]
    }
  ]
}
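Two properties of the station definition above can be verified by simple arithmetic and set membership: the node time budgets (300 s + 180 s) fill the 8-minute station exactly, and every `competencyMapping` code appears in `metadata.accreditationMapping`. The sketch below checks both. It assumes these are intended invariants, which the specification excerpt implies but does not state; the function name is hypothetical.

```python
from typing import Any


def station_problems(ir: dict[str, Any]) -> list[str]:
    """Collect timing and competency-mapping inconsistencies in one station IR."""
    problems: list[str] = []

    # Timing: node budgets should fit inside the station's duration.
    limit = ir["metadata"]["durationMinutes"] * 60
    budget = sum(node.get("timeBudgetSeconds", 0) for node in ir["nodes"])
    if budget > limit:
        problems.append(f"node budgets total {budget}s, station allows {limit}s")

    # Mapping: each target's competency code should be declared in metadata.
    declared = set(ir["metadata"].get("accreditationMapping", []))
    for node in ir["nodes"]:
        for target in node.get("evidenceTargets", []):
            code = target.get("competencyMapping")
            if code is not None and code not in declared:
                problems.append(f"{target['id']} maps to undeclared {code}")
    return problems


# Trimmed copy of the MED302 station: only the fields the checks read.
station = {
    "metadata": {
        "durationMinutes": 8,
        "accreditationMapping": ["AMC-12.1", "AMC-12.3", "AMC-14.2"],
    },
    "nodes": [
        {
            "nodeId": "history_taking",
            "timeBudgetSeconds": 300,
            "evidenceTargets": [
                {"id": "ev-history-chief-complaint", "competencyMapping": "AMC-12.1"},
                {"id": "ev-communication-compassion", "competencyMapping": "AMC-14.2"},
            ],
        },
        {
            "nodeId": "clinical_reasoning",
            "timeBudgetSeconds": 180,
            "evidenceTargets": [
                {"id": "ev-differential-diagnosis", "competencyMapping": "AMC-12.3"},
            ],
        },
    ],
}

print(station_problems(station))  # [] because 300 + 180 = 480 s = 8 min
```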

10.14.4 Key Differences from Other Examples

Feature	CS301 / INFOSYS110	OSCE Station
Authenticity	Decontextualised / simulated	Authentic (clinical setting)
Content type	Knowledge / applied	Interpersonal competence + applied
Persona	Examiner / manager	Standardised Patient (trained actor)
Time constraint	Flexible (10–20 min)	Strict station time (8 min)
Competency mapping	Learning outcomes	Professional body accreditation standards
Communication as evidence	Optional transversal skill	Required evidence target (AMC-14.2)
Orality	Purely oral	Oral + physical demonstration

10.15 ConVOE Example — Concurrent Video-Based Oral Exam

Modality: Concurrent Video-Based Oral Exam (ConVOE), as described by Bayley et al. (2024). All students simultaneously record video responses to questions via an LMS. This is the “presentation” pole of Joughin’s interaction dimension — one-way delivery with no real-time dialogue.

Key difference: The specification’s real-time dialogue model is adapted to support a “recorded response” mode where the candidate records answers without live examiner interaction.

10.15.1 Scenario

Dimension	Value
Course	BUS201 — Business Analytics (Year 2, 600+ students)
Duration	20 minutes (4 questions × 5 min max each, no backtracking)
Language	English
Assessment type	ConVOE — recorded video responses, no live dialogue
Interaction mode	Presentation (one-way, no follow-ups)
Platform	LMS with video recording integration
Cohort size	620 students, concurrent administration
Grading	Parallel evaluation (all students graded on Q1 before Q2)

10.15.2 Flow Shape

┌──────────┐   ┌──────┐   ┌──────┐   ┌──────┐   ┌──────┐   ┌──────────┐   ┌─────┐
│ BRIEFING │──▶│  Q1  │──▶│  Q2  │──▶│  Q3  │──▶│  Q4  │──▶│ CLOSING  │──▶│ END │
│+ PRACTICE│   │ (5m) │   │ (5m) │   │ (5m) │   │ (5m) │   │          │   │     │
└──────────┘   └──────┘   └──────┘   └──────┘   └──────┘   └──────────┘   └─────┘
                 no FU        no FU       no FU        no FU
                 no backtrack  no backtrack no backtrack  no backtrack

10.15.3 Key Specification Features

{
  "irVersion": "exam-runtime-ir/0.1",
  "examId": "bus201-convoe-2026s1-001",
  "metadata": {
    "courseCode": "BUS201",
    "assessmentType": "convoe",
    "durationMinutes": 20,
    "language": "en-CA",
    "expectedCandidateCount": 620,
    "concurrentAdministration": true,
    "assessmentProfile": {
      "interactionMode": "presentation",
      "contentTypes": ["knowledge_understanding", "applied_problem_solving"],
      "structureLevel": "closed",
      "authenticityLevel": "decontextualised",
      "assessmentPurpose": "summative"
    }
  },
  "administration": {
    "mode": "recorded_response",
    "backtrackingAllowed": false,
    "recordingFormat": "video",
    "maxResponseTimeSec": 300,
    "thinkingTimeSec": 30,
    "practiceQuestionEnabled": true,
    "questionRotation": {
      "enabled": true,
      "poolSizePerSlot": 5,
      "antiCollusionWindow": "same_day"
    }
  },
  "cohort": {
    "cohortId": "bus201-2026s1-cohort",
    "administrationWindow": {
      "startAt": "2026-06-01T09:00:00Z",
      "endAt": "2026-06-01T11:00:00Z"
    },
    "concurrent": true,
    "gradingStrategy": "parallel_evaluation"
  },
  "nodes": [
    {
      "nodeId": "briefing",
      "type": "opening",
      "prompt": "Welcome to your BUS201 oral assessment. You will answer 4 questions. For each question, you have up to 5 minutes to record your video response. You cannot go back to previous questions. A practice question is available before you begin."
    },
    {
      "nodeId": "q1",
      "type": "question",
      "questionStem": "Explain the difference between supervised and unsupervised machine learning. Give a business example of each.",
      "maxFollowUps": 0,
      "timeBudgetSeconds": 300,
      "recordingRequired": true,
      "evidenceTargets": [
        {
          "id": "ev-ml-types",
          "description": "Distinguishes supervised from unsupervised learning",
          "level": "required"
        },
        {
          "id": "ev-ml-examples",
          "description": "Provides valid business examples for both types",
          "level": "required"
        }
      ],
      "questionPool": {
        "poolId": "q1-pool",
        "variants": [
          { "variantId": "q1-v1", "prompt": "Explain the difference between supervised and unsupervised machine learning. Give a business example of each." },
          { "variantId": "q1-v2", "prompt": "Compare classification and clustering algorithms. When would a business use each approach?" },
          { "variantId": "q1-v3", "prompt": "What is the role of labelled data in machine learning? Provide business scenarios where labelled data is available versus unavailable." }
        ],
        "drawCount": 1
      }
    }
    // ... additional question nodes with pools ...
  ]
}
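The `questionPool` block says what can be drawn but not how. One possible approach, sketched below, derives a per-candidate seed from the pool and candidate identifiers so that the same candidate always receives the same variant if the session is reloaded. This seeding scheme is an assumption, not something the specification prescribes, and `draw_variants` and the candidate ID format are hypothetical.

```python
import hashlib
import random
from typing import Any


def draw_variants(pool: dict[str, Any], candidate_id: str) -> list[str]:
    """Pick drawCount variantIds from a pool, repeatably for a given candidate.

    Assumption: a hash of poolId and candidate_id is an acceptable seed.
    """
    seed_material = f"{pool['poolId']}:{candidate_id}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)  # private generator, global state untouched
    variant_ids = [variant["variantId"] for variant in pool["variants"]]
    return rng.sample(variant_ids, pool["drawCount"])


# The q1 pool from the example above, with prompts omitted for brevity.
pool = {
    "poolId": "q1-pool",
    "variants": [
        {"variantId": "q1-v1"},
        {"variantId": "q1-v2"},
        {"variantId": "q1-v3"},
    ],
    "drawCount": 1,
}

first = draw_variants(pool, "cand-0001")
again = draw_variants(pool, "cand-0001")
print(first == again)  # True: the draw is repeatable for one candidate
```

Seeding from identifiers avoids storing the draw separately, at the cost of making the assignment predictable to anyone who knows the scheme. A deployment concerned about that could mix in a secret per-exam salt.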

10.15.4 Key Differences from Dialogue-Based Examples

Feature	CS301 / INFOSYS110 (Dialogue)	ConVOE (Presentation)
Interaction mode	Dialogue (multi-turn)	Presentation (one-way recording)
Follow-ups	Allowed (1–3 per node)	None (maxFollowUps: 0)
Candidate commands	repeat, clarification, raise_hand	None (no live examiner)
Scalability	1 candidate per session	620 candidates concurrent
Question pools	Fixed questions	Randomised from pool (anti-collusion)
Grading	Sequential per session	Parallel evaluation (by question)
Backtracking	Not applicable	Explicitly forbidden
Practice session	Optional scaffolding	Built-in practice question
Reliability concern	Inter-case (follow-up variance)	Inter-case (question difficulty equivalence)
Academic integrity	Conversation fingerprint	Question rotation + time limit + video recording
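
Several rows of this table translate into constraints that a validator could enforce on a recorded-response exam. The sketch below checks two of them: no backtracking and no follow-ups on question nodes. It assumes these constraints follow from `administration.mode` being `"recorded_response"`, which is an interpretation of the table rather than a stated rule. The function name is hypothetical.

```python
from typing import Any


def recorded_response_problems(ir: dict[str, Any]) -> list[str]:
    """Flag settings that contradict a one-way, recorded-response exam."""
    admin = ir.get("administration", {})
    if admin.get("mode") != "recorded_response":
        return []  # dialogue-based exams are out of scope for this check

    problems: list[str] = []
    if admin.get("backtrackingAllowed", False):
        problems.append("backtrackingAllowed must be false")
    for node in ir.get("nodes", []):
        # With no live examiner, question nodes cannot have follow-ups.
        if node.get("type") == "question" and node.get("maxFollowUps", 0) != 0:
            problems.append(f"{node['nodeId']}: maxFollowUps must be 0")
    return problems


# Trimmed copy of the BUS201 ConVOE definition.
convoe = {
    "administration": {"mode": "recorded_response", "backtrackingAllowed": False},
    "nodes": [
        {"nodeId": "briefing", "type": "opening"},
        {"nodeId": "q1", "type": "question", "maxFollowUps": 0},
    ],
}

print(recorded_response_problems(convoe))  # []
```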

10.15.5 Scalability Considerations

The ConVOE format exercises the specification’s scalability features:

Question pools (questionPool): Each question slot draws from a pool of equivalent variants, mitigating question-sharing (Bayley et al., 2024, p. 165: “students posted ConVOE questions to an online group chat”).
Cohort management: The cohort entity groups 620 concurrent sessions and enables batch grading.
Parallel evaluation: Setting gradingStrategy to "parallel_evaluation" has graders assess all candidates on Q1 before moving to Q2, which keeps marking consistent within each question (Bayley et al., 2024, p. 163).
No follow-ups: The presentation interaction mode eliminates inter-case reliability concerns from dialogue variance.
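
The difference between parallel evaluation and per-session grading comes down to the order in which (question, candidate) pairs reach a grader. The sketch below makes that order explicit. Both function names and the candidate identifiers are hypothetical, and the specification does not define how a grading queue is built.

```python
def parallel_evaluation_order(
    question_ids: list[str], candidate_ids: list[str]
) -> list[tuple[str, str]]:
    """Question-major order: every candidate's Q1, then every candidate's Q2."""
    return [(q, c) for q in question_ids for c in candidate_ids]


def sequential_order(
    question_ids: list[str], candidate_ids: list[str]
) -> list[tuple[str, str]]:
    """Candidate-major order: one candidate's whole session at a time."""
    return [(q, c) for c in candidate_ids for q in question_ids]


questions = ["q1", "q2"]
candidates = ["cand-001", "cand-002", "cand-003"]

for task in parallel_evaluation_order(questions, candidates):
    print(task)
# ('q1', 'cand-001')
# ('q1', 'cand-002')
# ('q1', 'cand-003')
# ('q2', 'cand-001')
# ('q2', 'cand-002')
# ('q2', 'cand-003')
```

With question rotation enabled, a real queue would probably also need to group by drawn variant within each question slot, which this sketch leaves out.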

Revision History

Version	Date	Changes
v0.2.0	2026-06-30	Updated examples to reflect IOA-ORM terminology and new schema fields.
v0.1.0	2026-05-06	Initial release.