Migration Plan
Status
Draft · v0.2.0 · 2026-06-30
This chapter defines an incremental migration from the current
flowJson-based architecture to the full IOA-ORM: five engineering phases
(Phases 1–5) plus a parallel institutional readiness workstream (Phase 0).
Each phase is independently shippable, backward-compatible, and testable.
No phase requires a full rewrite; each builds on the previous.
Guiding principles:
- Incremental over revolutionary. Each phase delivers user-visible or engineering-visible value on its own.
- Backward-compatible by default. Existing published packages MUST continue to work. New features are additive.
- Feature-flagged. Each phase uses server-side feature flags so it can be enabled per-exam or per-tenant.
- Reversible. Any phase can be rolled back without data loss.
- Pedagogy-driven. Each phase is justified by assessment quality goals, not just engineering goals. The migration improves validity, reliability, and fairness — not just technical capability.
Phase 0 — Institutional Readiness (Parallel Workstream)
Goal: Prepare the human and institutional infrastructure for AI-powered oral assessment. This phase runs in parallel with Phases 1–3 and addresses the adoption barriers that Akimov & Malin (2020) and Fenton (2025) identify as critical to successful oral exam implementation.
Theoretical grounding: Akimov & Malin (2020) report that “administering oral examinations for larger classes would present several additional challenges because it would either require multiple examiners, which in turn could raise inter-rater consistency issues” (p. 1205). Fenton (2025) explicitly recommends “a training or shadowing program with experienced instructors leading novices” and “all examiners should receive training in oral assessment procedures.” Bayley et al. (2024) found that approximately 75% of students engaged with a practice ConVOE before the real assessment, highlighting the importance of student preparation.
0.1 Examiner Training Programme
| Deliverable | Description | Owner |
|---|---|---|
| Training materials | How to author evidence targets, design follow-up banks, set time budgets, and calibrate assessment standards | Assessment design team |
| Calibration exercises | Pre-scored sample exams where new examiners practice generating evidence signals and compare against ground truth | Assessment design team |
| Shadowing protocol | New examiners observe experienced examiners conducting IOAs before designing their own | Academic development |
| Bias awareness module | Training on examiner bias (Akimov & Malin, 2020: “a large number of examiners acknowledged the presence of various biases”), language sensitivity, and cultural communication styles | DEI / accessibility team |
0.2 Student Preparation
| Deliverable | Description | Owner |
|---|---|---|
| Student-facing documentation | What to expect, how to prepare, what commands are available, time structure | Assessment design team |
| Practice exam | A low-stakes familiarisation exam using the specification’s scaffolding feature (see INFOSYS110 example, §10.11.8) | Assessment design team |
| Accessibility assessment | Identify candidates who may need accommodations (extra time, alternative formats, language support) | Accessibility team |
| Anxiety reduction resources | Information about the exam format, sample questions, and tips for managing exam anxiety (Fenton, 2025: “as much student-facing support as you can to reduce anxiety”) | Student wellbeing |
0.3 Pilot Planning
| Deliverable | Description | Owner |
|---|---|---|
| Pilot course selection | Select 2–3 courses for initial deployment; prefer low-stakes or formative assessments first | Assessment design team |
| Feedback loops | Establish candidate and examiner feedback mechanisms (surveys, focus groups) | Quality assurance |
| Baseline metrics | Collect pre-migration data on student satisfaction, assessment outcomes, and marking consistency for comparison | Analytics team |
0.4 Scale Readiness (for 100+ candidates)
Theoretical grounding: Bayley et al. (2024) implemented ConVOEs for 600+ students. Their key lesson: concurrent administration requires standardised question formats, parallel grading, and technology failure handling. The migration plan must account for these at the institutional level.
| Deliverable | Description | Owner |
|---|---|---|
| Capacity planning | Estimate concurrent session capacity; load test infrastructure | Platform team |
| Question pool design | For large cohorts, design question pools with equivalent difficulty variants (Bayley et al., 2024) | Assessment design team |
| Moderation workflow | Design human review process for AI-generated evidence signals | Assessment design + QA |
| Technology failure protocol | Establish SLA for bot crashes, STT failures, and disconnections during concurrent sessions | Platform + support teams |
Phase 1 — Event Protocol & Transcript Closure
Goal: Establish a reliable event stream and ensure exam transcripts are complete, structured, and persisted.
1.1 What Changes
| Area | Before | After |
|---|---|---|
| Event emission | Ad-hoc; some events emitted via data channel, some lost | Every key lifecycle moment emits a typed event to the event store |
| `bot_ready` | Not consistently emitted | MUST emit when the bot session is established and ready |
| `node_entered` | Implicit in flowJson node switch | Explicit event with `nodeId`, `nodeType`, `timestamp` |
| `node_exited` | Not emitted | MUST emit before next `node_entered` |
| `transcript_delta` | Partial STT results sent to UI | Formalised event with `nodeId`, `speaker`, `isFinal` flag |
| `transcript_final` | Emitted but not reliably persisted | MUST be persisted server-side with `nodeId`, `speaker`, `spanId` |
| `exam_completed` | Inconsistent; some sessions end without it | MUST emit exactly once, guaranteed by runtime controller |
| UI event consumption | UI parses raw data channel messages | UI consumes typed events from a standardised event contract |
| Server transcript persistence | Transcripts scattered across logs | Unified transcript store, queryable by examId + sessionId |
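As an illustration of what a typed event contract could look like, the sketch below models the events from the table as a TypeScript discriminated union. Only the event names and the fields listed in the table come from this chapter; the `EventBase` envelope, the `text` and `reason` fields, and the helpers `isExamEvent` and `describeEvent` are assumptions made for the example. The normative schema belongs in §05-event-protocol.md.

```typescript
// Hypothetical envelope shared by all events (not defined in this chapter).
interface EventBase {
  examId: string;
  sessionId: string;
  timestamp: number; // epoch milliseconds (assumed unit)
}

type ExamEvent =
  | (EventBase & { type: "bot_ready" })
  | (EventBase & { type: "node_entered"; nodeId: string; nodeType: string })
  | (EventBase & { type: "node_exited"; nodeId: string })
  | (EventBase & { type: "transcript_delta"; nodeId: string; speaker: string; isFinal: boolean; text: string })
  | (EventBase & { type: "transcript_final"; nodeId: string; speaker: string; spanId: string; text: string })
  | (EventBase & { type: "exam_completed"; reason: string });

const EVENT_TYPES: ReadonlySet<string> = new Set([
  "bot_ready", "node_entered", "node_exited",
  "transcript_delta", "transcript_final", "exam_completed",
]);

// Minimal envelope check for untyped data-channel payloads.
// Per-event fields would be validated by the JSON Schema in a real system.
function isExamEvent(value: unknown): value is ExamEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const type = v["type"];
  return (
    typeof type === "string" &&
    EVENT_TYPES.has(type) &&
    typeof v["examId"] === "string" &&
    typeof v["sessionId"] === "string" &&
    typeof v["timestamp"] === "number"
  );
}

// Exhaustive dispatch: adding an event type without a case fails to compile.
function describeEvent(event: ExamEvent): string {
  switch (event.type) {
    case "bot_ready": return "bot ready";
    case "node_entered": return `entered ${event.nodeId} (${event.nodeType})`;
    case "node_exited": return `exited ${event.nodeId}`;
    case "transcript_delta": return `partial: ${event.text}`;
    case "transcript_final": return `final [${event.spanId}]: ${event.text}`;
    case "exam_completed": return `completed (${event.reason})`;
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

A UI event dispatcher could follow the same `switch` shape, mapping each event type to an update function instead of a string.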
1.2 Why It Matters
- Without reliable events, observability is impossible: debugging failed exams requires guessing.
- Without transcript closure, the marking pipeline receives incomplete or unstructured data, producing unreliable marks.
- `exam_completed` consistency is a prerequisite for triggering post-exam workflows (marking, analytics, candidate notification).
1.3 Engineering Tasks
- Define event schema (`EventProtocol` types in §05-event-protocol.md). Implement as a TypeScript interface + JSON Schema for validation.
- Instrument the Pipecat bot lifecycle:
  - Emit `bot_ready` at session start.
  - Emit `node_entered` / `node_exited` around each FlowManager node transition.
  - Emit `transcript_delta` / `transcript_final` from STT pipeline.
- Implement event store persistence layer:
  - New `exam_events` table (or equivalent) with `examId`, `sessionId`, `eventType`, `payload`, `timestamp`.
  - Index on `(examId, sessionId, timestamp)`.
- Guarantee `exam_completed` emission:
  - Add a `finally`-style hook in the runtime controller that fires `exam_completed` regardless of how the session ends (normal, timeout, error, disconnection).
- Update the frontend exam room to consume typed events instead of raw data channel messages. Create an event dispatcher that maps event types to UI update functions.
- Add transcript aggregation service:
  - Collects `transcript_final` events per session.
  - Produces an ordered, deduplicated transcript with span IDs.
  - Persists to `exam_transcripts` table.
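The `finally`-style hook for `exam_completed` can be sketched as a small controller with a once-only guard. This is a simplified, synchronous illustration: `SessionController`, `CompletionHooks`, and `flushTranscripts` are hypothetical names, and a production version would most likely be asynchronous and await the flush.

```typescript
type EndReason = "normal" | "timeout" | "error" | "disconnection";

// Hypothetical hooks; neither name is defined by the specification.
interface CompletionHooks {
  flushTranscripts: () => void; // flush pending transcripts before completion
  emit: (event: { type: "exam_completed"; reason: EndReason }) => void;
}

class SessionController {
  private completed = false;
  private readonly hooks: CompletionHooks;

  constructor(hooks: CompletionHooks) {
    this.hooks = hooks;
  }

  // Safe to call from any termination path; only the first call emits.
  complete(reason: EndReason): void {
    if (this.completed) return;
    this.completed = true;
    try {
      this.hooks.flushTranscripts();
    } finally {
      // Emit even if the flush itself fails.
      this.hooks.emit({ type: "exam_completed", reason });
    }
  }

  // Runs the session body and guarantees completion however it ends.
  run(body: () => void): void {
    let reason: EndReason = "normal";
    try {
      body();
    } catch (err) {
      reason = "error";
      throw err;
    } finally {
      this.complete(reason);
    }
  }
}
```

Timeout and disconnection handlers would call `complete("timeout")` or `complete("disconnection")` directly; the guard makes any later call from `run` a no-op, which is what keeps the event at exactly one emission per session.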
1.4 Risks
| Risk | Mitigation |
|---|---|
| STT produces duplicate or overlapping final transcripts | Deduplicate by span ID; use monotonic timestamps |
| `exam_completed` fires before all transcripts are persisted | Add a flush-and-wait step before emitting `exam_completed` |
| Increased event volume impacts bot latency | Event emission MUST be async (fire-and-forget to a queue); MUST NOT block the dialogue loop |
| Existing bots emit different event formats | Adapter normalises legacy events during migration window |
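The span-ID deduplication mitigation could look roughly like the following. The `TranscriptFinal` shape extends the fields named in this chapter with an assumed `text` field, and keeping the first event seen for each span ID is a policy chosen for the example, not something the specification prescribes.

```typescript
interface TranscriptFinal {
  spanId: string;
  nodeId: string;
  speaker: string;
  text: string;      // assumed field
  timestamp: number; // assumed to be monotonic within a session
}

// Drops repeated span IDs (first occurrence wins), then orders by timestamp.
function aggregateTranscript(events: readonly TranscriptFinal[]): TranscriptFinal[] {
  const bySpan = new Map<string, TranscriptFinal>();
  for (const event of events) {
    if (!bySpan.has(event.spanId)) {
      bySpan.set(event.spanId, event);
    }
  }
  return Array.from(bySpan.values()).sort((a, b) => a.timestamp - b.timestamp);
}
```

If the STT provider can revise a span after finalising it, a "latest wins" policy may be more appropriate; that choice would need to be made explicitly.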
1.5 Testing Strategy
- Unit tests: Schema validation for every event type. Malformed payloads MUST be rejected with clear error messages.
- Integration tests: Spin up a test bot session, verify every expected event is emitted in order and persisted.
- Contract tests: Frontend event consumer tests — verify UI renders correctly for each event type.
- Regression: Run existing exam flows; verify no change in candidate-facing behaviour.
- Chaos test: Kill the bot mid-session; verify `exam_completed` still fires (from the guaranteed hook) and transcript is recoverable.
1.6 Effect on Published Packages
None. This phase adds events and persistence. It does not change the flowJson format or the candidate-facing experience. Existing published packages continue to work; they simply don’t emit the new events until re-published.
Phase 2 — Node State, Progress & Candidate Commands
Goal: Introduce runtime-managed node state, question progress tracking, and candidate command consumption.
2.1 What Changes
| Area | Before | After |
|---|---|---|
| Runtime node state | Bot tracks which FlowManager node is active; no richer state | Runtime controller maintains per-node state: followUpCount, timeElapsed, evidenceCovered |
| Question progress | Not tracked; LLM decides when to move on | Runtime emits node_progress events; UI can show “Question 1 of 2 — Follow-up 1/2” |
| Candidate commands | repeat, clarification, raise_hand are UI-only or handled ad-hoc by LLM prompt | Runtime intercepts candidate commands, applies policy, emits candidate_command events |
| Data channel command protocol | No standardised candidate→bot command channel | Formalised command protocol: candidate sends typed command via data channel, runtime validates and routes |
| `node_progress` event | Doesn’t exist | New event emitted on every state change within a node |
2.2 Why It Matters
- Progress visibility: Candidates and proctors need to see where they are in the exam. Without runtime state, the UI is blind.
- Command determinism: Candidate commands currently rely on the LLM correctly interpreting intent from natural language. This is fragile. Runtime interception provides deterministic, auditable command handling.
- Follow-up counting: Without runtime tracking, the LLM can exceed the author’s intended follow-up limit. This is a fairness issue.
2.3 Engineering Tasks
- Implement runtime node state store:
  - In-memory state object per active session.
  - Schema: `{ nodeId, followUpCount, maxFollowUps, timeBudgetSeconds, timeElapsed, evidenceCovered: string[], candidateCommandsUsed: [...] }`.
  - Emit `node_progress` on every state mutation.
- Implement candidate command classifier:
  - Receives STT output for candidate utterances.
  - Classifies intent: `repeat`, `clarification`, `raise_hand`, or `answer`.
  - Uses a lightweight classifier (rule-based or small model), NOT the main LLM, to avoid latency and cost.
- Implement data channel command protocol:
  - Candidate UI sends: `{ type: "candidate_command", command: "repeat" }`.
  - Runtime validates: Is this command allowed at this point? Is it within `maxPerNode`? Does it cost a follow-up?
  - Routes to appropriate handler (re-prompt, clarification, pause timer).
- Wire candidate commands to runtime state:
  - `repeat` → re-emit the current prompt, do NOT increment `followUpCount`.
  - `clarification` → allow LLM to clarify within guardrails, do NOT increment `followUpCount`.
  - `raise_hand` → pause `timeBudgetSeconds` countdown for configured duration.
- Update the frontend to display `node_progress` data and send candidate commands via the data channel protocol.
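The state schema and command wiring above can be sketched as a pure state-update function. The `NodeState` fields follow the schema in this section; `applyCommand`, the `CommandAction` names, and the reading of `maxPerNode` as a per-command-type limit are assumptions for illustration.

```typescript
type CandidateCommand = "repeat" | "clarification" | "raise_hand";
type CommandAction = "re_prompt" | "clarify" | "pause_timer" | "ignored";

interface NodeState {
  nodeId: string;
  followUpCount: number;
  maxFollowUps: number;
  timeBudgetSeconds: number;
  timeElapsed: number;
  evidenceCovered: string[];
  candidateCommandsUsed: CandidateCommand[];
}

// Hypothetical mapping from command to the handler that should run.
const ACTION_FOR: Record<CandidateCommand, CommandAction> = {
  repeat: "re_prompt",
  clarification: "clarify",
  raise_hand: "pause_timer",
};

// Validates a command against the per-node limit and returns the next state.
// None of the three commands costs a follow-up, so followUpCount is untouched.
function applyCommand(
  state: NodeState,
  command: CandidateCommand,
  maxPerNode: number, // assumed to apply per command type
): { accepted: boolean; action: CommandAction; state: NodeState } {
  const used = state.candidateCommandsUsed.filter((c) => c === command).length;
  if (used >= maxPerNode) {
    return { accepted: false, action: "ignored", state };
  }
  const next: NodeState = {
    ...state,
    candidateCommandsUsed: [...state.candidateCommandsUsed, command],
  };
  return { accepted: true, action: ACTION_FOR[command], state: next };
}
```

Returning a new state object rather than mutating in place gives a single point at which a `node_progress` event could be emitted for every accepted change.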
2.4 Risks
| Risk | Mitigation |
|---|---|
| Command classifier misclassifies an answer as a command (or vice versa) | Confidence threshold; fallback to treating ambiguous utterances as answers. Log misclassifications for retraining. |
| Candidate uses command strategically to waste time (e.g., repeated `raise_hand`) | `maxPerNode` limits enforced by runtime. Exceeded commands logged and ignored. |
| Runtime state diverges from FlowManager state | Add reconciliation check: runtime state and FlowManager node MUST agree. Emit state_mismatch alert if they diverge. |
| Latency increase from command classification | Command classifier MUST complete in <200ms. Use a fast local model or rule-based system, not the main LLM. |
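A rule-based classifier with a confidence threshold and an answer fallback might be structured as below. Everything here is illustrative: the phrase patterns, the word-count heuristic, and the 0.9 / 0.4 confidence values are invented for the sketch and would need tuning against logged utterances.

```typescript
type Intent = "repeat" | "clarification" | "raise_hand" | "answer";

interface Classification {
  intent: Intent;
  confidence: number;
  ambiguous: boolean; // true when a command phrase matched but confidence was too low
}

// Illustrative patterns only; not a vetted rule set.
const RULES: ReadonlyArray<{ intent: Exclude<Intent, "answer">; pattern: RegExp }> = [
  { intent: "repeat", pattern: /\b(repeat|say that again|one more time)\b/i },
  { intent: "clarification", pattern: /\b(clarify|what do you mean|rephrase)\b/i },
  { intent: "raise_hand", pattern: /\b(need a moment|give me a minute|hold on)\b/i },
];

function classifyUtterance(text: string, threshold = 0.7): Classification {
  const wordCount = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  for (const rule of RULES) {
    if (!rule.pattern.test(text)) continue;
    // Heuristic: a short utterance containing a command phrase is probably a
    // command; a long one is more likely an answer that happens to contain it.
    const confidence = wordCount <= 8 ? 0.9 : 0.4;
    if (confidence >= threshold) {
      return { intent: rule.intent, confidence, ambiguous: false };
    }
    // Below threshold: fall back to "answer" and flag for later review.
    return { intent: "answer", confidence: 1 - confidence, ambiguous: true };
  }
  return { intent: "answer", confidence: 1, ambiguous: false };
}
```

Because it is a handful of regular-expression tests with no I/O, a classifier of this kind should fit comfortably inside the 200 ms budget; the `ambiguous` flag marks the utterances worth logging for retraining.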
2.5 Testing Strategy
- Unit tests: Command classifier accuracy; test with a corpus of 500+ candidate utterances across all command types and edge cases.
- State machine tests: Verify `followUpCount` increments correctly, pauses work, `maxPerNode` is enforced.
- Integration tests: Full session with candidate commands; verify events are emitted, state is updated, and LLM responds correctly.
- Adversarial tests: Candidate sends 10 `repeat` commands in a row. Verify: first 3 work (within `maxPerNode`), rest are rejected with a polite message. Verify `followUpCount` never increments for `repeat`.
- Regression: Existing exams without commands continue to work normally.
2.6 Effect on Published Packages
Minimal. Published packages that don’t declare candidateCommands continue
to work unchanged. Packages that want to support commands need to be
re-published with the new candidateCommands section in the specification. This is
opt-in.
Phase 3 — Evidence Target & Evidence Ledger
Goal: Attach structured evidence targets to questions, emit
evidence_signal events during the exam, and produce a complete evidence
ledger for the marking pipeline.
3.1 What Changes
| Area | Before | After |
|---|---|---|
| Evidence targets in specification | Not present in flowJson | evidenceTargets array on each question node with id, description, rubric, level |
| Evidence detection | LLM judges evidence ad-hoc in its context; no structured output | LLM emits structured evidence_signal events during the exam; runtime validates and persists |
| Transcript span mapping | No link between evidence and transcript | evidence_signal includes transcriptSpanId linking to the exact transcript excerpt |
| Evidence ledger | Doesn’t exist; marking uses raw transcript | Structured ledger: per-evidence-target, with signal status, confidence, rationale, transcript excerpts |
| markingRuntime input | Raw transcript only | Structured input: evidence ledger + transcript + runtime audit |
3.2 Why It Matters
- Marking quality: Without structured evidence, the marking pipeline must re-analyse the entire transcript. This is expensive, slow, and inconsistent.
- Auditability: Evidence signals with transcript links enable human markers to verify the AI’s assessment quickly.
- Rubric alignment: Evidence targets in the specification create an explicit contract between the author’s intent and the runtime’s execution.
3.3 Engineering Tasks
- Extend the specification schema with `evidenceTargets` on question nodes (see §02-schema.md).
- Implement evidence detection in the LLM pipeline:
  - After each candidate answer, the LLM evaluates which evidence targets have been addressed.
  - Outputs a structured `evidence_signal` (not free text).
  - This can be a separate LLM call (judge model) or a structured output from the main dialogue LLM.
- Implement transcript span mapping:
  - When `evidence_signal` is emitted, link it to the most recent `transcript_final` span(s) that contain the relevant content.
  - Store `transcriptSpanIds` in the signal payload.
- Implement evidence ledger persistence:
  - New `exam_evidence` table keyed by `(examId, sessionId, evidenceTargetId)`.
  - Upsert on each `evidence_signal`; later signals can override earlier ones if confidence increases.
- Build the `markingRuntime` input assembly:
  - New service that, on `exam_completed`, assembles the full marking input: evidence ledger + transcript + runtime audit + specification snapshot.
  - Persist and make available to the marking pipeline.
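The upsert rule for the evidence ledger (a later signal replaces an earlier one only if its confidence is higher) can be illustrated with an in-memory sketch. The `EvidenceSignal` fields beyond `transcriptSpanIds`, the `SignalStatus` values, and the `EvidenceLedger` class are assumptions; the 0.7 threshold is the example value from this chapter's risk table. The sketch covers one session, so it keys on `evidenceTargetId` alone rather than the full `(examId, sessionId, evidenceTargetId)` key.

```typescript
type SignalStatus = "covered" | "not_covered" | "uncertain";

interface EvidenceSignal {
  evidenceTargetId: string;
  status: SignalStatus;
  confidence: number; // 0..1
  rationale: string;
  transcriptSpanIds: string[];
}

// In-memory stand-in for one session's rows in the exam_evidence table.
class EvidenceLedger {
  private readonly entries = new Map<string, EvidenceSignal>();

  // Stores the signal if it is the first for its target or more confident
  // than the stored one. Low-confidence signals are marked "uncertain".
  upsert(signal: EvidenceSignal, threshold = 0.7): EvidenceSignal {
    const normalised: EvidenceSignal =
      signal.confidence < threshold ? { ...signal, status: "uncertain" } : signal;
    const existing = this.entries.get(signal.evidenceTargetId);
    if (existing === undefined || normalised.confidence > existing.confidence) {
      this.entries.set(signal.evidenceTargetId, normalised);
      return normalised;
    }
    return existing;
  }

  snapshot(): EvidenceSignal[] {
    return Array.from(this.entries.values());
  }
}
```

In a database-backed version the same comparison would sit in the upsert's conflict clause, so that concurrent signals for one target cannot overwrite a more confident entry.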
3.4 Risks
| Risk | Mitigation |
|---|---|
| Evidence detection LLM hallucinates — marks evidence as “covered” when it isn’t | Use confidence threshold (e.g., 0.7); below threshold, mark as “uncertain” for human review. Always include rationale. |
| Evidence detection adds latency to each answer turn | Run evidence detection asynchronously after transcript_final; do not block the dialogue loop. Evidence signal may arrive seconds after the transcript. |
| Transcript span mapping is wrong — links evidence to the wrong excerpt | Use the most recent candidate transcript_final before the signal. Validate that the span text actually contains content relevant to the evidence target. |
| Marking pipeline doesn’t use the evidence ledger | Phase 3 delivers the data; marking pipeline integration is a separate workstream. Ensure the marking team is aligned on consuming the new input format. |
3.5 Testing Strategy
- Unit tests: Evidence signal schema validation. Transcript span linking correctness.
- Integration tests: Full session with evidence detection — verify ledger is complete and accurate.
- Accuracy tests: Run 50+ recorded exams through the evidence detector; compare against human-annotated ground truth. Target: >85% agreement.
- Edge case tests: Candidate gives a one-word answer. Candidate gives a rambling answer that partially covers multiple evidence targets. Candidate contradicts themselves.
- Regression: Existing exams without evidence targets continue to work. The evidence ledger is simply empty for those exams.
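One simple way to check the accuracy target is raw percent agreement between the detector's labels and the human annotations. The specification does not say which agreement metric is intended, so treating it as plain percent agreement is an assumption (a chance-corrected statistic such as Cohen's kappa would be a stricter alternative). The key format and function names are hypothetical.

```typescript
type Label = "covered" | "not_covered";

// Keys are assumed to look like "sessionId:evidenceTargetId".
// Targets missing from the predictions count as disagreements.
function percentAgreement(
  predicted: ReadonlyMap<string, Label>,
  truth: ReadonlyMap<string, Label>,
): number {
  if (truth.size === 0) throw new Error("ground truth is empty");
  let matches = 0;
  truth.forEach((label, key) => {
    if (predicted.get(key) === label) matches += 1;
  });
  return matches / truth.size;
}

// The target in this chapter is strictly greater than 85%.
function meetsTarget(agreement: number, target = 0.85): boolean {
  return agreement > target;
}
```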
3.6 Effect on Published Packages
Opt-in additive. Existing published packages don’t have evidenceTargets
and continue to work. Packages re-published with evidenceTargets get
structured evidence collection. The marking pipeline MUST handle both: exams
with evidence ledgers (new) and exams with raw transcripts only (legacy).
Phase 4 — Hard Follow-Up & Transition Policy
Goal: Enforce follow-up limits and transition policies at the runtime level, preventing the LLM from exceeding author-defined constraints.
4.1 What Changes
| Area | Before | After |
|---|---|---|
| Follow-up enforcement | Follow-up limit is a prompt instruction; LLM may exceed it | Runtime tracks followUpCount; blocks LLM from generating follow-up when limit reached |
| Transition authority | LLM decides when to move to the next node | Runtime approves all transitions; LLM proposes, runtime decides |
| Transition blocking | No mechanism to prevent LLM from jumping nodes | Runtime blocks unauthorised transitions; emits guardrail_violation |
| Transition decision log | No record of why transitions happened | transition_decision event with decision, reason, targetNodeId |
| Time budget enforcement | Time budget is a prompt hint | Runtime enforces: warns at 80%, hard-moves at 100% |
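The 80% warning and 100% hard move in the last row can be sketched as a small threshold tracker. The event names `time_budget_warning` and `time_budget_exceeded` are the ones used in the Phase 4 engineering tasks; the `TimeBudget` class and its `check` method are hypothetical.

```typescript
type BudgetEvent = "none" | "time_budget_warning" | "time_budget_exceeded";

class TimeBudget {
  private warned = false;
  private exceeded = false;
  private readonly budgetSeconds: number;

  constructor(budgetSeconds: number) {
    this.budgetSeconds = budgetSeconds;
  }

  // Called on each timer tick with the elapsed time for the current node.
  // Each threshold fires at most once.
  check(elapsedSeconds: number): BudgetEvent {
    if (!this.exceeded && elapsedSeconds >= this.budgetSeconds) {
      this.exceeded = true;
      this.warned = true; // no late warning after the budget is exhausted
      return "time_budget_exceeded";
    }
    if (!this.warned && elapsedSeconds >= 0.8 * this.budgetSeconds) {
      this.warned = true;
      return "time_budget_warning";
    }
    return "none";
  }
}
```

The caller would use `time_budget_exceeded` to trigger the forced transition; pausing the countdown for `raise_hand` is left out of this sketch.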
4.2 Why It Matters
- Fairness: If the LLM can exceed follow-up limits, some candidates get more chances than others. This is a serious assessment integrity issue.
- Structural integrity: The author designed a specific flow. The LLM should not be able to deviate from it. Runtime enforcement guarantees this.
- Auditability: Transition decisions are now logged with reasons. This is essential for appeals, quality assurance, and exam reviews.
4.3 Engineering Tasks
- Implement follow-up counter in runtime controller:
  - Increment on each LLM-generated follow-up question.
  - Decrement policy: NEVER (follow-ups are permanent).
  - When `followUpCount >= maxFollowUps`, inject a “move to next question” instruction into the LLM context instead of allowing another follow-up.
- Implement transition approval gate:
  - LLM signals intent to transition (via structured output or a special token).
  - Runtime checks: Is the target in `allowedTargets`? Is the transition condition satisfied?
  - If approved: emit `transition_decision` with `decision: "move_to_next_node"`.
  - If blocked: emit `transition_decision` with `decision: "blocked"` and re-inject the current node’s prompt.
- Implement time budget enforcement:
  - Runtime tracks elapsed time per node.
  - At 80% of budget: emit `time_budget_warning` event.
  - At 100%: emit `time_budget_exceeded` and force transition (per `overrunPolicy`).
- Implement guardrail violation handling:
  - When the LLM generates text that violates a `forbidden` rule, block it.
  - Emit `guardrail_violation` event.
  - Regenerate the response without the violation.
- Add `transition_decision` event to the event protocol.
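The follow-up counter and transition approval gate could be combined in one in-memory object, roughly as follows. `NodePolicy`, `TransitionGate`, and the `reason` strings are hypothetical; the `decision` values and the `allowedTargets` / `maxFollowUps` names come from the tasks in this section.

```typescript
interface NodePolicy {
  nodeId: string;
  maxFollowUps: number;
  allowedTargets: string[];
}

interface TransitionDecision {
  decision: "move_to_next_node" | "blocked";
  reason: string; // illustrative reason codes
  targetNodeId: string;
}

class TransitionGate {
  private followUpCount = 0;
  private readonly policy: NodePolicy;

  constructor(policy: NodePolicy) {
    this.policy = policy;
  }

  // Counts a follow-up if one is still allowed. The counter never decrements.
  tryFollowUp(): boolean {
    if (this.followUpCount >= this.policy.maxFollowUps) return false;
    this.followUpCount += 1;
    return true;
  }

  // The LLM proposes, the runtime decides. Pure in-memory check, no I/O.
  propose(targetNodeId: string, conditionSatisfied: boolean): TransitionDecision {
    if (!this.policy.allowedTargets.includes(targetNodeId)) {
      return { decision: "blocked", reason: "target_not_allowed", targetNodeId };
    }
    if (!conditionSatisfied) {
      return { decision: "blocked", reason: "condition_not_satisfied", targetNodeId };
    }
    return { decision: "move_to_next_node", reason: "approved", targetNodeId };
  }
}
```

When `tryFollowUp` returns `false`, the caller would inject the "move to next question" instruction; the returned `TransitionDecision` maps directly onto the payload of a `transition_decision` event.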
4.4 Risks
| Risk | Mitigation |
|---|---|
| Runtime blocks a transition that the LLM correctly identified as appropriate | Transition conditions are authored; if they’re too strict, the author should adjust. Log blocked transitions for review. |
| Forced transition due to time budget feels abrupt to the candidate | The LLM is instructed to provide a graceful bridge: “We’re running short on time, so let’s move to the next question.” |
| LLM ignores the “move to next question” instruction after follow-up limit | If the LLM generates another follow-up despite the instruction, the runtime MUST block it and inject the next node’s stem directly. |
| Transition approval adds latency | Transition checks are pure in-memory logic — MUST complete in <10ms. No network calls. |
4.5 Testing Strategy
- Unit tests: Follow-up counter: increments, caps at max, never decrements. Transition approval: allowed and blocked cases.
- State machine tests: Full state machine simulation — verify all paths through the node graph with various follow-up counts and time budgets.
- Adversarial tests: LLM prompt injection attempts — try to get the LLM to skip a question, reveal the rubric, or exceed follow-up limits. Verify runtime blocks all of these.
- Integration tests: Full session with hard transitions — verify the candidate experience is smooth even when transitions are forced.
- Regression: Existing exams continue to work. The new enforcement is additive: it only activates for IRs that declare `transitionPolicy` and `maxFollowUps`.
4.6 Effect on Published Packages
Backward-compatible. Existing published packages that don’t declare
transitionPolicy or maxFollowUps continue to use the current LLM-decided
transitions. Packages that declare these get runtime enforcement. This is
opt-in until Phase 5 makes it mandatory.
Phase 5 — Promote flowJson to Formal IOA-ORM
Goal: Make the IOA-ORM the single source of truth for exam runtime configuration. flowJson becomes a legacy compatibility layer.
5.1 What Changes
| Area | Before | After |
|---|---|---|
| Source of truth | flowJson is compiled and passed to Pipecat directly | IOA-ORM is the source of truth; Pipecat config is an adapter output |
| Compilation pipeline | AssessmentPackage → flowJson → Pipecat | AssessmentPackage → IOA-ORM → (adapter) → Pipecat config |
| Versioning | No formal versioning on flowJson | irVersion field in the specification; semantic versioning; backward compatibility rules |
| Backward compatibility | N/A | Published packages with old flowJson are auto-migrated to IOA-ORM v1.0.0 on first use |
| Schema validation | Ad-hoc | JSON Schema for ExamRuntimeIR; CI validation; runtime validation on load |
| Pipecat adapter | flowJson IS the Pipecat config | Separate adapter module that compiles specification → Pipecat config; isolates Pipecat-specific concerns |
| Documentation | Scattered across code comments | Formal specification (this document suite); API docs; migration guides |
5.2 Why It Matters
- Single source of truth: Eliminates the drift between what the author intended and what the runtime executes.
- Portability: If Pipecat is replaced or supplemented, only the adapter changes. The specification and the runtime controller are unaffected.
- Ecosystem: Other tools (analytics, reporting, quality assurance) can consume the specification directly, without understanding Pipecat internals.
- Governance: Versioned specification enables controlled evolution, deprecation policies, and migration tooling.
5.3 Engineering Tasks
- Finalise the specification schema (all fields from Phases 1–4, plus metadata, versioning, and any remaining gaps).
- Implement the Pipecat adapter module:
  - Input: `ExamRuntimeIR`.
  - Output: Pipecat FlowManager config.
  - The adapter MUST NOT add domain logic; it is a pure translation layer.
- Implement IR versioning and migration:
  - `irVersion` follows semver.
  - Migration tool converts old flowJson → IOA-ORM v1.0.0.
  - Breaking changes increment major version; migration tool provided.
- Implement schema validation:
  - JSON Schema published and versioned.
  - CI: validate specification on build.
  - Runtime: validate specification on load; reject invalid specifications with clear errors.
- Auto-migrate existing published packages:
  - On first access, detect old flowJson format.
  - Convert to IOA-ORM v1.0.0 and persist.
  - Original flowJson preserved for rollback.
- Update the Assessment Studio to compile to IOA-ORM instead of flowJson.
- Deprecate flowJson:
  - Add deprecation warnings to flowJson code paths.
  - Set a sunset date (e.g., 6 months after Phase 5 ships).
  - After sunset, flowJson code paths are removed.
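The auto-migration task (detect legacy format on first access, convert, keep the original for rollback) can be sketched as follows. Both data shapes are deliberately minimal and hypothetical: this chapter does not define the structure of flowJson or of `ExamRuntimeIR`, so `LegacyFlowJson`, the `nodes` fields, `StoredPackage`, and `ensureMigrated` are placeholders.

```typescript
// Hypothetical shapes; the real formats are defined elsewhere in the specification.
interface LegacyFlowJson {
  nodes: Array<{ id: string; prompt: string }>;
}

interface ExamRuntimeIR {
  irVersion: string;
  nodes: Array<{ nodeId: string; stem: string }>;
}

interface StoredPackage {
  packageId: string;
  flowJson?: LegacyFlowJson; // kept after migration so it can be rolled back
  ir?: ExamRuntimeIR;
}

// Placeholder for the migration tool: a pure translation to v1.0.0.
function migrateFlowJson(legacy: LegacyFlowJson): ExamRuntimeIR {
  return {
    irVersion: "1.0.0",
    nodes: legacy.nodes.map((n) => ({ nodeId: n.id, stem: n.prompt })),
  };
}

// Called on first access. Returns the package unchanged if already migrated.
function ensureMigrated(pkg: StoredPackage): { pkg: StoredPackage; migrated: boolean } {
  if (pkg.ir !== undefined) {
    return { pkg, migrated: false };
  }
  if (pkg.flowJson === undefined) {
    throw new Error(`package ${pkg.packageId} has neither IR nor flowJson`);
  }
  // The original flowJson stays on the record alongside the new IR.
  return { pkg: { ...pkg, ir: migrateFlowJson(pkg.flowJson) }, migrated: true };
}
```

The caller would persist the returned package when `migrated` is `true`; a dry-run mode could call `migrateFlowJson` directly and compare its output without persisting anything.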
5.4 Risks
| Risk | Mitigation |
|---|---|
| Auto-migration introduces bugs in existing exams | Run migration in dry-run mode first; compare generated specification against expected output. Validate 100% of existing published packages before enabling auto-migration. |
| Breaking change in specification forces all packages to be re-published | Semver policy: major version bump for breaking changes. Migration tool provided. Old major versions supported for at least 2 major versions. |
| Pipecat adapter introduces bugs | Adapter is a pure function — easily testable. Comprehensive test suite mapping specification → expected Pipecat config for every node type. |
| Team resists the migration because flowJson “works fine” | Phase 5 is the culmination — by this point, the team has already seen the value of events, evidence, and runtime control in Phases 1–4. Phase 5 just formalises it. |
| External integrations depend on flowJson format | Provide a compatibility shim that produces flowJson from specification. Deprecate the shim on the same timeline. |
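The semver policy in the table implies a load-time compatibility check on `irVersion`. A minimal sketch follows, assuming plain `MAJOR.MINOR.PATCH` strings (no pre-release or build metadata) and assuming a runtime can load any IR with the same major version and a minor version no newer than its own. That rule is one common reading of semver, not something this chapter states; `parseIrVersion` and `canLoad` are hypothetical names.

```typescript
interface SemVer {
  major: number;
  minor: number;
  patch: number;
}

// Accepts only plain MAJOR.MINOR.PATCH strings.
function parseIrVersion(version: string): SemVer {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`invalid irVersion: ${version}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Assumed rule: same major version, and the IR's minor version is not newer
// than the runtime's. Patch versions never affect compatibility.
function canLoad(runtimeVersion: string, irVersion: string): boolean {
  const runtime = parseIrVersion(runtimeVersion);
  const ir = parseIrVersion(irVersion);
  return runtime.major === ir.major && ir.minor <= runtime.minor;
}
```

An IR that fails this check would be routed to the migration tool rather than rejected outright, in line with the mitigation of providing migration tooling for major-version changes.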
5.5 Testing Strategy
- Migration tests: Run migration on every existing published package. Verify: specification is valid, Pipecat adapter output matches original flowJson (where applicable), no candidate-facing changes.
- Schema validation tests: Valid IRs pass; invalid IRs are rejected with specific error messages.
- Adapter tests: For every node type, verify adapter output matches expected Pipecat config.
- End-to-end tests: Full exam session using IOA-ORM as source of truth. Verify: events, evidence, commands, transitions, marking input — all correct.
- Performance tests: Specification compilation + adapter MUST complete in <500ms for a typical exam.
- Regression: All existing tests continue to pass.
5.6 Effect on Published Packages
This is the migration phase. All existing published packages are affected: they are auto-migrated from flowJson to IOA-ORM v1.0.0. After migration, they continue to work exactly as before, but now through the specification pipeline.
Post-migration:
- New packages MUST be published as IOA-ORM.
- Old packages continue to work via auto-migration.
- flowJson is deprecated with a sunset date.
Phase Summary
| Phase | Duration Estimate | Key Deliverable | Breaking? |
|---|---|---|---|
| 0 — Institutional Readiness | 2–3 weeks (parallel) | Examiner training, student prep, pilot planning | No |
| 1 — Event Protocol & Transcript Closure | 3–4 weeks | Reliable event stream + persisted transcripts | No |
| 2 — Node State & Candidate Commands | 3–4 weeks | Runtime state + candidate command handling | No (opt-in) |
| 3 — Evidence Target & Ledger | 4–5 weeks | Structured evidence for marking | No (opt-in) |
| 4 — Hard Follow-Up & Transition Policy | 3–4 weeks | Runtime-enforced constraints | No (opt-in) |
| 5 — Promote to IOA-ORM | 4–6 weeks | IOA-ORM as source of truth + migration | Auto-migration for all packages |
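Total estimated duration: 17–23 weeks (roughly 4–5 months) for Phases 1–5 run back to back by one team. The 2–3 weeks of Phase 0 run as a parallel institutional readiness workstream and do not extend the critical path.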
Phases 1–4 can partially overlap — they build on each other but each delivers independent value. Phase 0 runs in parallel with Phases 1–3. Phase 5 depends on all previous phases being stable.
Dependency Graph
Phase 0 ──────────────────────────────▶ (runs in parallel with Phases 1–3)
│
Phase 1 ──▶ Phase 2 ──▶ Phase 4
│ ▲
└──────▶ Phase 3 ─────┘
│
└──▶ Phase 5
- Phase 0 runs in parallel with Phases 1–3. It must complete before Phase 5 (which involves all published packages) but does not block engineering phases.
- Phase 2 depends on Phase 1 (events are needed for state tracking).
- Phase 3 depends on Phase 1 (transcript spans are needed for evidence linking).
- Phase 4 depends on Phases 2 and 3 (follow-up counting needs node state; transition conditions may reference evidence coverage).
- Phase 5 depends on all previous phases being stable and shipped.
- Phase 5 also depends on Phase 0: examiner training and student preparation must be complete before mass migration.
Revision History
| Version | Date | Changes |
|---|---|---|
| v0.2.0 | 2026-06-30 | Updated migration plan for IOA-ORM naming. Adjusted phase dependencies. |
| v0.1.0 | 2026-05-06 | Initial release. |