Executive Summary

Status

Draft · v0.2.0 · 2026-06-30

1. Executive Summary

The Rising Need for Interactive Oral Assessment

Oral examinations assess what written tests cannot: the ability to reason under questioning, defend a position, respond to probing follow-ups, and demonstrate competence through live interaction. Sotiriadou et al. (2020) define this as the “interactive oral”: “a form of assessment asking students to perform real-world tasks to demonstrate meaningful application of necessary knowledge and skills.” Interactive oral assessments (IOA) are particularly suited to probing higher-order thinking: the Analyze, Evaluate, and Create levels of Bloom’s taxonomy (Bloom, 1956, in the revised form of Anderson & Krathwohl, 2001), where candidates must defend, justify, and produce rather than merely recall.

The need for IOA is growing. Academic integrity concerns make written exams increasingly unreliable indicators of student competence (Fenton, 2025). Professional accreditation bodies demand assessment of communication, critical thinking, and interpersonal skills — competencies that only live interaction can demonstrate. And as generative AI tools become capable of passing written assessments at the Remember and Understand levels (Fenton, 2025), the case for oral examination as a complement — or alternative — to written assessment strengthens.

But IOA has historically been limited by scale. A human examiner for every candidate is expensive, inconsistent across examiners, and impractical for cohorts of hundreds. This creates a need for systematic, machine-executable oral assessment — where the exam’s structure, evidence capture, and policy enforcement are formally specified and executed by a runtime system, whether that runtime is a human examiner following a script, a rule-based machine, or an AI-powered voice agent.

Kōrero (korero.thesteder.com) is one such platform: an AI-powered system where lecturers design interactive exam flows that an AI examiner conducts with candidates via real-time voice. But Kōrero is one instantiation of a broader pattern. The problem is general: any system that executes interactive oral assessments needs a formal specification that bridges assessment design with runtime execution, regardless of whether the executor is human, machine, or AI.

The Gap: Assessment Theory Meets Runtime Execution

The oral assessment literature defines what makes an exam reliable, valid, and fair. Joughin (1998) identifies six dimensions that shape assessment quality: content type, interaction mode, authenticity, structure, examiner configuration, and degree of orality. Akimov and Malin (2020) formalize the validity/reliability/fairness matrix. Bayley et al. (2024) demonstrate scalable oral exam administration for more than 600 students. Fenton (2025) defines the components of IOA, including a prompting taxonomy, scaffolding, and moderation.

But translating these theoretical requirements into a running system is hard. Current approaches fall into two categories, both insufficient:

Hard-coded runtime logic. The exam’s behavior is embedded directly in application code. This works for one exam but is opaque, non-portable, and impossible to validate at compile time. Every new exam requires reimplementation. Assessment-theoretic properties (e.g., “this exam uses structured dialogue with moderate openness”) exist only as implicit assumptions in code, not as inspectable, versioned artifacts.
Generic workflow engines. Dialogue graphs, state machines, or workflow DSLs can describe conversational flow — but they lack assessment-specific concepts. They have no notion of evidence targets, candidate commands, completion policies, scaffolding budgets, or moderation workflows. The runtime must improvise, and each improvisation is a potential validity threat.

The Vision: A Formal, Interoperable Exam Specification

This specification proposes a new kind of artifact: a formal, machine-processable, platform-independent specification for interactive oral assessments. Drawing on the semantic web tradition, in which an ontology is “an explicit specification of a conceptualization” (Gruber, 1993), this specification defines a shared vocabulary and formal semantics for what an oral assessment is, independent of how any particular system executes it.

The specification is implementation-agnostic. An exam specified in this reference model could be executed by:

A human examiner following a structured script with policy enforcement
A rule-based machine that drives a branching dialogue with deterministic transitions
An AI-powered voice agent that generates natural follow-ups within bounded policies

When the executor is an AI agent, the specification provides additional primitives — agent boundaries, evidence provenance, and runtime policy enforcement — that treat the generative model as a first-class component whose behavior must be formally bounded and auditable. But these AI-specific constructs are extensions, not prerequisites. The core specification applies to any execution model.

The key properties of this specification are:

Formal semantics. Every construct (evidence target, completion policy, transition rule) has a precise, machine-enforceable meaning — not just a human-readable description. The specification is grounded in oral assessment theory (Joughin, 1998; Akimov & Malin, 2020), encoding theoretical dimensions as executable parameters.
Semantic interoperability. The specification provides a shared vocabulary that bridges the conceptual models of assessment designers (rubric criteria, evidence targets, interaction patterns) and runtime engineers (nodes, edges, state transitions). These two communities currently lack a common language; the specification is that common language.
Platform independence. The intermediate representation (IR) is a compilation source, not an execution config. It compiles to platform-specific formats for any runtime engine, making exam specifications portable across systems and preserving them beyond any single platform’s lifecycle.
Versionability and auditability. Each published exam is a versioned, immutable artifact with a stable identity, changelog, and structural diff — enabling inspection, regression analysis, and regulatory audit.
Structured evidence capture. The specification defines structured evidence capture during live exams, with provenance tracking and confidence scoring — not post-hoc transcript analysis. Evidence is a first-class output, not a byproduct.
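The versionability property can be made concrete with a small sketch. The TypeScript below assumes a hypothetical, simplified package shape (`packageId`, `schemaVersion`, `assessmentVersion`, `nodes`) that loosely mirrors the dual versioning scheme described in this overview; it is not the normative schema from `02-schema.md`. It computes a node-level structural diff between two published versions of the same exam.

```typescript
// Hypothetical, simplified package shape for illustration only; the normative
// interfaces are defined in 02-schema.md and may differ.
interface PackageNode {
  id: string;
  timeBudgetSec: number;
}

interface PublishedPackage {
  readonly packageId: string;         // stable identity shared by all versions of one exam
  readonly schemaVersion: string;     // version of the specification schema
  readonly assessmentVersion: string; // version of this exam's assessment content
  readonly nodes: readonly PackageNode[];
}

interface StructuralDiff {
  added: string[];
  removed: string[];
  changed: string[]; // node ids present in both versions whose fields differ
}

function indexById(nodes: readonly PackageNode[]): Record<string, PackageNode> {
  const index: Record<string, PackageNode> = {};
  nodes.forEach((n) => {
    index[n.id] = n;
  });
  return index;
}

// Node-level structural diff between two versions of the same exam.
function diffPackages(prev: PublishedPackage, next: PublishedPackage): StructuralDiff {
  if (prev.packageId !== next.packageId) {
    throw new Error("cannot diff packages with different identities");
  }
  const before = indexById(prev.nodes);
  const after = indexById(next.nodes);
  const diff: StructuralDiff = { added: [], removed: [], changed: [] };
  next.nodes.forEach((node) => {
    const old = before[node.id];
    if (old === undefined) diff.added.push(node.id);
    else if (JSON.stringify(old) !== JSON.stringify(node)) diff.changed.push(node.id);
  });
  prev.nodes.forEach((node) => {
    if (after[node.id] === undefined) diff.removed.push(node.id);
  });
  return diff;
}

const v1: PublishedPackage = {
  packageId: "bio-101-oral", // hypothetical exam id
  schemaVersion: "0.2.0",
  assessmentVersion: "1.0.0",
  nodes: [
    { id: "intro", timeBudgetSec: 60 },
    { id: "photosynthesis", timeBudgetSec: 300 },
  ],
};

// A published version is never edited in place; a change yields a new version.
const v2: PublishedPackage = {
  ...v1,
  assessmentVersion: "1.1.0",
  nodes: [
    { id: "intro", timeBudgetSec: 60 },
    { id: "photosynthesis", timeBudgetSec: 240 },
    { id: "wrap-up", timeBudgetSec: 60 },
  ],
};

// added: ["wrap-up"], removed: [], changed: ["photosynthesis"]
const v1ToV2 = diffPackages(v1, v2);
```

Comparing nodes by serialized content is deliberately crude; a fuller diff would presumably also cover edges, policies, and evidence targets.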

The Gap in Existing Standards

Despite the growing adoption of interactive oral assessment platforms, no existing specification provides these properties. Assessment designers think in terms of rubric criteria, evidence targets, and interaction patterns. Runtime engineers think in terms of nodes, edges, and state transitions. These two communities do not speak the same language.

Existing e-learning standards address adjacent problems rather than this one. QTI describes test items and assessments, chiefly for written and machine-scored formats; xAPI and IMS Caliper record learning-activity events for analytics. None of them can express the runtime behavior of an interactive oral exam. Existing dialogue management formalisms (state machines, workflow DSLs) lack assessment-specific concepts: evidence targets, candidate commands, completion policies, scaffolding budgets, and moderation workflows. The result is that every IOA platform must invent its own ad-hoc specification, its own evidence model, and its own policy rules, with no interoperability, no formal validation, and no shared vocabulary.
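To illustrate what formal validation makes possible, here is a minimal sketch of two plausible structural rules over an exam graph: every edge must connect declared nodes, and every node must be reachable from the entry node. The graph shape and the rule codes (`STRUCT-ENTRY`, `STRUCT-EDGE`, `STRUCT-REACH`) are hypothetical and are not taken from the rule catalog in `08-validation-rules.md`.

```typescript
// Hypothetical, minimal exam graph used only to illustrate compile-time checks.
interface ExamGraph {
  entryNodeId: string;
  nodeIds: string[];
  edges: { from: string; to: string }[];
}

interface ValidationIssue {
  rule: string; // hypothetical rule code
  message: string;
}

function validateGraph(graph: ExamGraph): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  const declared = (id: string): boolean => graph.nodeIds.indexOf(id) !== -1;

  if (!declared(graph.entryNodeId)) {
    issues.push({ rule: "STRUCT-ENTRY", message: `entry node "${graph.entryNodeId}" is not declared` });
  }

  // Every edge must connect two declared nodes.
  graph.edges.forEach((e) => {
    if (!declared(e.from) || !declared(e.to)) {
      issues.push({ rule: "STRUCT-EDGE", message: `edge ${e.from} -> ${e.to} references an undeclared node` });
    }
  });

  // Every declared node must be reachable from the entry node (iterative depth-first search).
  const visited: Record<string, boolean> = {};
  const stack: string[] = declared(graph.entryNodeId) ? [graph.entryNodeId] : [];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (visited[current]) continue;
    visited[current] = true;
    graph.edges.forEach((e) => {
      if (e.from === current && declared(e.to) && !visited[e.to]) stack.push(e.to);
    });
  }
  graph.nodeIds.forEach((id) => {
    if (!visited[id]) {
      issues.push({ rule: "STRUCT-REACH", message: `node "${id}" is unreachable from the entry node` });
    }
  });

  return issues;
}

const draft: ExamGraph = {
  entryNodeId: "identity_check",
  nodeIds: ["identity_check", "warm_up", "core_question", "orphan"],
  edges: [
    { from: "identity_check", to: "warm_up" },
    { from: "warm_up", to: "core_question" },
    { from: "core_question", to: "wrap_up" }, // wrap_up was never declared
  ],
};

// One STRUCT-EDGE issue (core_question -> wrap_up), then one STRUCT-REACH issue ("orphan").
const draftIssues = validateGraph(draft);
```

Both defects would surface only at runtime, if at all, when exam behavior is hard-coded; against a declared graph they are caught before publication.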

What This Artifact Is

The Interactive Oral Assessment Ontology and Reference Model (IOA-ORM) is a design science artifact that formalizes the core concepts, relationships, system responsibilities, evidence semantics, runtime policies, and governance boundaries of interactive oral assessment systems. Its machine-processable manifestation is the Interactive Oral Assessment Executable Specification, represented by the ExamRuntimePackage. In the engineering pipeline, this package functions as an intermediate representation between authoring tools, runtime controllers, execution adapters, and marking systems.

The IOA-ORM has four complementary roles:

Domain ontology — it defines the core vocabulary and semantics of interactive oral assessment, including evidence targets, evidence signals, candidate commands, assessment profiles, completion policies, moderation policies, runtime events, and agent boundaries.
Reference model — it defines the reusable system abstraction for IOA platforms, including authoring tools, executable specification packages, runtime controllers, voice runtimes, event stores, evidence ledgers, marking runtimes, and moderation workflows.
Executable specification — it provides a machine-processable, versioned package that encodes exam structure, policies, evidence requirements, runtime semantics, validation constraints, and audit requirements.
Intermediate representation — within the engineering pipeline, the executable specification acts as an intermediate representation between authoring tools, runtime engines, policy enforcement layers, and marking systems.

Note: Although “Ontology” appears in the artifact’s name, the artifact is more precisely ontology-grounded: it defines a shared vocabulary and formal semantics grounded in assessment theory, but does not currently provide OWL/RDF axioms or description-logic reasoning. The qualifier acknowledges the ontological contribution without over-claiming a full semantic-web implementation.

The canonical package produced by this artifact is the ExamRuntimePackage — a published, versioned, machine-readable specification of an oral assessment. The artifact is not tied to any specific platform. Kōrero is one consumer; any system that conducts interactive oral assessments could adopt this as its canonical exam specification.

Layered Artifact Model

This artifact is organized into four layers:

┌─────────────────────────────────────────────────────────────┐
│  Domain Ontology — shared vocabulary and semantics          │
│  (AssessmentProfile, EvidenceTarget, CandidateCommand, …)   │
├─────────────────────────────────────────────────────────────┤
│  Reference Model — reusable system abstraction              │
│  (Authoring → IR → Runtime → Evidence → Marking → Audit)    │
├─────────────────────────────────────────────────────────────┤
│  Executable Specification — machine-readable package        │
│  (ExamRuntimePackage, schema, validation rules)             │
├─────────────────────────────────────────────────────────────┤
│  Intermediate Representation — engineering pipeline role    │
│  (Authoring Model → ExamRuntimePackage → Runtime Config)    │
└─────────────────────────────────────────────────────────────┘

Design Science Contribution

Following Design Science Research (March & Smith, 1995; Gregor & Hevner, 2013), this artifact contributes at multiple levels:

Artifact Component	DSR Artifact Type	IOA-ORM Layer
`EvidenceTarget`, `EvidenceSignal`, `CandidateCommand`, `AssessmentProfile`, `RuntimeEvent`	Constructs	IOA Domain Ontology
`ExamRuntimePackage`, object model, architecture, component relationships	Model	IOA Reference Model
Validation rules, transition rules, policy evaluation, recovery procedures, compilation mappings	Method	Specification and Validation Method
Kōrero implementation, runtime adapter, controller, evidence ledger integration	Instantiation	Platform Instantiation

Why This Specification Is Necessary

An oral assessment is not a chatbot conversation. It has structural requirements that generic dialogue systems cannot express:

Assessment structure must be enforceable. An exam has a defined sequence of sections, each with time budgets, completion criteria, and transition rules. These are hard constraints — not suggestions to the AI. A runtime controller must enforce them deterministically, regardless of what the generative model produces.

Evidence must be captured during the exam, not derived after. When a candidate demonstrates competence (or fails to), the system must record structured evidence in real time — not rely on post-hoc transcript analysis. A transcript shows what was said; an evidence ledger records what was demonstrated.

The AI examiner must be bounded. An AI examiner needs creative freedom to generate natural follow-ups, handle unexpected responses, and adapt to candidate behavior. But it must not skip exam sections, reveal rubric criteria, score candidates directly, or ignore candidate commands (e.g., “can you repeat that?”). Autonomy must exist within explicit boundaries.

Assessment properties must be inspectable. An exam that claims to assess “interpersonal competence through structured dialogue” (Joughin’s interaction dimension) should have that claim encoded in its specification — not buried in code. The runtime should be able to verify that the exam actually operates as designed.

Fairness and moderation must be built in. At scale, AI-conducted exams need human moderation workflows, calibration profiles, and fairness auditing across demographic dimensions. These cannot be afterthoughts — they must be first-class properties of the exam specification.

Addressed Gaps

This specification addresses the following gaps in current practice:

Gap	What This Specification Provides
No formal specification for AI-conducted oral assessments	A versioned, machine-readable exam specification with 26 schema sections covering structure, policies, evidence, and runtime behavior
Assessment theory disconnected from runtime execution	`AssessmentProfile` encoding Joughin’s (1998) six dimensions as first-class runtime parameters
No structured evidence capture during live exams	`EvidenceLedger` with real-time `EvidenceSignal` emission, provenance tracking, and confidence scoring
AI examiner behavior not formally bounded	Three-layer agent boundary model with allowed/forbidden action catalog and runtime enforcement
No compile-time validation of exam designs	117 validation rules across 10 categories, checking structural, semantic, and assessment-theoretic consistency
Candidate commands not consumed by runtime	`CandidateCommand` as runtime primitives (repeat, clarification, pause, raise-hand) with processing rules
No event contract for downstream consumers	Typed event protocol with 20+ event types, delivery guarantees, and audit trail
Exams not versioned or diffable	Dual versioning scheme (schema version + assessment-theoretic version) with published package immutability
Recovery from anomalies not standardized	`RecoveryPolicy` with categorized strategies for silence, unclear answers, off-topic responses, anxiety, and technical failures
Moderation and fairness not built into exam spec	`ModerationPolicy`, `CalibrationProfile`, and `FairnessAudit` as first-class constructs
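As an illustration of treating `CandidateCommand` as a runtime primitive, the sketch below models the four commands named in the table as a discriminated union and routes each one through the controller. The state fields and the specific behaviors (for example, that a repeat does not count as a follow-up, or that a pause stops the time budget) are assumptions for illustration, not the specification's processing rules.

```typescript
// Hypothetical encoding of the four commands listed in the table.
type CandidateCommand =
  | { kind: "repeat" }
  | { kind: "clarification"; question: string }
  | { kind: "pause" }
  | { kind: "raise-hand" };

interface ControllerState {
  lastExaminerUtterance: string;
  paused: boolean;
  clockRunning: boolean; // whether the node time budget is being consumed
  pendingClarification: string | null;
  events: string[]; // simplified stand-in for typed RuntimeEvents
}

// Processes one command and returns text the controller injects directly,
// or null when the examiner must produce the response itself.
function processCommand(state: ControllerState, cmd: CandidateCommand): string | null {
  state.events.push(`command_processed:${cmd.kind}`);
  switch (cmd.kind) {
    case "repeat":
      // Assumption: a repeat replays the last prompt verbatim and is not counted as a follow-up.
      return state.lastExaminerUtterance;
    case "clarification":
      // The examiner rephrases, still bound by the agent boundary (e.g. no rubric disclosure).
      state.pendingClarification = cmd.question;
      return null;
    case "pause":
      // Assumption: pausing stops the node's time budget until the candidate resumes.
      state.paused = true;
      state.clockRunning = false;
      return "The exam is paused. Tell me when you are ready to continue.";
    case "raise-hand":
      // Assumption: raise-hand yields the floor so the candidate can speak uninterrupted.
      return "Go ahead, I'm listening.";
    default: {
      const unhandled: never = cmd;
      throw new Error(`unhandled command: ${JSON.stringify(unhandled)}`);
    }
  }
}

const roomState: ControllerState = {
  lastExaminerUtterance: "Explain the light-dependent reactions of photosynthesis.",
  paused: false,
  clockRunning: true,
  pendingClarification: null,
  events: [],
};

// Returns the last examiner prompt, verbatim.
const repeated = processCommand(roomState, { kind: "repeat" });
```

The `never` check in the default branch means that adding a fifth command to the union without handling it becomes a type error rather than silently ignored input.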

Where AI Examiner Autonomy Fits

The AI examiner operates autonomously, but only within a bounded creative space:

Follow-up generation. Given a candidate response, the examiner SHOULD generate natural, contextually appropriate follow-ups — but MUST respect maxFollowUps and forbiddenFollowUpPatterns from the runtime policy.
Evidence judgment. The examiner MAY assess whether a candidate response satisfies an EvidenceTarget, producing an EvidenceSignal — but MUST NOT override explicit rubric thresholds or fabricate signals.
Repair and recovery. The examiner SHOULD handle silence, unclear answers, off-topic responses, and candidate anxiety with natural language repair — but MUST follow the prescribed RecoveryPolicy sequence, not invent ad-hoc interventions.
Bridging. The examiner MAY generate natural transitions between nodes — but MUST NOT skip nodes, reorder the exam structure, or jump to topics not defined in the graph.

Autonomy lives inside nodes, bounded by policies. The runtime controller enforces boundaries at node entry, during turns, and at transitions.
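The bounds above can be checked mechanically before an examiner utterance reaches the candidate. This sketch assumes `maxFollowUps` is a per-node integer and `forbiddenFollowUpPatterns` is a list of regular-expression sources (the text names both fields but not their types). It also models a `RecoveryPolicy` as an ordered list of hypothetical step names.

```typescript
// Field names come from the text above; their types here are assumptions.
interface FollowUpPolicy {
  maxFollowUps: number;
  forbiddenFollowUpPatterns: string[]; // assumed to be regular-expression sources
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// Checks a follow-up the examiner proposes before it reaches the candidate.
function checkFollowUp(policy: FollowUpPolicy, followUpsSoFar: number, proposal: string): Verdict {
  if (followUpsSoFar >= policy.maxFollowUps) {
    return { allowed: false, reason: "follow-up cap reached; the controller must transition" };
  }
  for (const source of policy.forbiddenFollowUpPatterns) {
    if (new RegExp(source, "i").test(proposal)) {
      return { allowed: false, reason: `matches forbidden pattern /${source}/` };
    }
  }
  return { allowed: true };
}

// Hypothetical RecoveryPolicy: an ordered sequence the examiner may word freely but not reorder.
type RecoveryStep = "rephrase" | "offer_hint" | "move_on";

function nextRecoveryStep(sequence: RecoveryStep[], attemptsSoFar: number): RecoveryStep | null {
  return attemptsSoFar < sequence.length ? sequence[attemptsSoFar] : null;
}

const policy: FollowUpPolicy = {
  maxFollowUps: 2,
  forbiddenFollowUpPatterns: ["rubric", "the correct answer is"],
};

const ok = checkFollowUp(policy, 0, "Can you say more about the Calvin cycle?"); // allowed
const leaky = checkFollowUp(policy, 0, "According to the rubric, what else should you mention?"); // rejected
const capped = checkFollowUp(policy, 2, "One more question about chlorophyll."); // rejected: cap
```

The examiner still chooses the wording of each follow-up and recovery step; the controller only decides whether the utterance is permitted and which step comes next.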

What the Runtime Controller Must Enforce

The runtime controller is the policy enforcement layer between the AI examiner’s generative freedom and the exam’s structural integrity. It MUST:

Gate node transitions. No transition occurs without evaluating the CompletionPolicy of the current node and the TransitionPolicy of the target edge.
Count and cap follow-ups. Every follow-up increments a counter. When maxFollowUps is reached, the controller, not the LLM, forces the transition.
Enforce time budgets. Per-node and global time limits are hard constraints. The controller MUST force-transition or terminate when budgets expire.
Consume candidate commands. Repeat, clarification, raise-hand, pause — these are runtime primitives, not UI decorations. The controller MUST process them and inject appropriate responses.
Persist the evidence ledger. Every EvidenceSignal produced during the exam MUST be written to the ledger before the exam can complete. The ledger is not a transcript byproduct — it is a first-class output.
Emit structured events. Every state change (node entered, turn completed, evidence collected, command processed, policy violation) MUST produce a RuntimeEvent for the event store.
Enforce the agent boundary. The controller MUST reject any examiner action that violates AllowedAction / ForbiddenAction policies, logging the violation as an event.
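The gating duties above can be sketched as a single decision function that the controller evaluates after each turn. The sketch assumes a simplified node with one outgoing edge and a `CompletionPolicy` reduced to "every required evidence target has a signal"; the names `NodeSpec`, `TurnState`, and `decide` are hypothetical, and the specification's `CompletionPolicy` and `TransitionPolicy` are richer than this.

```typescript
// Hypothetical, simplified node: one outgoing edge and a CompletionPolicy reduced to
// "all required evidence targets have a signal".
interface NodeSpec {
  id: string;
  next: string | null; // null marks the final node
  maxFollowUps: number;
  timeBudgetSec: number;
  requiredTargets: string[];
}

interface TurnState {
  followUps: number;
  elapsedSec: number;
  evidencedTargets: string[];
}

type Decision =
  | { action: "stay" }
  | { action: "transition"; to: string; reason: string }
  | { action: "terminate"; reason: string };

interface RuntimeEvent {
  type: string;
  nodeId: string;
  detail: string;
}

// Evaluated by the controller after every turn, independent of what the model says.
function decide(node: NodeSpec, state: TurnState, events: RuntimeEvent[]): Decision {
  const complete = node.requiredTargets.every((t) => state.evidencedTargets.indexOf(t) !== -1);
  let reason: string | null = null;
  if (complete) reason = "completion_policy_satisfied";
  else if (state.elapsedSec >= node.timeBudgetSec) reason = "time_budget_expired";
  else if (state.followUps >= node.maxFollowUps) reason = "follow_up_cap_reached";

  if (reason === null) return { action: "stay" };
  // Earned and forced exits are both logged, so the audit trail explains every transition.
  events.push({ type: "node_exited", nodeId: node.id, detail: reason });
  return node.next === null
    ? { action: "terminate", reason }
    : { action: "transition", to: node.next, reason };
}

const coreQuestion: NodeSpec = {
  id: "core_question",
  next: "wrap_up",
  maxFollowUps: 3,
  timeBudgetSec: 300,
  requiredTargets: ["photosynthesis-mechanism"],
};
const eventLog: RuntimeEvent[] = [];

const d1 = decide(coreQuestion, { followUps: 1, elapsedSec: 120, evidencedTargets: [] }, eventLog); // stay
const d2 = decide(coreQuestion, { followUps: 3, elapsedSec: 200, evidencedTargets: [] }, eventLog); // forced transition
```

Checking completion before the limits is one possible precedence; under this ordering a candidate who satisfies the node on the last allowed follow-up is recorded as having earned the transition rather than being cut off.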

Why Transcript Alone Is Not Enough

A transcript records what was said. An oral assessment requires recording what was demonstrated. The gap:

A transcript shows “Candidate discussed photosynthesis for 3 minutes.” The evidence ledger records: EvidenceSignal { targetId: "photosynthesis-mechanism", confidence: 0.85, source: "ai-judgment", turnRange: [12, 15] }.
A transcript cannot distinguish between a candidate who gave one brilliant answer and one who needed five follow-ups to reach the same conclusion. The runtime state (follow-up count, recovery attempts) carries assessment-critical information.
A transcript is flat. The exam structure (which node, which rubric criterion, which time budget) is lost without the runtime context.

Transcript is necessary but insufficient. The evidence ledger, runtime state, and event log together form the complete marking input.
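Typing the signal from the example above shows where the extra information lives. In this sketch, `EvidenceSignal` mirrors the example's fields; the `source` values other than `"ai-judgment"` are assumed spellings of the provenance kinds listed under "Signals carry provenance" below, and `NodeRuntimeRecord` and `effortSummary` are hypothetical. Two candidates reach the same evidence target, but only the runtime record shows how differently they got there.

```typescript
// Fields mirror the EvidenceSignal example; source values other than "ai-judgment" are assumed.
interface EvidenceSignal {
  targetId: string;
  confidence: number; // 0..1
  source: "ai-judgment" | "rubric-match" | "self-report" | "external-trigger";
  turnRange: [number, number]; // first and last supporting turn
}

// Hypothetical per-node runtime record kept alongside the transcript.
interface NodeRuntimeRecord {
  nodeId: string;
  followUpsUsed: number;
  recoveryAttempts: number;
  signals: EvidenceSignal[];
}

// Summarizes the assessment-critical context that a flat transcript does not carry.
function effortSummary(rec: NodeRuntimeRecord): string {
  const reached = rec.signals.map((s) => s.targetId).join(", ") || "none";
  return `${rec.nodeId}: [${reached}] after ${rec.followUpsUsed} follow-up(s), ${rec.recoveryAttempts} recovery attempt(s)`;
}

const signalA: EvidenceSignal = {
  targetId: "photosynthesis-mechanism",
  confidence: 0.85,
  source: "ai-judgment",
  turnRange: [12, 15],
};
// Same target and judgment, but supported by a much longer stretch of the exchange.
const signalB: EvidenceSignal = { ...signalA, turnRange: [12, 24] };

const candidateA: NodeRuntimeRecord = { nodeId: "core_question", followUpsUsed: 0, recoveryAttempts: 0, signals: [signalA] };
const candidateB: NodeRuntimeRecord = { nodeId: "core_question", followUpsUsed: 5, recoveryAttempts: 1, signals: [signalB] };
```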

Why the Evidence Ledger Should Be First-Class

The evidence ledger is not a post-processing step over the transcript. It is a structured, real-time, authoritative record of assessment evidence:

Signals are emitted during the exam, not derived after. The AI examiner produces EvidenceSignal objects as it judges candidate responses. These are written to the ledger immediately.
Signals carry provenance. Each signal records whether it came from AI judgment, explicit rubric match, candidate self-report, or external trigger.
Signals are linked to structure. Each signal references an EvidenceTarget defined in the exam specification, connecting evidence to rubric criteria.
The marking runtime reads the ledger, not the transcript. The marking pipeline consumes structured signals with confidence scores and turn references — not raw STT output.

When the evidence ledger is first-class, the marking pipeline becomes deterministic, auditable, and separable from the conversational runtime.
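A minimal in-memory ledger shows how these properties could be enforced at write time. The class below is a hypothetical sketch, not the schema from `06-evidence-ledger.md`. It accepts only signals that reference declared `EvidenceTarget` ids and have a confidence in [0, 1], assigns a write sequence number, and refuses writes once the controller seals it at exam completion.

```typescript
interface EvidenceSignal {
  targetId: string;
  confidence: number;
  source: string;
  turnRange: [number, number];
}

interface LedgerEntry {
  seq: number; // write order within this exam session
  writtenAt: number; // ms since epoch, supplied by the caller
  signal: Readonly<EvidenceSignal>;
}

// Hypothetical in-memory ledger: append-only, validated against declared targets,
// and sealed once the exam completes.
class EvidenceLedger {
  private readonly declaredTargets: readonly string[];
  private readonly entries: LedgerEntry[] = [];
  private sealed = false;

  constructor(declaredTargets: readonly string[]) {
    this.declaredTargets = declaredTargets;
  }

  append(signal: EvidenceSignal, now: number): LedgerEntry {
    if (this.sealed) throw new Error("ledger is sealed: the exam has completed");
    if (this.declaredTargets.indexOf(signal.targetId) === -1) {
      throw new Error(`unknown EvidenceTarget "${signal.targetId}"`);
    }
    if (!(signal.confidence >= 0 && signal.confidence <= 1)) {
      throw new Error("confidence must be within [0, 1]");
    }
    // Copy the signal (including turnRange) so later mutation by the caller cannot rewrite the ledger.
    const entry: LedgerEntry = {
      seq: this.entries.length,
      writtenAt: now,
      signal: { ...signal, turnRange: [signal.turnRange[0], signal.turnRange[1]] },
    };
    this.entries.push(entry);
    return entry;
  }

  // What the marking runtime reads: a copy of the entry list, never the backing array.
  read(): LedgerEntry[] {
    return this.entries.slice();
  }

  // Called by the controller after the final signal is persisted.
  seal(): void {
    this.sealed = true;
  }
}

const ledger = new EvidenceLedger(["photosynthesis-mechanism"]);
ledger.append(
  { targetId: "photosynthesis-mechanism", confidence: 0.85, source: "ai-judgment", turnRange: [12, 15] },
  Date.now(),
);
```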

2. Theoretical Grounding

This specification is grounded in the oral assessment literature. Its design decisions are informed by the following key works:

Paper	Key Insight	Design Impact
Joughin (1998)	Six dimensions of oral assessment: content type, interaction, authenticity, structure, examiners, orality	`AssessmentProfile` on `ExamRuntimePackage`
Akimov & Malin (2020)	Validity/reliability/fairness matrix. Recording + moderation for reliability. Question banking for inter-case reliability.	`ModerationPolicy`, `QuestionPool`, `CalibrationProfile`
Bayley et al. (2024)	ConVOE model for 600+ students: parallel administration, batch grading, practice sessions.	`expectedCandidateCount`, `QuestionPool.allowReuseAcrossConcurrentSessions`
Fenton (2025)	IOA components. Prompting taxonomy. Formative vs. summative. Examiner training. Communication skills.	`PromptingLevel`, `assessmentPurpose`, `scaffoldingBudget`, `identity_check` node
Bloom (1956); Anderson & Krathwohl (2001)	Six cognitive levels (revised taxonomy): Remember → Understand → Apply → Analyze → Evaluate → Create. AI struggles at higher levels.	`BloomLevel` on `EvidenceTarget`; `cognitiveEscalationStrategy` on `FollowUpPolicy`

The inclusion of Bloom’s Taxonomy as a design parameter addresses a key argument for AI-era oral assessment: generative AI tools perform well at the lower levels of Bloom’s taxonomy (Remember, Understand) but struggle at the Create level and at making arguments built on theoretical frameworks (Fenton, 2025). By encoding cognitive levels on evidence targets, the specification enables validation that an exam tests the intended range of cognitive demands — and enables the AI examiner to escalate follow-up probing toward higher-order thinking.
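A coverage check and an escalation step can both be sketched over an ordered `BloomLevel` type. The lowercase level names, the `requiredLevels` input, and the "one level up" rule are assumptions for illustration; the specification's `cognitiveEscalationStrategy` may behave differently.

```typescript
// Ordered from lowest to highest cognitive demand (lowercase spellings are an assumption).
const BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"] as const;
type BloomLevel = (typeof BLOOM_LEVELS)[number];

interface EvidenceTarget {
  id: string;
  bloomLevel: BloomLevel;
}

// Validation: which levels the exam claims to assess have no evidence target?
function missingLevels(targets: EvidenceTarget[], requiredLevels: BloomLevel[]): BloomLevel[] {
  return requiredLevels.filter((level) => !targets.some((t) => t.bloomLevel === level));
}

// Hypothetical escalation rule: after a satisfactory answer, probe one level higher, capped at "create".
function escalate(current: BloomLevel): BloomLevel {
  const i = BLOOM_LEVELS.indexOf(current);
  return BLOOM_LEVELS[Math.min(i + 1, BLOOM_LEVELS.length - 1)];
}

const targets: EvidenceTarget[] = [
  { id: "define-photosynthesis", bloomLevel: "remember" },
  { id: "photosynthesis-mechanism", bloomLevel: "understand" },
  { id: "critique-experiment", bloomLevel: "evaluate" },
];

const gaps = missingLevels(targets, ["understand", "analyze", "evaluate", "create"]); // ["analyze", "create"]
const nextProbe = escalate("understand"); // "apply"
```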

The specification’s most important theoretical move is encoding Joughin’s six dimensions as the AssessmentProfile — a first-class property of the exam package. These dimensions are not metadata decorations; they are design parameters that constrain runtime behavior, inform evidence interpretation, and support validity arguments. An exam that declares interactionMode: "structured_dialogue" and structureProfile.opennessScore: 0.2 is making a claim about its reliability profile that the runtime can verify.
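The kind of claim the runtime can verify might look like the following sketch. Only `interactionMode` (with the value `"structured_dialogue"`) and `structureProfile.opennessScore` appear in this overview. The other enum values, the 0 to 1 scale, the 0.3 threshold, and the rule "a low-openness structured exam must cap follow-ups on every node" are hypothetical.

```typescript
interface AssessmentProfile {
  // Only "structured_dialogue" is named in this overview; the other values are placeholders.
  interactionMode: "structured_dialogue" | "open_dialogue" | "presentation";
  structureProfile: { opennessScore: number }; // assumed scale: 0 = fully scripted, 1 = fully open
}

interface NodePolicy {
  id: string;
  maxFollowUps: number | null; // null = no cap declared
}

// Hypothetical consistency rule: a low-openness, structured exam must bound every node.
function profileViolations(profile: AssessmentProfile, nodes: NodePolicy[], lowOpenness = 0.3): string[] {
  const violations: string[] = [];
  const score = profile.structureProfile.opennessScore;
  if (!(score >= 0 && score <= 1)) {
    violations.push(`opennessScore ${score} is outside [0, 1]`);
  }
  if (profile.interactionMode === "structured_dialogue" && score <= lowOpenness) {
    nodes.forEach((n) => {
      if (n.maxFollowUps === null) {
        violations.push(`node "${n.id}" has no follow-up cap, contradicting opennessScore ${score}`);
      }
    });
  }
  return violations;
}

const profile: AssessmentProfile = {
  interactionMode: "structured_dialogue",
  structureProfile: { opennessScore: 0.2 },
};

// One violation, for "core_question".
const profileIssues = profileViolations(profile, [
  { id: "warm_up", maxFollowUps: 2 },
  { id: "core_question", maxFollowUps: null },
]);
```

A check of this kind turns the declared profile into something that can fail at publication time instead of remaining descriptive metadata.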

The evidence model separates collection from scoring (the AI examiner proposes signals; the marking runtime assigns marks). This addresses Akimov & Malin’s (2020) concern about intra-rater reliability: the AI’s judgments are recorded but not final — human moderation can override them.
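The separation between collection and scoring can be sketched as a resolution step in the marking runtime: an AI-proposed signal stands unless a human moderation decision overrides it, and the original proposal is kept for audit. The `ModerationDecision` shape, the three verdicts, and the "latest decision wins" rule are assumptions, not the semantics of `ModerationPolicy`.

```typescript
interface ProposedSignal {
  signalId: string;
  targetId: string;
  confidence: number;
  source: "ai-judgment";
}

interface ModerationDecision {
  signalId: string;
  moderatorId: string;
  verdict: "confirm" | "reject" | "adjust";
  adjustedConfidence?: number; // required when verdict is "adjust"
  rationale: string;
}

interface ResolvedSignal {
  signalId: string;
  targetId: string;
  confidence: number | null; // null when a moderator rejected the signal
  decidedBy: "ai" | "moderator";
  original: ProposedSignal; // the AI's proposal is kept for audit, never overwritten
}

function resolveSignal(proposed: ProposedSignal, decisions: ModerationDecision[]): ResolvedSignal {
  // Assumption: when several decisions exist for one signal, the last one wins.
  const relevant = decisions.filter((d) => d.signalId === proposed.signalId);
  const latest = relevant.length > 0 ? relevant[relevant.length - 1] : undefined;
  const base = { signalId: proposed.signalId, targetId: proposed.targetId, original: proposed };

  if (latest === undefined) return { ...base, confidence: proposed.confidence, decidedBy: "ai" };
  switch (latest.verdict) {
    case "confirm":
      return { ...base, confidence: proposed.confidence, decidedBy: "moderator" };
    case "reject":
      return { ...base, confidence: null, decidedBy: "moderator" };
    case "adjust":
      if (latest.adjustedConfidence === undefined) throw new Error("an adjust verdict needs adjustedConfidence");
      return { ...base, confidence: latest.adjustedConfidence, decidedBy: "moderator" };
    default: {
      const unknownVerdict: never = latest.verdict;
      throw new Error(`unknown verdict: ${String(unknownVerdict)}`);
    }
  }
}

const proposal: ProposedSignal = {
  signalId: "s-1",
  targetId: "photosynthesis-mechanism",
  confidence: 0.85,
  source: "ai-judgment",
};

const aiOnly = resolveSignal(proposal, []); // decidedBy "ai", confidence 0.85
const adjusted = resolveSignal(proposal, [
  { signalId: "s-1", moderatorId: "m-7", verdict: "adjust", adjustedConfidence: 0.6, rationale: "Mechanism only partly explained" },
]); // decidedBy "moderator", confidence 0.6
```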

3. What This Artifact Is (Technical Summary)

The ExamRuntimePackage, the executable specification defined by the IOA-ORM, is the canonical, versioned representation of a published oral assessment. It is:

The single source of truth for an exam’s structure, policies, evidence targets, and runtime behavior.
A compilation target from the authoring studio’s high-level exam model.
A compilation source for the runtime controller configuration, the execution adapter, and the marking runtime configuration.
A versioned artifact with a stable identity, changelog, and diffability between versions.
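The compilation-source role can be illustrated with a toy adapter that derives two configurations from one package: limits for the runtime controller and per-node instructions for the voice runtime. Every type and field name here is hypothetical, and the output is not the configuration format of any real engine (the Pipecat mapping is covered in `07-pipecat-adapter.md`).

```typescript
// Hypothetical, simplified package shape; not the normative schema.
interface SpecNode {
  id: string;
  goal: string; // what the examiner should elicit at this node
  maxFollowUps: number;
  timeBudgetSec: number;
  evidenceTargetIds: string[];
}

interface ExamPackage {
  packageId: string;
  assessmentVersion: string;
  nodes: SpecNode[];
}

// Consumed by the runtime controller, which enforces limits and checks completion.
interface ControllerConfig {
  source: string; // packageId@version, so every session traces back to one published package
  nodes: Record<string, { maxFollowUps: number; timeBudgetSec: number; requiredTargets: string[] }>;
}

// Consumed by the voice runtime. Limits are not included: the controller decides when a node ends.
interface VoiceNodeConfig {
  nodeId: string;
  instructions: string;
  evidenceTargetIds: string[]; // so the examiner can attach EvidenceSignals to declared targets
}

function compile(pkg: ExamPackage): { controller: ControllerConfig; voiceNodes: VoiceNodeConfig[] } {
  const nodes: ControllerConfig["nodes"] = {};
  pkg.nodes.forEach((n) => {
    nodes[n.id] = {
      maxFollowUps: n.maxFollowUps,
      timeBudgetSec: n.timeBudgetSec,
      requiredTargets: n.evidenceTargetIds.slice(),
    };
  });
  const voiceNodes: VoiceNodeConfig[] = pkg.nodes.map((n) => ({
    nodeId: n.id,
    instructions: `Goal: ${n.goal}. Ask natural follow-ups; do not decide when this section ends.`,
    evidenceTargetIds: n.evidenceTargetIds.slice(),
  }));
  return { controller: { source: `${pkg.packageId}@${pkg.assessmentVersion}`, nodes }, voiceNodes };
}

const compiled = compile({
  packageId: "bio-101-oral",
  assessmentVersion: "1.1.0",
  nodes: [
    {
      id: "core_question",
      goal: "Elicit an explanation of the light-dependent reactions",
      maxFollowUps: 3,
      timeBudgetSec: 300,
      evidenceTargetIds: ["photosynthesis-mechanism"],
    },
  ],
});
```

Both outputs are derived from the same package, so neither is edited by hand; changing a cap means publishing a new package version and recompiling.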

4. What This Artifact Is NOT

This artifact is NOT…	Because…
A UI schema	It does not describe frontend layout, styling, or component tree. The frontend consumes runtime events and state — it does not render the specification.
An execution engine config	The specification compiles to execution-specific configurations. It carries richer semantics (policies, evidence, constraints) that execution engines typically cannot express.
A prompt template	The AI examiner’s system prompt is derived from this specification at runtime. The specification defines what the examiner must do; the prompt describes how to speak.
A marking rubric	Rubric criteria inform `EvidenceTarget` definitions, but the specification is the runtime executable spec, not the scoring model.
A chatbot workflow	Generic dialogue graphs lack assessment-specific concepts: evidence targets, candidate commands, completion policies, time budgets, recovery strategies.

5. Design Goals

#	Goal	Description
G1	Authoring-friendly	The IR is a natural compilation target from the authoring studio’s exam flow model. Authors SHOULD NOT need to write the IR by hand.
G2	Runtime-controllable	Hard constraints on node progression, follow-ups, transitions, time budgets, candidate commands, and evidence capture. Policies are machine-enforceable.
G3	Agentic but bounded	The AI examiner has creative freedom inside nodes — but policies, guardrails, and the runtime controller enforce structural boundaries.
G4	Observable and auditable	Every significant state change produces a structured `RuntimeEvent`. The event log is the audit trail.
G5	Marking-ready	The evidence ledger provides structured, linked, confidence-scored signals to the marking runtime — not raw transcript.
G6	Execution-agnostic	The IR compiles to execution-specific configurations. It is not tied to any particular runtime engine or voice pipeline.
G7	Versioned and diffable	Each published exam has a stable specification version. Changes between versions are inspectable.
G8	Assessment-theoretically grounded	The specification encodes Joughin’s (1998) six dimensions as design parameters. Design decisions are traceable to the assessment theory knowledge base.
G9	Validity-aware	The specification supports structured validity claims (face, content, construct, concurrent), moderation workflows, and calibration profiles.
G10	Fairness-auditable	The specification supports fairness auditing across demographic dimensions. The evidence model captures enough data for post-hoc disparity analysis.

6. Non-Goals

#	Non-Goal	Rationale
NG1	Replace the execution engine	The runtime handles the real-time voice pipeline (STT, LLM, TTS). The specification is the domain spec; the runtime is the execution engine.
NG2	Define UI components	The frontend is a consumer of runtime events, not a renderer of the specification.
NG3	Define scoring algorithms	The specification provides evidence; scoring logic lives in the marking runtime.
NG4	Support non-oral assessments	This specification is designed for interactive oral exams. Written, MCQ, or portfolio assessments have different runtime semantics.
NG5	Replace the authoring studio	The authoring studio is the human-facing tool. The specification is the machine-facing spec it produces.
NG6	Define session management at scale	The specification defines exam structure; session orchestration, batch processing, and cohort management are platform/runtime concerns.
NG7	Define signal processing pipelines	Paralinguistic analysis (prosody, speaking rate, pitch) is a runtime/STT concern, not a specification concern. The specification captures assessment-level semantics, not acoustic features.

7. High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     AUTHORING STUDIO                            │
│  Lecturers design exam flows, define rubrics, set policies      │
│                                                                 │
│  Exam Flow Model ──compile──► ExamRuntimePackage (IR)           │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │  IOA-ORM            │
                    │  (canonical spec)   │
                    │  versioned, stable  │
                    └──┬──────┬──────┬────┘
                       │      │      │
          ┌────────────▼┐  ┌─▼──────▼──────────────┐
          │  Execution   │  │  Runtime Controller    │
          │  Adapter     │  │  (policy enforcement)  │
          │              │  │                        │
          │ Compiles IR  │  │ Enforces: transitions, │
          │ to engine    │  │ follow-up caps, time,  │
          │ config +     │  │ commands, evidence     │
          │ node config  │  │ writes, agent boundary │
          └──────┬───────┘  └────────┬───────────────┘
                 │                   │
    ┌────────────▼───────────────────▼────────────┐
    │           REAL-TIME VOICE RUNTIME            │
    │  STT · LLM · TTS · Voice Pipeline           │
    │  (e.g., Pipecat + LiveKit, or equivalent)   │
    └────────────┬───────────────────┬────────────┘
                 │                   │
    ┌────────────▼────────┐  ┌──────▼──────────────┐
    │  Event Store        │  │  Evidence Ledger     │
    │  (RuntimeEvents)    │  │  (EvidenceSignals)   │
    │                     │  │                      │
    │  Audit trail,       │  │  Structured evidence │
    │  analytics, replay  │  │  for marking runtime │
    └─────────────────────┘  └──────────────────────┘
                                      │
                              ┌───────▼───────────┐
                              │  Marking Runtime   │
                              │  (reads ledger)    │
                              │  produces scores   │
                              └───────────────────┘

Relationship Summary

Component	Reads From	Writes To	Role
Authoring Studio	Lecturer input	ExamRuntimePackage	Human-facing design tool
IOA-ORM	—	—	Canonical versioned spec
Runtime Controller	ExamRuntimePackage, RuntimeState	RuntimeState, EventStore, EvidenceLedger	Policy enforcement engine
Execution Adapter	ExamRuntimePackage	Engine-specific config, node config	IR → execution config compiler
Voice Runtime	Engine config, node config	Transcript, audio	Real-time voice pipeline
Event Store	RuntimeEvent stream	Persisted event log	Audit, analytics, replay
Evidence Ledger	EvidenceSignal stream	Persisted evidence records	Marking input
Marking Runtime	EvidenceLedger, Transcript	Scores, reports	Assessment outcomes
Frontend Exam Room	EventStore, RuntimeState (via data channel)	CandidateCommands	Candidate interface
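The Event Store's replay role can be sketched as a fold over the event log. This assumes every event carries a totally ordered sequence number, and the four event types are placeholders for the state changes the runtime controller must emit; the normative catalog and delivery guarantees are in `05-event-protocol.md`.

```typescript
// Placeholder event types; not the normative catalog.
type ReplayEvent =
  | { seq: number; type: "node_entered"; nodeId: string }
  | { seq: number; type: "follow_up_asked"; nodeId: string }
  | { seq: number; type: "evidence_collected"; nodeId: string; targetId: string }
  | { seq: number; type: "command_processed"; command: string };

interface ReplayedState {
  currentNodeId: string | null;
  followUpsByNode: Record<string, number>;
  evidencedTargets: string[];
  commandsProcessed: number;
}

// Reconstructs runtime state as of a given sequence number (default: end of log).
function replay(events: ReplayEvent[], uptoSeq = Infinity): ReplayedState {
  const state: ReplayedState = { currentNodeId: null, followUpsByNode: {}, evidencedTargets: [], commandsProcessed: 0 };
  // Sort a copy so out-of-order delivery does not change the result.
  const ordered = events.slice().sort((a, b) => a.seq - b.seq);
  for (const e of ordered) {
    if (e.seq > uptoSeq) break;
    switch (e.type) {
      case "node_entered":
        state.currentNodeId = e.nodeId;
        break;
      case "follow_up_asked":
        state.followUpsByNode[e.nodeId] = (state.followUpsByNode[e.nodeId] || 0) + 1;
        break;
      case "evidence_collected":
        if (state.evidencedTargets.indexOf(e.targetId) === -1) state.evidencedTargets.push(e.targetId);
        break;
      case "command_processed":
        state.commandsProcessed += 1;
        break;
    }
  }
  return state;
}

const sessionLog: ReplayEvent[] = [
  { seq: 3, type: "follow_up_asked", nodeId: "core_question" },
  { seq: 1, type: "node_entered", nodeId: "core_question" },
  { seq: 2, type: "command_processed", command: "repeat" },
  { seq: 4, type: "evidence_collected", nodeId: "core_question", targetId: "photosynthesis-mechanism" },
];

const atSeq3 = replay(sessionLog, 3); // in core_question, 1 follow-up, 1 command, no evidence yet
const atEnd = replay(sessionLog); // additionally records the photosynthesis-mechanism target
```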

8. Document Map

Document	Content
`00-overview.md`	This file. Purpose, gaps addressed, theoretical grounding, architecture.
`01-concepts.md`	Theoretical foundations, glossary, domain entities, conceptual object model.
`02-schema.md`	TypeScript interfaces for all core objects (26 sections).
`03-runtime-semantics.md`	State machine, transition rules, policy evaluation.
`04-agent-boundary.md`	Allowed/forbidden actions, guardrail enforcement.
`05-event-protocol.md`	Event types, payloads, delivery guarantees.
`06-evidence-ledger.md`	Signal lifecycle, ledger schema, marking integration.
`07-pipecat-adapter.md`	Compilation rules, FlowManager mapping, limitations.
`08-validation-rules.md`	Compile-time validation of specification packages.
`09-versioning.md`	Version scheme, migration, compatibility.
`10-examples.md`	Complete worked examples.
`11-migration-plan.md`	Incremental migration from existing runtime configs.
`12-testing-strategy.md`	Unit, integration, simulation testing.
`13-open-questions.md`	Unresolved design decisions.

Revision History

Version	Date	Changes
v0.2.0	2026-06-30	Reframed as IOA-ORM. Added 4-layer artifact model, DSR contribution table, formal definition. Replaced ‘Exam Runtime IR’ with ‘IOA-ORM’. Added IOA-centric framing (implementation-agnostic). Added Bloom’s Taxonomy rationale.
v0.1.0	2026-05-06	Initial release.