Executive Summary
Status
Draft · v0.2.0 · 2026-06-30
1. Executive Summary
The Rising Need for Interactive Oral Assessment
Oral examinations assess what written tests cannot: the ability to reason under questioning, defend a position, respond to probing follow-ups, and demonstrate competence through live interaction. Sotiriadou et al. (2020) define this as the “interactive oral”: “a form of assessment asking students to perform real-world tasks to demonstrate meaningful application of necessary knowledge and skills.” Unlike written exams, interactive oral assessments (IOA) probe higher-order thinking: the Analyze, Evaluate, and Create levels of Bloom’s taxonomy (Bloom, 1956, using the level names of its revised form), where candidates must defend, justify, and produce, not merely recall.
The need for IOA is growing. Academic integrity concerns make written exams increasingly unreliable indicators of student competence (Fenton, 2025). Professional accreditation bodies demand assessment of communication, critical thinking, and interpersonal skills — competencies that only live interaction can demonstrate. And as generative AI tools become capable of passing written assessments at the Remember and Understand levels (Fenton, 2025), the case for oral examination as a complement — or alternative — to written assessment strengthens.
But IOA has historically been limited by scale. A human examiner for every candidate is expensive, inconsistent across examiners, and impractical for cohorts of hundreds. This creates a need for systematic, machine-executable oral assessment — where the exam’s structure, evidence capture, and policy enforcement are formally specified and executed by a runtime system, whether that runtime is a human examiner following a script, a rule-based machine, or an AI-powered voice agent.
Kōrero (korero.thesteder.com) is one such platform: an AI-powered system where lecturers design interactive exam flows that an AI examiner conducts with candidates via real-time voice. But Kōrero is one instantiation of a broader pattern. The problem is general: any system that executes interactive oral assessments needs a formal specification that bridges assessment design with runtime execution, regardless of whether the executor is human, machine, or AI.
The Gap: Assessment Theory Meets Runtime Execution
The oral assessment literature defines what makes an exam reliable, valid, and fair. Joughin (1998) identifies six dimensions that shape assessment quality: content type, interaction mode, authenticity, structure, examiner configuration, and degree of orality. Akimov and Malin (2020) formalize the validity/reliability/fairness matrix. Bayley et al. (2024) demonstrate scalable oral exam administration for 600+ students. Fenton (2025) defines interactive oral assessment (IOA) components including prompting taxonomy, scaffolding, and moderation.
But translating these theoretical requirements into a running system is hard. Current approaches fall into two categories, both insufficient:
- Hard-coded runtime logic. The exam’s behavior is embedded directly in application code. This works for one exam but is opaque, non-portable, and impossible to validate at compile time. Every new exam requires reimplementation. Assessment-theoretic properties (e.g., “this exam uses structured dialogue with moderate openness”) exist only as implicit assumptions in code, not as inspectable, versioned artifacts.
- Generic workflow engines. Dialogue graphs, state machines, or workflow DSLs can describe conversational flow, but they lack assessment-specific concepts. They have no notion of evidence targets, candidate commands, completion policies, scaffolding budgets, or moderation workflows. The runtime must improvise, and each improvisation is a potential validity threat.
The Vision: A Formal, Interoperable Exam Specification
This specification proposes a new kind of artifact: a formal, machine-processable, platform-independent specification for interactive oral assessments. Drawing from the semantic web tradition, where ontologies provide “a formal, explicit specification of a shared conceptualization” (Gruber, 1993), this specification defines a shared vocabulary and formal semantics for what an oral assessment is, independent of how any particular system executes it.
The specification is implementation-agnostic. An exam specified in this reference model could be executed by:
- A human examiner following a structured script with policy enforcement
- A rule-based machine that drives a branching dialogue with deterministic transitions
- An AI-powered voice agent that generates natural follow-ups within bounded policies
When the executor is an AI agent, the specification provides additional primitives — agent boundaries, evidence provenance, and runtime policy enforcement — that treat the generative model as a first-class component whose behavior must be formally bounded and auditable. But these AI-specific constructs are extensions, not prerequisites. The core specification applies to any execution model.
The key properties of this specification are:
- Formal semantics. Every construct (evidence target, completion policy, transition rule) has a precise, machine-enforceable meaning — not just a human-readable description. The specification is grounded in oral assessment theory (Joughin, 1998; Akimov & Malin, 2020), encoding theoretical dimensions as executable parameters.
- Semantic interoperability. The specification provides a shared vocabulary that bridges the conceptual models of assessment designers (rubric criteria, evidence targets, interaction patterns) and runtime engineers (nodes, edges, state transitions). These two communities currently lack a common language; the specification is that common language.
- Platform independence. The IR is a compilation source, not an execution config. It compiles to platform-specific formats for any runtime engine — making exam specifications portable across systems and preserving them beyond any single platform’s lifecycle.
- Versionability and auditability. Each published exam is a versioned, immutable artifact with a stable identity, changelog, and structural diff — enabling inspection, regression analysis, and regulatory audit.
- Structured evidence capture. The specification defines structured evidence capture during live exams, with provenance tracking and confidence scoring — not post-hoc transcript analysis. Evidence is a first-class output, not a byproduct.
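The versionability property implies that two published versions of an exam can be compared structurally. The following is a minimal sketch of such a diff, not the specification's own algorithm. It assumes a reduced, hypothetical node shape (`NodeSpec` with `id` and `maxFollowUps`) and a hypothetical result shape (`StructuralDiff`); neither is taken from the schema.

```typescript
// Hypothetical, reduced node shape used only for this illustration.
interface NodeSpec {
  id: string;
  maxFollowUps: number;
}

// Hypothetical diff result: lists of node ids.
interface StructuralDiff {
  added: string[];   // present only in the newer version
  removed: string[]; // present only in the older version
  changed: string[]; // present in both, but with a different maxFollowUps
}

// Compares two node lists by id and reports additions, removals, and changes.
function diffNodes(older: NodeSpec[], newer: NodeSpec[]): StructuralDiff {
  const oldById = new Map<string, NodeSpec>();
  for (const n of older) oldById.set(n.id, n);
  const newById = new Map<string, NodeSpec>();
  for (const n of newer) newById.set(n.id, n);

  const added = newer.filter((n) => !oldById.has(n.id)).map((n) => n.id);
  const removed = older.filter((n) => !newById.has(n.id)).map((n) => n.id);
  const changed = newer
    .filter((n) => {
      const prev = oldById.get(n.id);
      return prev !== undefined && prev.maxFollowUps !== n.maxFollowUps;
    })
    .map((n) => n.id);
  return { added, removed, changed };
}

const v1: NodeSpec[] = [
  { id: "intro", maxFollowUps: 1 },
  { id: "core", maxFollowUps: 3 },
];
const v2: NodeSpec[] = [
  { id: "intro", maxFollowUps: 1 },
  { id: "core", maxFollowUps: 2 },
  { id: "reflection", maxFollowUps: 2 },
];
const nodeDiff = diffNodes(v1, v2);
```

Because published packages are immutable, a diff like this can be computed once per pair of versions and stored alongside the changelog.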
The Gap
Despite the growing adoption of interactive oral assessment platforms, no existing specification provides these properties. Assessment designers think in terms of rubric criteria, evidence targets, and interaction patterns. Runtime engineers think in terms of nodes, edges, and state transitions. These two communities do not speak the same language.
Existing assessment standards (QTI, xAPI, IMS Caliper) were designed for machine-graded written assessments — they cannot express the runtime behavior of an interactive oral exam. Existing dialogue management formalisms (state machines, workflow DSLs) lack assessment-specific concepts: evidence targets, candidate commands, completion policies, scaffolding budgets, and moderation workflows. The result is that every IOA platform must invent its own ad-hoc specification, its own evidence model, and its own policy rules — with no interoperability, no formal validation, and no shared vocabulary.
What This Artifact Is
The Interactive Oral Assessment Ontology and Reference Model (IOA-ORM) is a design science artifact that formalizes the core concepts, relationships, system responsibilities, evidence semantics, runtime policies, and governance boundaries of interactive oral assessment systems. Its machine-processable manifestation is the Interactive Oral Assessment Executable Specification, represented by the `ExamRuntimePackage`. In the engineering pipeline, this package functions as an intermediate representation between authoring tools, runtime controllers, execution adapters, and marking systems.
The IOA-ORM has four complementary roles:
- Domain ontology: it defines the core vocabulary and semantics of interactive oral assessment, including evidence targets, evidence signals, candidate commands, assessment profiles, completion policies, moderation policies, runtime events, and agent boundaries.
- Reference model: it defines the reusable system abstraction for IOA platforms, including authoring tools, executable specification packages, runtime controllers, voice runtimes, event stores, evidence ledgers, marking runtimes, and moderation workflows.
- Executable specification: it provides a machine-processable, versioned package that encodes exam structure, policies, evidence requirements, runtime semantics, validation constraints, and audit requirements.
- Intermediate representation: within the engineering pipeline, the executable specification acts as an intermediate representation between authoring tools, runtime engines, policy enforcement layers, and marking systems.
Note: We use “ontology-grounded” rather than simply “ontology” because the artifact defines a shared vocabulary and formal semantics grounded in assessment theory, but does not currently provide OWL/RDF axioms or description-logic reasoning. The term acknowledges the ontological contribution without over-claiming a full semantic-web implementation.
The canonical package produced by this artifact is the ExamRuntimePackage — a published, versioned, machine-readable specification of an oral assessment. The artifact is not tied to any specific platform. Kōrero is one consumer; any system that conducts interactive oral assessments could adopt this as its canonical exam specification.
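To make the idea of a published, machine-readable package concrete, the sketch below shows one possible, heavily reduced shape together with a small referential-integrity check of the kind a compile-time validator might run. Only `ExamRuntimePackage`, `EvidenceTarget`, `maxFollowUps`, and `interactionMode` appear in this document; every other field name, the `checkReferences` helper, and the sample values are assumptions for illustration. The authoritative interfaces are the ones in 02-schema.md.

```typescript
// Illustrative only: a hypothetical, reduced package shape.
interface EvidenceTarget {
  id: string;
  description: string;
}

interface ExamNode {
  id: string;
  evidenceTargetIds: string[];
  maxFollowUps: number;
}

interface ExamEdge {
  from: string;
  to: string;
}

interface ExamRuntimePackage {
  packageId: string; // stable identity across versions (assumed field name)
  version: string;   // assumed to be a semantic-version string
  assessmentProfile: { interactionMode: string };
  evidenceTargets: EvidenceTarget[];
  nodes: ExamNode[];
  edges: ExamEdge[];
}

// Returns a list of dangling references; an empty list means this small
// subset of structural checks passed.
function checkReferences(pkg: ExamRuntimePackage): string[] {
  const problems: string[] = [];
  const targetIds = new Set(pkg.evidenceTargets.map((t) => t.id));
  const nodeIds = new Set(pkg.nodes.map((n) => n.id));
  for (const node of pkg.nodes) {
    for (const id of node.evidenceTargetIds) {
      if (!targetIds.has(id)) {
        problems.push(`node ${node.id}: unknown evidence target ${id}`);
      }
    }
  }
  for (const edge of pkg.edges) {
    if (!nodeIds.has(edge.from)) problems.push(`edge from unknown node ${edge.from}`);
    if (!nodeIds.has(edge.to)) problems.push(`edge to unknown node ${edge.to}`);
  }
  return problems;
}

const samplePackage: ExamRuntimePackage = {
  packageId: "bio-101-oral",
  version: "1.0.0",
  assessmentProfile: { interactionMode: "structured_dialogue" },
  evidenceTargets: [
    { id: "photosynthesis-mechanism", description: "Explains the light-dependent reactions" },
  ],
  nodes: [
    { id: "intro", evidenceTargetIds: [], maxFollowUps: 0 },
    { id: "core", evidenceTargetIds: ["photosynthesis-mechanism"], maxFollowUps: 3 },
  ],
  edges: [{ from: "intro", to: "core" }],
};
```

Calling `checkReferences(samplePackage)` returns an empty list; pointing an edge at a node id that does not exist would produce one entry describing the dangling reference.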
Layered Artifact Model
This artifact is organized into four layers:
┌─────────────────────────────────────────────────────────────┐
│ Domain Ontology — shared vocabulary and semantics │
│ (AssessmentProfile, EvidenceTarget, CandidateCommand, …) │
├─────────────────────────────────────────────────────────────┤
│ Reference Model — reusable system abstraction │
│ (Authoring → IR → Runtime → Evidence → Marking → Audit) │
├─────────────────────────────────────────────────────────────┤
│ Executable Specification — machine-readable package │
│ (ExamRuntimePackage, schema, validation rules) │
├─────────────────────────────────────────────────────────────┤
│ Intermediate Representation — engineering pipeline role │
│ (Authoring Model → ExamRuntimePackage → Runtime Config) │
└─────────────────────────────────────────────────────────────┘
Design Science Contribution
Following Design Science Research (March & Smith, 1995; Gregor & Hevner, 2013), this artifact contributes at multiple levels:
| Artifact Component | DSR Artifact Type | IOA-ORM Layer |
|---|---|---|
| `EvidenceTarget`, `EvidenceSignal`, `CandidateCommand`, `AssessmentProfile`, `RuntimeEvent` | Constructs | IOA Domain Ontology |
| `ExamRuntimePackage`, object model, architecture, component relationships | Model | IOA Reference Model |
| Validation rules, transition rules, policy evaluation, recovery procedures, compilation mappings | Method | Specification and Validation Method |
| Kōrero implementation, runtime adapter, controller, evidence ledger integration | Instantiation | Platform Instantiation |
Why This Specification Is Necessary
An oral assessment is not a chatbot conversation. It has structural requirements that generic dialogue systems cannot express:
Assessment structure must be enforceable. An exam has a defined sequence of sections, each with time budgets, completion criteria, and transition rules. These are hard constraints — not suggestions to the AI. A runtime controller must enforce them deterministically, regardless of what the generative model produces.
Evidence must be captured during the exam, not derived after. When a candidate demonstrates competence (or fails to), the system must record structured evidence in real time — not rely on post-hoc transcript analysis. A transcript shows what was said; an evidence ledger records what was demonstrated.
The AI examiner must be bounded. An AI examiner needs creative freedom to generate natural follow-ups, handle unexpected responses, and adapt to candidate behavior. But it must not skip exam sections, reveal rubric criteria, score candidates directly, or ignore candidate commands (e.g., “can you repeat that?”). Autonomy must exist within explicit boundaries.
Assessment properties must be inspectable. An exam that claims to assess “interpersonal competence through structured dialogue” (Joughin’s interaction dimension) should have that claim encoded in its specification — not buried in code. The runtime should be able to verify that the exam actually operates as designed.
Fairness and moderation must be built in. At scale, AI-conducted exams need human moderation workflows, calibration profiles, and fairness auditing across demographic dimensions. These cannot be afterthoughts — they must be first-class properties of the exam specification.
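The requirement that candidate commands cannot be ignored can be made concrete by routing them through deterministic controller code instead of the generative model. The sketch below uses the four command names that appear in this document (repeat, clarification, pause, raise-hand). The `SessionState` fields, the `ControllerAction` variants, and the `handleCommand` function are hypothetical.

```typescript
type CandidateCommand = "repeat" | "clarification" | "pause" | "raise-hand";

// Hypothetical session state tracked by the controller.
interface SessionState {
  lastExaminerUtterance: string;
  clockPaused: boolean;
  pendingHumanAttention: boolean;
}

// Hypothetical actions the controller can inject in response to a command.
type ControllerAction =
  | { kind: "replay"; text: string }
  | { kind: "request_rephrase" }
  | { kind: "pause_clock" }
  | { kind: "flag_for_human" };

interface CommandResult {
  action: ControllerAction;
  state: SessionState;
}

// Every command maps to exactly one action; the input state is never mutated.
function handleCommand(cmd: CandidateCommand, state: SessionState): CommandResult {
  switch (cmd) {
    case "repeat":
      return { action: { kind: "replay", text: state.lastExaminerUtterance }, state };
    case "clarification":
      return { action: { kind: "request_rephrase" }, state };
    case "pause":
      return { action: { kind: "pause_clock" }, state: { ...state, clockPaused: true } };
    case "raise-hand":
      return {
        action: { kind: "flag_for_human" },
        state: { ...state, pendingHumanAttention: true },
      };
  }
}
```

Because the switch is exhaustive over `CandidateCommand`, adding a new command to the union without handling it becomes a compile-time error, which is one way to keep commands from being silently dropped.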
Addressed Gaps
This specification addresses the following gaps in current practice:
| Gap | What This Specification Provides |
|---|---|
| No formal specification for AI-conducted oral assessments | A versioned, machine-readable exam specification with 26 schema sections covering structure, policies, evidence, and runtime behavior |
| Assessment theory disconnected from runtime execution | AssessmentProfile encoding Joughin’s (1998) six dimensions as first-class runtime parameters |
| No structured evidence capture during live exams | EvidenceLedger with real-time EvidenceSignal emission, provenance tracking, and confidence scoring |
| AI examiner behavior not formally bounded | Three-layer agent boundary model with allowed/forbidden action catalog and runtime enforcement |
| No compile-time validation of exam designs | 117 validation rules across 10 categories, checking structural, semantic, and assessment-theoretic consistency |
| Candidate commands not consumed by runtime | CandidateCommand as runtime primitives (repeat, clarification, pause, raise-hand) with processing rules |
| No event contract for downstream consumers | Typed event protocol with 20+ event types, delivery guarantees, and audit trail |
| Exams not versioned or diffable | Dual versioning scheme (schema version + assessment-theoretic version) with published package immutability |
| Recovery from anomalies not standardized | RecoveryPolicy with categorized strategies for silence, unclear answers, off-topic responses, anxiety, and technical failures |
| Moderation and fairness not built into exam spec | ModerationPolicy, CalibrationProfile, and FairnessAudit as first-class constructs |
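As an illustration of the `RecoveryPolicy` row above, a policy with categorized strategies can be modeled as an ordered list of steps per anomaly category. This is a sketch under stated assumptions: the category identifiers paraphrase the categories named in the table, while the step names, the `nextRecoveryStep` helper, and the rule that the final step repeats once the sequence is exhausted are all hypothetical.

```typescript
type AnomalyCategory =
  | "silence"
  | "unclear_answer"
  | "off_topic"
  | "anxiety"
  | "technical_failure";

// Hypothetical step catalog; a real policy would define its own.
type RecoveryStep =
  | "wait"
  | "rephrase_question"
  | "redirect"
  | "reassure"
  | "escalate_to_human";

type RecoveryPolicy = Record<AnomalyCategory, RecoveryStep[]>;

const examplePolicy: RecoveryPolicy = {
  silence: ["wait", "rephrase_question", "escalate_to_human"],
  unclear_answer: ["rephrase_question", "escalate_to_human"],
  off_topic: ["redirect", "redirect", "escalate_to_human"],
  anxiety: ["reassure", "wait", "escalate_to_human"],
  technical_failure: ["wait", "escalate_to_human"],
};

// Returns the prescribed step for a 0-based attempt number. Once the sequence
// is exhausted, the last step is returned again (an assumed convention).
function nextRecoveryStep(
  policy: RecoveryPolicy,
  category: AnomalyCategory,
  attempt: number,
): RecoveryStep {
  const sequence = policy[category];
  const index = Math.min(Math.max(attempt, 0), sequence.length - 1);
  const step = sequence[index];
  if (step === undefined) {
    throw new Error(`no recovery steps defined for ${category}`);
  }
  return step;
}
```

Keeping the sequence in data means the examiner only chooses how to phrase a step, while the controller decides which step comes next.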
Where AI Examiner Autonomy Fits
The AI examiner MUST be autonomous within a bounded creative space:
- Follow-up generation. Given a candidate response, the examiner SHOULD generate natural, contextually appropriate follow-ups, but MUST respect `maxFollowUps` and `forbiddenFollowUpPatterns` from the runtime policy.
- Evidence judgment. The examiner MAY assess whether a candidate response satisfies an `EvidenceTarget`, producing an `EvidenceSignal`, but MUST NOT override explicit rubric thresholds or fabricate signals.
- Repair and recovery. The examiner SHOULD handle silence, unclear answers, off-topic responses, and candidate anxiety with natural language repair, but MUST follow the prescribed `RecoveryPolicy` sequence, not invent ad-hoc interventions.
- Bridging. The examiner MAY generate natural transitions between nodes, but MUST NOT skip nodes, reorder the exam structure, or jump to topics not defined in the graph.
Autonomy lives inside nodes, bounded by policies. The runtime controller enforces boundaries at node entry, during turns, and at transitions.
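One way to picture "autonomy inside nodes, bounded by policies" is a check that runs on every follow-up the examiner proposes. The sketch below uses the `maxFollowUps` and `forbiddenFollowUpPatterns` names from this document. It assumes, without confirmation from the schema, that the patterns are regular-expression sources matched case-insensitively; the `FollowUpDecision` type and `checkFollowUp` function are hypothetical.

```typescript
interface FollowUpPolicy {
  maxFollowUps: number;
  // Assumed here to be regular-expression sources; the schema may differ.
  forbiddenFollowUpPatterns: string[];
}

type FollowUpDecision =
  | { allowed: true }
  | { allowed: false; reason: "cap_reached" | "forbidden_pattern" };

// Decides whether a proposed follow-up may be spoken.
// `followUpsSoFar` is the number of follow-ups already used in this node.
function checkFollowUp(
  policy: FollowUpPolicy,
  followUpsSoFar: number,
  proposedText: string,
): FollowUpDecision {
  if (followUpsSoFar >= policy.maxFollowUps) {
    return { allowed: false, reason: "cap_reached" };
  }
  for (const source of policy.forbiddenFollowUpPatterns) {
    if (new RegExp(source, "i").test(proposedText)) {
      return { allowed: false, reason: "forbidden_pattern" };
    }
  }
  return { allowed: true };
}

const followUpPolicy: FollowUpPolicy = {
  maxFollowUps: 2,
  forbiddenFollowUpPatterns: ["\\brubric\\b", "\\bmarks?\\b"],
};
```

The examiner remains free to phrase the follow-up however it likes; the check only decides whether the result may be delivered.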
What the Runtime Controller Must Enforce
The runtime controller is the policy enforcement layer between the AI examiner’s generative freedom and the exam’s structural integrity. It MUST:
- Gate node transitions. No transition occurs without evaluating the `CompletionPolicy` of the current node and the `TransitionPolicy` of the target edge.
- Count and cap follow-ups. Every follow-up increments a counter. When `maxFollowUps` is reached, the controller, not the LLM, forces the transition.
- Enforce time budgets. Per-node and global time limits are hard constraints. The controller MUST force-transition or terminate when budgets expire.
- Consume candidate commands. Repeat, clarification, raise-hand, and pause are runtime primitives, not UI decorations. The controller MUST process them and inject appropriate responses.
- Persist the evidence ledger. Every `EvidenceSignal` produced during the exam MUST be written to the ledger before the exam can complete. The ledger is not a transcript byproduct; it is a first-class output.
- Emit structured events. Every state change (node entered, turn completed, evidence collected, command processed, policy violation) MUST produce a `RuntimeEvent` for the event store.
- Enforce the agent boundary. The controller MUST reject any examiner action that violates `AllowedAction`/`ForbiddenAction` policies, logging the violation as an event.
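The transition-gating, follow-up-cap, and time-budget duties above can be sketched as one pure decision function. This is an illustration, not the controller's actual algorithm. `NodePolicy`, `NodeRuntimeState`, and `GateResult` are hypothetical shapes, `requiredTargetIds` is a simplified stand-in for a `CompletionPolicy`, and the precedence between the checks is an assumption.

```typescript
// Hypothetical per-node policy; requiredTargetIds stands in for a CompletionPolicy.
interface NodePolicy {
  maxFollowUps: number;
  timeBudgetSeconds: number;
  requiredTargetIds: string[];
}

// Hypothetical per-node runtime state maintained by the controller.
interface NodeRuntimeState {
  followUpCount: number;
  elapsedSeconds: number;
  collectedTargetIds: string[];
}

type GateResult = "stay" | "transition" | "force_transition";

// Assumed precedence: time budget, then completion, then follow-up cap.
function gateTransition(policy: NodePolicy, state: NodeRuntimeState): GateResult {
  if (state.elapsedSeconds >= policy.timeBudgetSeconds) {
    return "force_transition"; // hard time limit, regardless of progress
  }
  const collected = new Set(state.collectedTargetIds);
  const complete = policy.requiredTargetIds.every((id) => collected.has(id));
  if (complete) {
    return "transition"; // completion criteria satisfied
  }
  if (state.followUpCount >= policy.maxFollowUps) {
    return "force_transition"; // cap reached without completing the node
  }
  return "stay";
}

const corePolicy: NodePolicy = {
  maxFollowUps: 2,
  timeBudgetSeconds: 120,
  requiredTargetIds: ["photosynthesis-mechanism"],
};
```

Keeping the decision in a pure function means the same inputs always produce the same outcome, which is what makes the gate testable and auditable independently of the language model.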
Why Transcript Alone Is Not Enough
A transcript records what was said. An oral assessment requires recording what was demonstrated. The gap:
- A transcript shows “Candidate discussed photosynthesis for 3 minutes.” The evidence ledger records: `EvidenceSignal { targetId: "photosynthesis-mechanism", confidence: 0.85, source: "ai-judgment", turnRange: [12, 15] }`.
- A transcript cannot distinguish between a candidate who gave one brilliant answer and one who needed five follow-ups to reach the same conclusion. The runtime state (follow-up count, recovery attempts) carries assessment-critical information.
- A transcript is flat. The exam structure (which node, which rubric criterion, which time budget) is lost without the runtime context.
Transcript is necessary but insufficient. The evidence ledger, runtime state, and event log together form the complete marking input.
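The difference between two candidates who reach the same conclusion with different amounts of support can be shown with a small data example. The `EvidenceSignal` fields mirror the example given in this section. `NodeRuntimeSummary`, `MarkingInput`, and `supportUsed` are hypothetical names, and the transcripts are invented sample data.

```typescript
interface EvidenceSignal {
  targetId: string;
  confidence: number;
  source: string;
  turnRange: [number, number];
}

// Hypothetical per-node summary of runtime state.
interface NodeRuntimeSummary {
  nodeId: string;
  followUpCount: number;
  recoveryAttempts: number;
}

// Hypothetical bundle handed to the marking runtime.
interface MarkingInput {
  transcript: string[];
  signals: EvidenceSignal[];
  nodeSummaries: NodeRuntimeSummary[];
}

// Total examiner support (follow-ups plus recovery attempts) used in a node.
function supportUsed(input: MarkingInput, nodeId: string): number {
  return input.nodeSummaries
    .filter((s) => s.nodeId === nodeId)
    .reduce((sum, s) => sum + s.followUpCount + s.recoveryAttempts, 0);
}

const sharedSignal: EvidenceSignal = {
  targetId: "photosynthesis-mechanism",
  confidence: 0.85,
  source: "ai-judgment",
  turnRange: [12, 15],
};

// Same evidence signal, very different paths to it.
const independentCandidate: MarkingInput = {
  transcript: ["Light energy splits water and drives ATP synthesis."],
  signals: [sharedSignal],
  nodeSummaries: [{ nodeId: "core", followUpCount: 0, recoveryAttempts: 0 }],
};
const scaffoldedCandidate: MarkingInput = {
  transcript: ["Um, plants use light?", "Light energy splits water and drives ATP synthesis."],
  signals: [sharedSignal],
  nodeSummaries: [{ nodeId: "core", followUpCount: 5, recoveryAttempts: 1 }],
};
```

Both inputs carry the same signal, yet `supportUsed` separates them, which is information a flat transcript does not expose directly.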
Why the Evidence Ledger Should Be First-Class
The evidence ledger is not a post-processing step over the transcript. It is a structured, real-time, authoritative record of assessment evidence:
- Signals are emitted during the exam, not derived after. The AI examiner produces `EvidenceSignal` objects as it judges candidate responses. These are written to the ledger immediately.
- Signals carry provenance. Each signal records whether it came from AI judgment, explicit rubric match, candidate self-report, or external trigger.
- Signals are linked to structure. Each signal references an `EvidenceTarget` defined in the exam specification, connecting evidence to rubric criteria.
- The marking runtime reads the ledger, not the transcript. The marking pipeline consumes structured signals with confidence scores and turn references, not raw STT output.
When the evidence ledger is first-class, the marking pipeline becomes deterministic, auditable, and separable from the conversational runtime.
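A minimal in-memory sketch of a first-class ledger follows. It assumes an append-only store that rejects signals whose target is not defined in the exam specification. The `"ai-judgment"` source value appears in this document; the other three source strings, the class name `EvidenceLedger` as an API, and its methods are assumptions, and the real ledger schema is described in 06-evidence-ledger.md.

```typescript
// Provenance values: "ai-judgment" is from this document, the rest are assumed spellings.
type SignalSource =
  | "ai-judgment"
  | "rubric-match"
  | "candidate-self-report"
  | "external-trigger";

interface EvidenceSignal {
  targetId: string;
  confidence: number; // assumed to lie in [0, 1]
  source: SignalSource;
  turnRange: [number, number];
}

class EvidenceLedger {
  private readonly entries: EvidenceSignal[] = [];
  private readonly knownTargetIds: Set<string>;

  constructor(knownTargetIds: string[]) {
    this.knownTargetIds = new Set(knownTargetIds);
  }

  // Appends a signal; there is deliberately no update or delete method.
  append(signal: EvidenceSignal): void {
    if (!this.knownTargetIds.has(signal.targetId)) {
      throw new Error(`unknown evidence target: ${signal.targetId}`);
    }
    if (!(signal.confidence >= 0 && signal.confidence <= 1)) {
      throw new Error(`confidence out of range: ${signal.confidence}`);
    }
    // Store a copy so later changes to the caller's object do not alter the record.
    this.entries.push({
      ...signal,
      turnRange: [signal.turnRange[0], signal.turnRange[1]],
    });
  }

  // Returns copies of all signals recorded for one evidence target.
  forTarget(targetId: string): EvidenceSignal[] {
    return this.entries
      .filter((s) => s.targetId === targetId)
      .map((s) => ({ ...s, turnRange: [s.turnRange[0], s.turnRange[1]] }));
  }

  get size(): number {
    return this.entries.length;
  }
}
```

A marking runtime built on this shape would call `forTarget` per rubric criterion and never needs to parse transcript text.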
2. Theoretical Grounding
This specification is grounded in the oral assessment literature. Its design decisions are informed by five key works:
| Paper | Key Insight | Design Impact |
|---|---|---|
| Joughin (1998) | Six dimensions of oral assessment: content type, interaction, authenticity, structure, examiners, orality | AssessmentProfile on ExamRuntimePackage |
| Akimov & Malin (2020) | Validity/reliability/fairness matrix. Recording + moderation for reliability. Question banking for inter-case reliability. | ModerationPolicy, QuestionPool, CalibrationProfile |
| Bayley et al. (2024) | ConVOE model for 600+ students: parallel administration, batch grading, practice sessions. | expectedCandidateCount, QuestionPool.allowReuseAcrossConcurrentSessions |
| Fenton (2025) | IOA components. Prompting taxonomy. Formative vs. summative. Examiner training. Communication skills. | PromptingLevel, assessmentPurpose, scaffoldingBudget, identity_check node |
| Bloom (1956) | Six cognitive levels (named here as in the revised taxonomy): Remember → Understand → Apply → Analyze → Evaluate → Create. AI struggles at higher levels. | BloomLevel on EvidenceTarget; cognitiveEscalationStrategy on FollowUpPolicy |
The inclusion of Bloom’s Taxonomy as a design parameter addresses a key argument for AI-era oral assessment: generative AI tools perform well at the lower levels of Bloom’s taxonomy (Remember, Understand) but struggle at the Create level and at making arguments built on theoretical frameworks (Fenton, 2025). By encoding cognitive levels on evidence targets, the specification enables validation that an exam tests the intended range of cognitive demands — and enables the AI examiner to escalate follow-up probing toward higher-order thinking.
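Encoding cognitive levels on evidence targets makes two checks straightforward: whether an exam covers the levels it intends to test, and which level a follow-up should escalate to next. The sketch below is illustrative. `BloomLevel` is named in this document, but the lowercase string values, the `bloomLevel` field name, and the `uncoveredLevels` and `escalate` helpers are assumptions, and stepping up one level at a time is only one possible escalation strategy.

```typescript
const BLOOM_LEVELS = [
  "remember",
  "understand",
  "apply",
  "analyze",
  "evaluate",
  "create",
] as const;

type BloomLevel = (typeof BLOOM_LEVELS)[number];

// Reduced, hypothetical target shape carrying only what this example needs.
interface EvidenceTarget {
  id: string;
  bloomLevel: BloomLevel;
}

// Returns the intended levels that no evidence target covers.
function uncoveredLevels(targets: EvidenceTarget[], intended: BloomLevel[]): BloomLevel[] {
  const covered = new Set<BloomLevel>(targets.map((t) => t.bloomLevel));
  return intended.filter((level) => !covered.has(level));
}

// Returns the next level up, or null when already at the top level.
function escalate(current: BloomLevel): BloomLevel | null {
  const index = BLOOM_LEVELS.indexOf(current);
  const next = BLOOM_LEVELS[index + 1];
  return next === undefined ? null : next;
}
```

A validator could report a non-empty `uncoveredLevels` result as a warning at compile time, before any candidate sits the exam.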
The specification’s most important theoretical move is encoding Joughin’s six dimensions as the AssessmentProfile — a first-class property of the exam package. These dimensions are not metadata decorations; they are design parameters that constrain runtime behavior, inform evidence interpretation, and support validity arguments. An exam that declares interactionMode: "structured_dialogue" and structureProfile.opennessScore: 0.2 is making a claim about its reliability profile that the runtime can verify.
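A claim such as `interactionMode: "structured_dialogue"` with `structureProfile.opennessScore: 0.2` becomes verifiable once a validator compares it against the exam's policies. The rule sketched below is hypothetical and is not one of the specification's validation rules. It assumes that `opennessScore` ranges from 0 to 1, and the 0.3 threshold, the follow-up limit of 2, and the `"open_dialogue"` mode are invented for illustration.

```typescript
interface AssessmentProfile {
  // "structured_dialogue" is from this document; "open_dialogue" is an assumed example.
  interactionMode: "structured_dialogue" | "open_dialogue";
  structureProfile: { opennessScore: number }; // assumed scale: 0 (scripted) to 1 (open)
}

// Reduced, hypothetical node shape.
interface NodeFollowUpSetting {
  id: string;
  maxFollowUps: number;
}

// Hypothetical consistency rule: a low-openness structured dialogue should not
// allow many free follow-ups per node. Threshold and limit are illustrative.
function checkProfileConsistency(
  profile: AssessmentProfile,
  nodes: NodeFollowUpSetting[],
): string[] {
  const findings: string[] = [];
  const score = profile.structureProfile.opennessScore;
  if (!(score >= 0 && score <= 1)) {
    findings.push("opennessScore must be between 0 and 1");
    return findings;
  }
  if (profile.interactionMode === "structured_dialogue" && score <= 0.3) {
    for (const node of nodes) {
      if (node.maxFollowUps > 2) {
        findings.push(
          `node ${node.id}: maxFollowUps ${node.maxFollowUps} exceeds 2 for a low-openness exam`,
        );
      }
    }
  }
  return findings;
}
```

The point of the sketch is the direction of the check: the declared profile constrains the policies, so a mismatch is detectable before the exam runs.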
The evidence model separates collection from scoring (the AI examiner proposes signals; the marking runtime assigns marks). This addresses Akimov & Malin’s (2020) concern about intra-rater reliability: the AI’s judgments are recorded but not final — human moderation can override them.
3. What This Artifact Is (Technical Summary)
The IOA-ORM is the canonical, versioned, executable specification of a published oral assessment. It is:
- The single source of truth for an exam’s structure, policies, evidence targets, and runtime behavior.
- A compilation target from the authoring studio’s high-level exam model.
- A compilation source for the runtime controller configuration, the execution adapter, and the marking runtime configuration.
- A versioned artifact with a stable identity, changelog, and diffability between versions.
4. What This Artifact Is NOT
| This artifact is NOT… | Because… |
|---|---|
| A UI schema | It does not describe frontend layout, styling, or component tree. The frontend consumes runtime events and state — it does not render the specification. |
| An execution engine config | The specification compiles to execution-specific configurations. It carries richer semantics (policies, evidence, constraints) that execution engines typically cannot express. |
| A prompt template | The AI examiner’s system prompt is derived from this specification at runtime. The specification defines what the examiner must do; the prompt describes how to speak. |
| A marking rubric | Rubric criteria inform EvidenceTarget definitions, but the specification is the runtime executable spec, not the scoring model. |
| A chatbot workflow | Generic dialogue graphs lack assessment-specific concepts: evidence targets, candidate commands, completion policies, time budgets, recovery strategies. |
5. Design Goals
| # | Goal | Description |
|---|---|---|
| G1 | Authoring-friendly | The IR is a natural compilation target from the authoring studio’s exam flow model. Manual authoring of the IR SHOULD NOT be required. |
| G2 | Runtime-controllable | Hard constraints on node progression, follow-ups, transitions, time budgets, candidate commands, and evidence capture. Policies are machine-enforceable. |
| G3 | Agentic but bounded | The AI examiner has creative freedom inside nodes — but policies, guardrails, and the runtime controller enforce structural boundaries. |
| G4 | Observable and auditable | Every significant state change produces a structured RuntimeEvent. The event log is the audit trail. |
| G5 | Marking-ready | The evidence ledger provides structured, linked, confidence-scored signals to the marking runtime — not raw transcript. |
| G6 | Execution-agnostic | The IR compiles to execution-specific configurations. It is not tied to any particular runtime engine or voice pipeline. |
| G7 | Versioned and diffable | Each published exam has a stable specification version. Changes between versions are inspectable. |
| G8 | Assessment-theoretically grounded | The specification encodes Joughin’s (1998) six dimensions as design parameters. Design decisions are traceable to the assessment theory knowledge base. |
| G9 | Validity-aware | The specification supports structured validity claims (face, content, construct, concurrent), moderation workflows, and calibration profiles. |
| G10 | Fairness-auditable | The specification supports fairness auditing across demographic dimensions. The evidence model captures enough data for post-hoc disparity analysis. |
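Goal G4 (every significant state change produces a structured `RuntimeEvent`) can be illustrated with a typed event union and a tiny in-memory store. The five event kinds correspond to the state changes listed earlier in this document (node entered, turn completed, evidence collected, command processed, policy violation). The identifier spellings, payload fields, `sequence` numbering, and the `InMemoryEventStore` class are assumptions; the actual protocol is defined in 05-event-protocol.md.

```typescript
// Hypothetical payload shapes for five of the event kinds named in this document.
type RuntimeEventPayload =
  | { type: "node_entered"; nodeId: string }
  | { type: "turn_completed"; nodeId: string; turnIndex: number }
  | { type: "evidence_collected"; targetId: string; confidence: number }
  | { type: "command_processed"; command: string }
  | { type: "policy_violation"; rule: string };

interface RuntimeEvent {
  sequence: number; // position in the log, assigned by the store
  payload: RuntimeEventPayload;
}

class InMemoryEventStore {
  private readonly events: RuntimeEvent[] = [];

  // Appends an event and returns it with its assigned sequence number.
  emit(payload: RuntimeEventPayload): RuntimeEvent {
    const event: RuntimeEvent = { sequence: this.events.length, payload };
    this.events.push(event);
    return event;
  }

  // Returns all events of one kind, in emission order.
  ofType(type: RuntimeEventPayload["type"]): RuntimeEvent[] {
    return this.events.filter((e) => e.payload.type === type);
  }

  get length(): number {
    return this.events.length;
  }
}
```

A monotonically increasing sequence number is one simple way to make the log replayable in order; a production store would also need durable persistence and timestamps.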
6. Non-Goals
| # | Non-Goal | Rationale |
|---|---|---|
| NG1 | Replace the execution engine | The runtime handles real-time voice pipeline (STT, LLM, TTS). The specification is the domain spec; the runtime is the execution engine. |
| NG2 | Define UI components | The frontend is a consumer of runtime events, not a renderer of the specification. |
| NG3 | Define scoring algorithms | The specification provides evidence; scoring logic lives in the marking runtime. |
| NG4 | Support non-oral assessments | This specification is designed for interactive oral exams. Written, MCQ, or portfolio assessments have different runtime semantics. |
| NG5 | Replace the authoring studio | The authoring studio is the human-facing tool. The specification is the machine-facing spec it produces. |
| NG6 | Define session management at scale | The specification defines exam structure; session orchestration, batch processing, and cohort management are platform/runtime concerns. |
| NG7 | Define signal processing pipelines | Paralinguistic analysis (prosody, speaking rate, pitch) is a runtime/STT concern, not a specification concern. The specification captures assessment-level semantics, not acoustic features. |
7. High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ AUTHORING STUDIO │
│ Lecturers design exam flows, define rubrics, set policies │
│ │
│ Exam Flow Model ──compile──► ExamRuntimePackage (IR) │
└──────────────────────────────┬──────────────────────────────────┘
│
┌──────────▼──────────┐
│ IOA-ORM │
│ (canonical spec) │
│ versioned, stable │
└──┬──────┬──────┬───┘
│ │ │
┌────────────▼┐ ┌─▼──────▼──────────────┐
│ Execution │ │ Runtime Controller │
│ Adapter │ │ (policy enforcement) │
│ │ │ │
│ Compiles IR │ │ Enforces: transitions, │
│ to engine │ │ follow-up caps, time, │
│ config + │ │ commands, evidence │
│ node config │ │ writes, agent boundary │
└──────┬───────┘ └────────┬───────────────┘
│ │
┌────────────▼───────────────────▼────────────┐
│ REAL-TIME VOICE RUNTIME │
│ STT · LLM · TTS · Voice Pipeline │
│ (e.g., Pipecat + LiveKit, or equivalent) │
└────────────┬───────────────────┬────────────┘
│ │
┌────────────▼────────┐ ┌──────▼──────────────┐
│ Event Store │ │ Evidence Ledger │
│ (RuntimeEvents) │ │ (EvidenceSignals) │
│ │ │ │
│ Audit trail, │ │ Structured evidence │
│ analytics, replay │ │ for marking runtime │
└─────────────────────┘ └──────────────────────┘
│
┌───────▼───────────┐
│ Marking Runtime │
│ (reads ledger) │
│ produces scores │
└───────────────────┘
Relationship Summary
| Component | Reads From | Writes To | Role |
|---|---|---|---|
| Authoring Studio | Lecturer input | ExamRuntimePackage | Human-facing design tool |
| IOA-ORM | — | — | Canonical versioned spec |
| Runtime Controller | ExamRuntimePackage, RuntimeState | RuntimeState, EventStore, EvidenceLedger | Policy enforcement engine |
| Execution Adapter | ExamRuntimePackage | Engine-specific config, node config | IR → execution config compiler |
| Voice Runtime | Engine config, node config | Transcript, audio | Real-time voice pipeline |
| Event Store | RuntimeEvent stream | Persisted event log | Audit, analytics, replay |
| Evidence Ledger | EvidenceSignal stream | Persisted evidence records | Marking input |
| Marking Runtime | EvidenceLedger, Transcript | Scores, reports | Assessment outcomes |
| Frontend Exam Room | EventStore, RuntimeState (via data channel) | CandidateCommands | Candidate interface |
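The Execution Adapter row above describes a compiler from the IR to an engine-specific configuration. The sketch below shows that relationship with a generic adapter interface and a toy target format. Everything here is hypothetical: `ExamIr`, `IrNode`, `ExecutionAdapter`, and `ToyEngineConfig` are invented for illustration and do not mirror any real engine's configuration format.

```typescript
// Hypothetical reduced IR.
interface IrNode {
  id: string;
  prompt: string;
  maxFollowUps: number;
  timeBudgetSeconds: number;
}

interface ExamIr {
  packageId: string;
  version: string;
  nodes: IrNode[];
}

// An adapter compiles the IR into whatever config type its engine expects.
interface ExecutionAdapter<TConfig> {
  compile(ir: ExamIr): TConfig;
}

// Toy target format, invented for this example.
interface ToyEngineConfig {
  flowName: string;
  steps: { name: string; systemPrompt: string }[];
}

// The toy engine has no notion of caps or budgets, so the adapter omits them.
// Those constraints stay with the runtime controller, which reads the IR directly.
const toyAdapter: ExecutionAdapter<ToyEngineConfig> = {
  compile(ir) {
    return {
      flowName: `${ir.packageId}@${ir.version}`,
      steps: ir.nodes.map((n) => ({ name: n.id, systemPrompt: n.prompt })),
    };
  },
};

const sampleIr: ExamIr = {
  packageId: "bio-101-oral",
  version: "1.0.0",
  nodes: [
    { id: "intro", prompt: "Greet the candidate.", maxFollowUps: 0, timeBudgetSeconds: 60 },
    { id: "core", prompt: "Ask about photosynthesis.", maxFollowUps: 3, timeBudgetSeconds: 300 },
  ],
};
const toyConfig = toyAdapter.compile(sampleIr);
```

The lossy compilation is deliberate in this sketch: it illustrates why the IR, and not the engine config, remains the source of truth for policies.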
8. Document Map
| Document | Content |
|---|---|
| `00-overview.md` | This file. Purpose, gaps addressed, theoretical grounding, architecture. |
| `01-concepts.md` | Theoretical foundations, glossary, domain entities, conceptual object model. |
| `02-schema.md` | TypeScript interfaces for all core objects (26 sections). |
| `03-runtime-semantics.md` | State machine, transition rules, policy evaluation. |
| `04-agent-boundary.md` | Allowed/forbidden actions, guardrail enforcement. |
| `05-event-protocol.md` | Event types, payloads, delivery guarantees. |
| `06-evidence-ledger.md` | Signal lifecycle, ledger schema, marking integration. |
| `07-pipecat-adapter.md` | Compilation rules, FlowManager mapping, limitations. |
| `08-validation-rules.md` | Compile-time validation of specification packages. |
| `09-versioning.md` | Version scheme, migration, compatibility. |
| `10-examples.md` | Complete worked examples. |
| `11-migration-plan.md` | Incremental migration from existing runtime configs. |
| `12-testing-strategy.md` | Unit, integration, simulation testing. |
| `13-open-questions.md` | Unresolved design decisions. |
Revision History
| Version | Date | Changes |
|---|---|---|
| v0.2.0 | 2026-06-30 | Reframed as IOA-ORM. Added 4-layer artifact model, DSR contribution table, formal definition. Replaced ‘Exam Runtime IR’ with ‘IOA-ORM’. Added IOA-centric framing (implementation-agnostic). Added Bloom’s Taxonomy rationale. |
| v0.1.0 | 2026-05-06 | Initial release. |