Ontology Architecture
The three pillars: objects, links, and actions — and how they compose into a working ontology
The big picture
An ontology is not a single component. It is a stack of cooperating layers, numbered here from the bottom up:
┌─────────────────────────────────────────────────┐
│ 6. Consumers (apps, dashboards, agents) │
├─────────────────────────────────────────────────┤
│ 5. Query, Action, Function API surface │
├─────────────────────────────────────────────────┤
│ 4. Security & policy layer │
├─────────────────────────────────────────────────┤
│ 3. Ontology definitions │
│ object types · link types · action types │
│ functions · interfaces │
├─────────────────────────────────────────────────┤
│ 2. Index & serving layer │
│ in-memory / search / graph indexes │
├─────────────────────────────────────────────────┤
│ 1. Datasource layer │
│ datasets · streams · external APIs │
└─────────────────────────────────────────────────┘

Each layer has one job. Treat them separately and the system stays comprehensible at scale.
Layer 1 — Datasources
The bottom layer is the raw data your ontology is backed by:
- Batch datasets — tables, Parquet files, daily exports.
- Streams — Kafka, Kinesis, CDC streams from operational DBs.
- External APIs — Salesforce, Workday, custom REST endpoints.
- Files — PDFs, images, attachments referenced by an object.
Datasources are owned and versioned outside the ontology. The ontology binds to them; it does not replace them.
Layer 2 — Indexes and serving
Raw datasources are not optimized for the access patterns the ontology needs (lookup by primary key, traverse a link, filter on properties, search across objects). The serving layer materializes datasources into indexes:
- Primary-key index: `Customer:cust_1234` → the full object.
- Property indexes: filter by `country = "DE"` without a full scan.
- Search index: full-text over titles and free-text properties.
- Link index: given a `Customer`, find all linked `Order` IDs with a single O(1) lookup.
These indexes are an implementation detail — they exist so that operations against the ontology stay fast, even as datasets grow into the billions of rows.
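As a rough illustration of what the serving layer materializes, the sketch below builds a primary-key index, a property index, and a link index in memory from raw rows. The row shapes, the field names (`customer_id`, `country`), and the `build_indexes` helper are assumptions made for this example and do not describe any particular platform.

```python
from collections import defaultdict

# Hypothetical raw rows, standing in for a batch dataset.
customers = [
    {"id": "cust_1234", "name": "Acme GmbH", "country": "DE"},
    {"id": "cust_5678", "name": "Globex", "country": "US"},
]
orders = [
    {"id": "ord_1", "customer_id": "cust_1234"},
    {"id": "ord_2", "customer_id": "cust_1234"},
    {"id": "ord_3", "customer_id": "cust_5678"},
]


def build_indexes(customers, orders):
    """Materialize raw rows into lookup structures (illustrative only)."""
    # Primary-key index: id -> full object.
    by_pk = {c["id"]: c for c in customers}

    # Property index: country -> list of customer ids.
    by_country = defaultdict(list)
    for c in customers:
        by_country[c["country"]].append(c["id"])

    # Link index: customer id -> list of linked order ids.
    customer_orders = defaultdict(list)
    for o in orders:
        customer_orders[o["customer_id"]].append(o["id"])

    return by_pk, by_country, customer_orders


by_pk, by_country, customer_orders = build_indexes(customers, orders)

print(by_pk["cust_1234"]["name"])    # lookup by primary key
print(by_country["DE"])              # filter by property without a full scan
print(customer_orders["cust_1234"])  # traverse Customer -> Order
```

A real serving layer would keep these structures up to date incrementally and persist them. The point here is only that each access pattern gets its own lookup structure.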
Layer 3 — Ontology definitions
This is the schema of the ontology — the part you and your team author:
- Object types define the entities (`Customer`, `Order`, `Shipment`).
- Property types define the typed fields on those entities.
- Link types define how object types relate.
- Action types define typed mutations.
- Functions define typed compute.
- Interfaces define cross-cutting contracts (e.g., `Locatable`: anything with a lat/lng).
Definitions are versioned. They live in source control, not just in a UI. Treat them with the same rigor as application code.
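To make "definitions as code" concrete, here is a minimal sketch of how these primitives might be declared in a source-controlled module. The class names, the `satisfies` check, and the idea that `Shipment` carries `lat`/`lng` are assumptions for illustration. Real platforms have their own definition formats.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PropertyType:
    name: str
    data_type: str  # e.g. "string", "timestamp", "double"


@dataclass(frozen=True)
class Interface:
    name: str
    required_properties: Tuple[str, ...]


@dataclass(frozen=True)
class ObjectType:
    name: str
    primary_key: str
    properties: Tuple[PropertyType, ...]
    implements: Tuple[Interface, ...] = ()

    def satisfies(self, interface: Interface) -> bool:
        """True if every property the interface requires is defined here."""
        defined = {p.name for p in self.properties}
        return set(interface.required_properties) <= defined


@dataclass(frozen=True)
class LinkType:
    name: str
    source: str
    target: str
    cardinality: str  # e.g. "many-to-one"


@dataclass(frozen=True)
class ActionType:
    name: str
    parameters: Tuple[PropertyType, ...]
    edits: Tuple[str, ...]  # object types this action may modify


# Hypothetical definitions for the running Shipment example.
LOCATABLE = Interface("Locatable", ("lat", "lng"))

SHIPMENT = ObjectType(
    name="Shipment",
    primary_key="shipmentId",
    properties=(
        PropertyType("shipmentId", "string"),
        PropertyType("status", "string"),
        PropertyType("lat", "double"),
        PropertyType("lng", "double"),
    ),
    implements=(LOCATABLE,),
)

ASSIGNED_TO = LinkType("assignedTo", "Shipment", "Driver", "many-to-one")

MARK_DELIVERED = ActionType(
    name="markShipmentDelivered",
    parameters=(
        PropertyType("shipmentId", "string"),
        PropertyType("deliveredAt", "timestamp"),
        PropertyType("signature", "string"),
    ),
    edits=("Shipment",),
)

# A definition that claims an interface should actually satisfy it.
assert all(SHIPMENT.satisfies(i) for i in SHIPMENT.implements)
```

Because the definitions are plain immutable values, they can be diffed, reviewed, and checked in CI like any other code.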
Layer 4 — Security and policy
Security is not glued on after the fact — it sits between definitions and the API surface, so every read and write goes through it.
Three levels you should expect:
- Object-level: who can see this object type at all?
- Property-level: within an object type, are some properties (`Customer.ssn`) restricted?
- Row-level: within an object type, can the user see only some rows (`Customer` rows where `region = userRegion`)?
We dedicate a full lesson to this later. For now: assume every API call carries a user context, and every layer-3 definition can attach policies.
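The three levels can be sketched as three checks applied in order on every read. Everything below is a simplified assumption for illustration: the role names, the policy tables, and the `read_customers` function are hypothetical, and real policy engines are considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    user_id: str
    region: str
    roles: frozenset


# Hypothetical policies attached to the Customer object type.
OBJECT_POLICY = {"Customer": {"support", "finance"}}  # roles that may see the type
PROPERTY_POLICY = {("Customer", "ssn"): {"finance"}}  # roles that may see a property

ROWS = [
    {"id": "cust_1", "region": "EU", "ssn": "000-00-0000"},
    {"id": "cust_2", "region": "US", "ssn": "000-00-0001"},
]


def property_visible(user, object_type, prop):
    """Property-level: unrestricted properties are visible to everyone."""
    allowed = PROPERTY_POLICY.get((object_type, prop))
    return allowed is None or bool(user.roles & allowed)


def row_visible(user, row):
    """Row-level rule from the text: region = userRegion."""
    return row["region"] == user.region


def read_customers(user, rows):
    # Object-level: may this user see the object type at all?
    if not (user.roles & OBJECT_POLICY["Customer"]):
        raise PermissionError("Customer is not visible to this user")
    visible = []
    for row in rows:
        if not row_visible(user, row):
            continue  # row-level filter
        visible.append(
            {k: v for k, v in row.items() if property_visible(user, "Customer", k)}
        )
    return visible


support = UserContext("u_1", "EU", frozenset({"support"}))
print(read_customers(support, ROWS))  # one EU row, with ssn dropped
```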
Layer 5 — API surface
Above security, the ontology exposes three families of operations:
| Family | What it does | Example |
|---|---|---|
| Query | Read object sets, traverse links, aggregate | “Top 10 customers by revenue in EU” |
| Action | Mutate state | “Mark shipment delivered” |
| Function | Compute typed values | “Estimated delivery time” |
The surface is typically exposed as GraphQL, gRPC, or typed SDKs in TypeScript / Python / Java. The exact protocol matters less than the shape — every operation is typed against the ontology.
Layer 6 — Consumers
Anything that calls the API:
- Operational apps — internal tools that drive day-to-day work.
- Dashboards — analytical views.
- AI agents — LLM-powered workflows that read and (carefully) write.
- Services — backend systems that need to read or write business state.
Each consumer talks to the same API surface against the same model. They never read the underlying datasources directly. This is the rule that makes the whole pattern work.
Read path — tracing a query
Let’s trace what happens when a dashboard asks “show me the 50 most recent shipments to Germany, with their assigned driver.”
1. Dashboard issues a query against the API surface (GraphQL/SDK).
2. Security layer checks: can this user see Shipment and Driver?
   And can they see Shipment.destinationCountry and Driver.name?
3. Definitions layer resolves "Shipment" and "Driver" to indexes.
4. Index layer:
   - filters Shipment by destinationCountry = "DE"
   - sorts by createdAt desc, limit 50
   - traverses the Shipment → assignedTo → Driver link
   - returns hydrated objects
5. Security layer redacts any properties the user cannot see.
6. API returns typed objects to the dashboard.

Notice the application never touches the datasource. Even if the underlying data moves from Postgres to BigQuery to Iceberg, the dashboard code does not change.
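The six steps above can be sketched as a single function over in-memory data. This is a toy model under stated assumptions: the sample shipments and drivers, the `VISIBLE` allow-list, and the `recent_shipments` function are all hypothetical, and step 3 (resolving type names to indexes) is reduced to reading two module-level collections.

```python
from datetime import datetime

# Hypothetical indexed data (stands in for the index layer).
SHIPMENTS = [
    {"id": "shp_1", "destinationCountry": "DE",
     "createdAt": datetime(2024, 5, 1), "assignedTo": "drv_1"},
    {"id": "shp_2", "destinationCountry": "FR",
     "createdAt": datetime(2024, 5, 2), "assignedTo": "drv_1"},
    {"id": "shp_3", "destinationCountry": "DE",
     "createdAt": datetime(2024, 5, 3), "assignedTo": "drv_2"},
]
DRIVERS = {
    "drv_1": {"id": "drv_1", "name": "Ana", "phone": "555-0100"},
    "drv_2": {"id": "drv_2", "name": "Ben", "phone": "555-0101"},
}

# Hypothetical per-user allow-list: object type -> visible properties.
VISIBLE = {
    "Shipment": {"id", "destinationCountry", "createdAt"},
    "Driver": {"id", "name"},
}


def redact(type_name, obj, visible):
    """Step 5: keep only the properties this user may see."""
    return {k: v for k, v in obj.items() if k in visible[type_name]}


def recent_shipments(country, limit, visible):
    # Step 2: object-level check for both types involved in the query.
    for type_name in ("Shipment", "Driver"):
        if type_name not in visible:
            raise PermissionError(f"{type_name} is not visible to this user")

    # Step 4: filter, sort by createdAt descending, limit, traverse the link.
    matches = [s for s in SHIPMENTS if s["destinationCountry"] == country]
    matches.sort(key=lambda s: s["createdAt"], reverse=True)

    results = []
    for shipment in matches[:limit]:
        driver = DRIVERS[shipment["assignedTo"]]
        hydrated = redact("Shipment", shipment, visible)
        hydrated["driver"] = redact("Driver", driver, visible)
        results.append(hydrated)
    return results  # Step 6: typed objects back to the caller


for row in recent_shipments("DE", 50, VISIBLE):
    print(row["id"], row["driver"]["name"])
```

The caller only names object types, properties, and a link. Nothing in `recent_shipments`'s signature reveals where the rows are stored.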
Write path — tracing an action
Now: a dispatcher clicks “Mark Delivered” on a shipment.
1. The app calls action `markShipmentDelivered(shipmentId, deliveredAt, signature)`.
2. Security layer checks: can this user invoke this action type?
3. Action handler validates parameters:
   - shipmentId resolves to a real, non-deleted Shipment
   - deliveredAt is after createdAt
   - signature is non-empty
   - current status is one of {in_transit, out_for_delivery}
4. Action applies the state change:
   - updates Shipment.status = "delivered"
   - updates Shipment.deliveredAt = ...
   - emits a domain event "ShipmentDelivered"
5. The action is written to an audit log: who, when, what changed.
6. Indexes update, usually asynchronously and sometimes synchronously, depending on the consistency guarantees of the platform.

The state change is atomic at the action boundary. Two consumers cannot half-update the same shipment from two different apps because there is only one path: the action.
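A minimal sketch of this write path, assuming an in-memory store and a single process. The `dispatcher` role, the `ValidationError` class, and the shapes of the audit and event records are hypothetical. Index updates (step 6) are left out.

```python
from datetime import datetime, timezone


class ValidationError(Exception):
    pass


# Hypothetical in-memory state.
SHIPMENTS = {
    "shp_1": {
        "status": "in_transit",
        "createdAt": datetime(2024, 5, 1, tzinfo=timezone.utc),
        "deliveredAt": None,
        "deleted": False,
    }
}
AUDIT_LOG = []
EVENTS = []
ALLOWED_FROM = {"in_transit", "out_for_delivery"}


def mark_shipment_delivered(user, shipment_id, delivered_at, signature):
    # Step 2: may this user invoke the action type?
    if "dispatcher" not in user["roles"]:
        raise PermissionError("user may not invoke markShipmentDelivered")

    # Step 3: validate parameters before touching any state.
    shipment = SHIPMENTS.get(shipment_id)
    if shipment is None or shipment["deleted"]:
        raise ValidationError("shipmentId does not resolve to a live Shipment")
    if delivered_at <= shipment["createdAt"]:
        raise ValidationError("deliveredAt must be after createdAt")
    if not signature:
        raise ValidationError("signature must be non-empty")
    if shipment["status"] not in ALLOWED_FROM:
        raise ValidationError(f"cannot deliver from status {shipment['status']}")

    # Step 4: build the new state, then swap it in with one assignment so
    # no reader in this sketch ever sees a half-updated shipment.
    before = dict(shipment)
    after = {**shipment, "status": "delivered", "deliveredAt": delivered_at}
    SHIPMENTS[shipment_id] = after
    EVENTS.append({"type": "ShipmentDelivered", "shipmentId": shipment_id})

    # Step 5: audit record of who, when, and what changed.
    AUDIT_LOG.append({
        "who": user["id"],
        "when": datetime.now(timezone.utc),
        "action": "markShipmentDelivered",
        "changed": {k: (before[k], after[k]) for k in after if before[k] != after[k]},
    })
    return after


dispatcher = {"id": "u_42", "roles": {"dispatcher"}}
delivered = datetime(2024, 5, 2, tzinfo=timezone.utc)
print(mark_shipment_delivered(dispatcher, "shp_1", delivered, "J. Doe")["status"])
```

Calling the action a second time fails validation, because `delivered` is not in the allowed source statuses. That is the kind of invariant that is hard to enforce when several apps write to a table directly.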
Where ontology-level caching fits
Hot reads benefit massively from caching. Sensible places:
- Object lookup by primary key — almost always cacheable, invalidated on action commit.
- Function results — pure functions can be memoized.
- Search index results — short TTLs (seconds) usually suffice.
Caching does not belong in the consumer. It belongs at the ontology layer so every consumer benefits equally and invalidation happens centrally.
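Here is one way central invalidation might look for the primary-key case. It is a sketch under simplifying assumptions: a single process, an in-memory `STORE`, and a hypothetical `commit_status_change` function standing in for an action commit.

```python
class OntologyCache:
    """Cache of objects by (object type, primary key). Illustrative only."""

    def __init__(self, loader):
        self._loader = loader
        self._objects = {}
        self.loads = 0  # counts trips to the serving layer

    def get(self, object_type, pk):
        key = (object_type, pk)
        if key not in self._objects:
            self._objects[key] = self._loader(object_type, pk)
            self.loads += 1
        return self._objects[key]

    def invalidate(self, object_type, pk):
        self._objects.pop((object_type, pk), None)


# Hypothetical backing store, standing in for the index layer.
STORE = {("Shipment", "shp_1"): {"status": "in_transit"}}


def load(object_type, pk):
    return dict(STORE[(object_type, pk)])


cache = OntologyCache(load)


def commit_status_change(pk, status):
    """Stand-in for an action commit: write, then invalidate centrally."""
    STORE[("Shipment", pk)]["status"] = status
    cache.invalidate("Shipment", pk)


cache.get("Shipment", "shp_1")
cache.get("Shipment", "shp_1")           # served from the cache
commit_status_change("shp_1", "delivered")
print(cache.get("Shipment", "shp_1"))    # reloaded after invalidation
print(cache.loads)
```

Because invalidation is triggered by the commit itself, no consumer has to know that a cache exists, let alone when to clear it.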
A concrete shape — typical request lifecycle
| Step | Owner | Failure mode |
|---|---|---|
| Parse and authenticate | API surface | 401 unauthorized |
| Authorize (object/prop/row) | Security layer | 403 forbidden |
| Validate (action params, query shape) | Definitions layer | 422 validation error |
| Execute | Index layer | 500 / 503 with retry-after |
| Audit + invalidate | Platform | (logged, non-fatal) |
Treating these stages explicitly makes debugging an order of magnitude easier than the alternative — one giant pile of business logic with no clear seams.
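The table can be read as a pipeline in which each stage either passes the request on or fails with its own status code. The sketch below assumes hypothetical tokens, roles, and request shapes. It models only the 401/403/422 paths plus success, and treats audit failures as non-fatal, as the table describes.

```python
class StageError(Exception):
    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


# Hypothetical token table and audit sink.
TOKENS = {
    "tok_dispatch": {"id": "u1", "roles": {"dispatcher"}},
    "tok_viewer": {"id": "u2", "roles": {"viewer"}},
}
AUDIT = []


def authenticate(request):            # owner: API surface
    user = TOKENS.get(request.get("token"))
    if user is None:
        raise StageError(401, "unauthorized")
    return user


def authorize(user, request):         # owner: security layer
    if "dispatcher" not in user["roles"]:
        raise StageError(403, "forbidden")


def validate(request):                # owner: definitions layer
    if not request.get("params", {}).get("shipmentId"):
        raise StageError(422, "shipmentId is required")


def execute(request):                 # owner: index layer
    return {"shipmentId": request["params"]["shipmentId"], "status": "delivered"}


def handle(request):
    try:
        user = authenticate(request)
        authorize(user, request)
        validate(request)
        result = execute(request)
    except StageError as err:
        return {"status": err.status, "error": str(err)}
    try:
        AUDIT.append({"who": user["id"], "request": request})
    except Exception:
        pass  # audit problems are logged elsewhere and never fail the request
    return {"status": 200, "result": result}


print(handle({"token": "tok_dispatch", "params": {"shipmentId": "shp_1"}}))
```

When a request fails, the status code alone tells you which stage, and therefore which owner, to look at first.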
Key takeaways
- The ontology is a layered system: datasources → indexes → definitions → security → API → consumers.
- Reads and writes both flow through the same security layer, against the same definitions, regardless of which app is calling.
- Definitions live in source control. Indexes are implementation detail. Datasources stay where they are.
- Mutations happen through action types — there is no other write path.
What’s next
Now that you have the map, we zoom into each primitive. Next up: object types — the nouns of your domain.
Mapping the territory before we walk it. 🗺️