Introduction to the Ontology
Why ontologies exist, what problems they solve, and where they fit between raw data and applications
The problem ontologies solve
Most organizations have the same painful story:
- Marketing has a definition of “active customer.”
- Finance has a different one.
- The data warehouse has a third.
- The production database has a fourth, encoded implicitly in business logic across 12 microservices.
When the CEO asks “how many active customers do we have?” four people answer with four numbers, and nobody is wrong — they are each correct against a different definition.
The same fragmentation shows up everywhere: when is a shipment “in transit”? What counts as a “completed” order? Who is the “owner” of an account? Every team rebuilds the answers in code, in SQL, and in spreadsheets, slightly differently each time.
The ontology is the one place where these answers live.
What is an ontology?
An ontology is a typed, governed, semantic model of the real-world entities your business operates on — and the relationships and actions that connect them.
Concretely, an ontology is made of three primitives:
| Primitive | What it is | Example |
|---|---|---|
| Object type | A noun in the business — an entity | Customer, Order, Shipment, Driver |
| Link type | A verb between objects — a relationship | Customer → places → Order, Driver → operates → Vehicle |
| Action type | A typed mutation to ontology state | markShipmentDelivered, assignDriverToRoute |
Sometimes you also work with functions (compute over the ontology) and interfaces (contracts that multiple object types can satisfy), but those build on top of the three above.
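As a rough illustration, the three primitives can be sketched as plain Python data structures. Everything here (the class names, the `Ontology` registry, the string type names) is hypothetical and chosen for this lesson; it is not the API of any particular ontology platform.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """A noun in the business, e.g. Customer or Shipment."""
    name: str
    properties: dict[str, str]  # property name -> type name (illustrative)


@dataclass
class LinkType:
    """A verb between two object types, e.g. Customer -> places -> Order."""
    name: str
    source: str  # name of the source object type
    target: str  # name of the target object type


@dataclass
class ActionType:
    """A typed mutation, e.g. markShipmentDelivered."""
    name: str
    parameters: dict[str, str]  # parameter name -> type name (illustrative)


@dataclass
class Ontology:
    """Hypothetical in-memory registry tying the three primitives together."""
    object_types: dict[str, ObjectType] = field(default_factory=dict)
    link_types: dict[str, LinkType] = field(default_factory=dict)
    action_types: dict[str, ActionType] = field(default_factory=dict)

    def add_object_type(self, object_type: ObjectType) -> None:
        self.object_types[object_type.name] = object_type

    def add_link_type(self, link_type: LinkType) -> None:
        # A link may only connect object types that are already declared.
        for end in (link_type.source, link_type.target):
            if end not in self.object_types:
                raise ValueError(f"unknown object type: {end}")
        self.link_types[link_type.name] = link_type

    def add_action_type(self, action_type: ActionType) -> None:
        self.action_types[action_type.name] = action_type


ontology = Ontology()
ontology.add_object_type(ObjectType("Customer", {"customerId": "string"}))
ontology.add_object_type(ObjectType("Order", {"orderId": "string"}))
ontology.add_link_type(LinkType("places", source="Customer", target="Order"))
ontology.add_action_type(
    ActionType("markShipmentDelivered", {"shipmentId": "string"})
)
```

In this sketch, the check in `add_link_type` is the design point: a link type only makes sense once both object types it connects have been declared, so the registry rejects dangling references up front.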
Where the ontology sits
A useful way to picture it:
┌──────────────────────────────────────────┐
│ Applications, dashboards, AI agents │
├──────────────────────────────────────────┤
│ ONTOLOGY │ ← typed business model
│ Object Types · Link Types · Actions │
├──────────────────────────────────────────┤
│ Datasets · Streams · APIs · Files │ ← raw data
└──────────────────────────────────────────┘
Below the ontology: raw data (Parquet files, Postgres tables, Kafka topics, REST APIs from SaaS tools).
Above the ontology: every consumer (operational apps, BI dashboards, ML models, AI agents) speaking a single, shared vocabulary.
Ontology vs. data warehouse
A common question: “isn’t this just a data warehouse?” No — they overlap, but the focus is different.
| Concern | Data warehouse | Ontology |
|---|---|---|
| Primary purpose | Analytical queries | Operational model + queries |
| Schema style | Star / snowflake, denormalized | Object-centric, typed graph |
| Write semantics | Batch ETL loads | Typed actions with validation |
| Consumed by | BI tools, analysts | BI tools, apps, agents, services |
| Identity | Surrogate keys | Stable, business-meaningful IDs |
| Governance | Column docs | Object-, property-, row-level policies |
A warehouse asks “what happened?” An ontology asks “what is true right now, and how do I change it?”
A worked example
Imagine a logistics company. Their raw data:
- shipments.csv: 200M rows, updated nightly from the operational DB
- vehicle_telemetry: a Kafka stream, 50k events/second
- drivers: a table in HR’s Workday account, synced via REST
- customer_contracts: PDFs, parsed by an OCR pipeline
Without an ontology, every team that needs “the shipment with its current driver and the customer it belongs to” writes a join across all four sources — and each team writes it slightly differently.
With an ontology:
- Shipment is one object type, backed by the operational DB and enriched by the stream.
- Driver and Customer are their own object types.
- Link types wire them together: Shipment → assignedTo → Driver, Shipment → orderedBy → Customer.
- An action markDelivered(shipmentId, deliveredAt, signature) is the only way state can transition: validated, logged, permissioned.
Every app, dashboard, and AI agent now reads and writes through the same typed surface.
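To make the logistics example concrete, here is a minimal Python sketch of the object types, their links, and the markDelivered action. The status values, the "dispatcher" role, the `audit_log` list, and the sample records are all assumptions made up for illustration; a real platform would supply its own permission and audit machinery.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Driver:
    driver_id: str
    name: str


@dataclass
class Customer:
    customer_id: str
    name: str


@dataclass
class Shipment:
    shipment_id: str
    status: str  # assumed values: "in_transit", "delivered"
    assigned_to: Optional[Driver] = None    # Shipment -> assignedTo -> Driver
    ordered_by: Optional[Customer] = None   # Shipment -> orderedBy -> Customer
    delivered_at: Optional[datetime] = None
    signature: Optional[str] = None


class ActionError(Exception):
    """Raised when an action fails validation."""


audit_log: list[dict] = []  # stand-in for a real audit trail


def mark_delivered(shipments: dict[str, Shipment], actor_roles: set[str],
                   shipment_id: str, delivered_at: datetime,
                   signature: str) -> Shipment:
    # Permissioned: the "dispatcher" role is an assumption for this sketch.
    if "dispatcher" not in actor_roles:
        raise PermissionError("actor may not mark shipments delivered")
    # Validated: the shipment must exist, be in transit, and be signed for.
    shipment = shipments.get(shipment_id)
    if shipment is None:
        raise ActionError(f"no such shipment: {shipment_id}")
    if shipment.status != "in_transit":
        raise ActionError(f"cannot deliver from status {shipment.status!r}")
    if not signature.strip():
        raise ActionError("signature is required")
    shipment.status = "delivered"
    shipment.delivered_at = delivered_at
    shipment.signature = signature
    # Logged: record which shipment changed and when it was delivered.
    audit_log.append({"action": "markDelivered", "shipmentId": shipment_id,
                      "deliveredAt": delivered_at.isoformat()})
    return shipment


# Made-up sample data: one shipment with its driver and customer links.
shipments = {
    "S-1": Shipment("S-1", "in_transit",
                    assigned_to=Driver("D-7", "Ada"),
                    ordered_by=Customer("C-3", "Acme")),
}
delivered = mark_delivered(shipments, {"dispatcher"}, "S-1",
                           datetime(2024, 1, 5, 14, 30, tzinfo=timezone.utc),
                           "J. Doe")
```

Because the status only changes inside `mark_delivered`, a second call on the same shipment fails validation instead of silently overwriting the delivery record, and a caller without the assumed role is rejected before any state is touched.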
When to use an ontology — and when not to
Use an ontology when:
- Multiple teams build on the same domain concepts.
- You need operational reads and writes, not just analytics.
- Definitions disagree across teams and the disagreement costs you.
- You want AI agents or no-code apps to safely operate on real data.
Skip it when:
- You have one app, one team, one database. Just use the database.
- The domain is throwaway — a one-off analysis, a research notebook.
- You do not yet have the data integrated. Get the data flowing first.
Key terms to remember
- Object type: a definition of an entity (a class). The instances are objects.
- Property: a typed field on an object type (firstName: string, weightKg: double).
- Link type: a typed relationship between two object types.
- Action type: a typed, validated mutation to ontology state.
- Function: a typed compute over ontology data.
- Datasource: the underlying dataset / stream / API backing an object type.
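A function, in the sense above, might look like the following Python sketch, which pins down a single definition of “active customer” that every consumer can call. The 90-day window, the `Order` and `Customer` shapes, and the sample data are assumptions for illustration only; the lesson does not prescribe any particular definition.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Order:
    order_id: str
    placed_on: date


@dataclass
class Customer:
    customer_id: str
    # Customer -> places -> Order, held here as a simple list
    orders: list[Order] = field(default_factory=list)


def is_active_customer(customer: Customer, as_of: date,
                       window_days: int = 90) -> bool:
    """Assumed definition: at least one order in the last `window_days` days."""
    cutoff = as_of - timedelta(days=window_days)
    return any(cutoff <= order.placed_on <= as_of for order in customer.orders)


def count_active_customers(customers: list[Customer], as_of: date) -> int:
    # Every caller goes through the same definition, so the counts agree.
    return sum(is_active_customer(c, as_of) for c in customers)


# Made-up sample data: one recent buyer, one lapsed buyer, one with no orders.
customers = [
    Customer("C-1", [Order("O-1", date(2024, 5, 20))]),
    Customer("C-2", [Order("O-2", date(2023, 11, 2))]),
    Customer("C-3"),
]
active = count_active_customers(customers, as_of=date(2024, 6, 1))
```

With this sample data only C-1 falls inside the window, so `active` is 1. The point of the sketch is that the definition lives in one typed function rather than being re-derived in each team's SQL or spreadsheet.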
What’s next
Now that you know what an ontology is and why it exists, the next lesson covers the broader pattern it implements: the semantic layer.
Then we will look at the architecture that makes the three primitives work together.
Happy modeling! 🧭