Introduction to the Ontology

The problem ontologies solve

Most organizations have the same painful story:

Marketing has a definition of “active customer.”
Finance has a different one.
The data warehouse has a third.
The production database has a fourth, encoded implicitly in business logic across 12 microservices.

When the CEO asks “how many active customers do we have?” four people answer with four numbers, and nobody is wrong — they are each correct against a different definition.

The same fragmentation shows up everywhere: what is a shipment “in transit”? What counts as a “completed” order? Who is the “owner” of an account? Every team rebuilds the answers, in code, in SQL, in spreadsheets — slightly differently each time.

The ontology is the one place where these answers live.

What is an ontology?

An ontology is a typed, governed, semantic model of the real-world entities your business operates on — and the relationships and actions that connect them.

Concretely, an ontology is made of three primitives:

Primitive	What it is	Example
Object type	A noun in the business — an entity	`Customer`, `Order`, `Shipment`, `Driver`
Link type	A verb between objects — a relationship	`Customer → places → Order`, `Driver → operates → Vehicle`
Action type	A typed mutation to ontology state	`markShipmentDelivered`, `assignDriverToRoute`

Sometimes you also work with functions (computations over the ontology) and interfaces (contracts that multiple object types can satisfy), but those build on top of the three above.
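To make the three primitives concrete, here is a minimal sketch of how they could be declared as plain data. The class names (`ObjectType`, `LinkType`, `ActionType`, `Ontology`) and the `add_*` methods are hypothetical and chosen for illustration; they are not the API of any particular ontology product.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """A noun in the business: an entity definition with typed properties."""
    name: str
    properties: dict[str, type]  # property name -> expected Python type


@dataclass
class LinkType:
    """A verb between two object types: a typed relationship."""
    name: str
    source: str  # name of the object type the link starts from
    target: str  # name of the object type the link points to


@dataclass
class ActionType:
    """A typed mutation: a name plus the parameters it accepts."""
    name: str
    parameters: dict[str, type]


@dataclass
class Ontology:
    object_types: dict[str, ObjectType] = field(default_factory=dict)
    link_types: dict[str, LinkType] = field(default_factory=dict)
    action_types: dict[str, ActionType] = field(default_factory=dict)

    def add_object_type(self, object_type: ObjectType) -> None:
        self.object_types[object_type.name] = object_type

    def add_link_type(self, link_type: LinkType) -> None:
        # A link may only connect object types that are already declared.
        for end in (link_type.source, link_type.target):
            if end not in self.object_types:
                raise ValueError(f"unknown object type: {end}")
        self.link_types[link_type.name] = link_type

    def add_action_type(self, action_type: ActionType) -> None:
        self.action_types[action_type.name] = action_type


ontology = Ontology()
ontology.add_object_type(ObjectType("Customer", {"customerId": str, "name": str}))
ontology.add_object_type(ObjectType("Order", {"orderId": str, "total": float}))
ontology.add_link_type(LinkType("places", source="Customer", target="Order"))
ontology.add_action_type(ActionType("markShipmentDelivered", {"shipmentId": str}))
```

The point of the sketch is that a link type is checked against declared object types at definition time, so a relationship to an entity nobody has modeled is rejected before any data flows through it.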

Where the ontology sits

A useful way to picture it:

┌──────────────────────────────────────────┐
│   Applications, dashboards, AI agents    │
├──────────────────────────────────────────┤
│              ONTOLOGY                    │  ← typed business model
│   Object Types · Link Types · Actions    │
├──────────────────────────────────────────┤
│  Datasets · Streams · APIs · Files       │  ← raw data
└──────────────────────────────────────────┘

Below the ontology: raw data — Parquet files, Postgres tables, Kafka topics, REST APIs from SaaS tools.

Above the ontology: every consumer — operational apps, BI dashboards, ML models, AI agents — speaking a single, shared vocabulary.
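The layering can be sketched in a few lines: a mapping function sits between a raw record and a typed object, and consumers only ever see the typed object. The column names (`SHIP_ID`, `STATUS_CD`, `WT_KG`, `UPD_TS`) and the `Shipment` fields are invented for this example; a real source would have its own schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Shipment:
    """The typed object that consumers above the ontology work with."""
    shipment_id: str
    status: str
    weight_kg: float
    updated_at: datetime


def shipment_from_row(row: dict[str, str]) -> Shipment:
    """Map one raw record (all strings, source-specific names) to a typed object."""
    return Shipment(
        shipment_id=row["SHIP_ID"].strip(),
        status=row["STATUS_CD"].strip().lower(),
        weight_kg=float(row["WT_KG"]),
        updated_at=datetime.fromisoformat(row["UPD_TS"]),
    )


def is_in_transit(shipment: Shipment) -> bool:
    """A consumer-side check that never touches raw column names."""
    return shipment.status == "in_transit"


# Below the ontology: a raw row as it might arrive from a table or file.
raw_row = {
    "SHIP_ID": " S-1001 ",
    "STATUS_CD": "IN_TRANSIT",
    "WT_KG": "412.5",
    "UPD_TS": "2024-03-01T08:30:00",
}

# Above the ontology: the same record as a typed Shipment.
shipment = shipment_from_row(raw_row)
```

If the source renames a column or changes a status code, only `shipment_from_row` changes; `is_in_transit` and every other consumer stay as they are.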

Ontology vs. data warehouse

A common question: “isn’t this just a data warehouse?” No — they overlap, but the focus is different.

Concern	Data warehouse	Ontology
Primary purpose	Analytical queries	Operational model + queries
Schema style	Star / snowflake, denormalized	Object-centric, typed graph
Write semantics	Batch ETL loads, mostly append-only	Typed actions with validation
Consumed by	BI tools, analysts	BI tools plus apps, agents, services
Identity	Surrogate keys	Stable, business-meaningful IDs
Governance	Column docs	Object-, property-, row-level policies

A warehouse asks “what happened?” An ontology asks “what is true right now, and how do I change it?”

A worked example

Imagine a logistics company. Their raw data:

shipments.csv — 200M rows, updated nightly from the operational DB
vehicle_telemetry — a Kafka stream, 50k events/second
drivers table in HR’s Workday account, synced via REST
customer_contracts — PDFs, parsed by an OCR pipeline

Without an ontology, every team that needs “the shipment with its current driver and the customer it belongs to” writes a join across all four sources — and each team writes it slightly differently.

With an ontology:

Shipment is one object type, backed by the shipment data from the operational DB and enriched by the vehicle_telemetry stream.
Driver and Customer are their own object types.
Link types wire them together: Shipment → assignedTo → Driver, Shipment → orderedBy → Customer.
An action markDelivered(shipmentId, deliveredAt, signature) is the only way a shipment can transition to the delivered state, and every call is validated, logged, and permissioned.

Every app, dashboard, and AI agent now reads and writes through the same typed surface.
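The logistics example can be sketched as a small in-memory model. This is an illustration under stated assumptions, not a reference implementation: the `LogisticsOntology` class, the status values, the `actor_id` parameter, and the rule that only the assigned driver may mark a shipment delivered are all invented here. Drivers and customers are represented by their IDs only, to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Shipment:
    shipment_id: str
    status: str = "in_transit"
    delivered_at: Optional[datetime] = None
    signature: Optional[str] = None


@dataclass
class LogisticsOntology:
    shipments: dict[str, Shipment] = field(default_factory=dict)
    # Link instances, stored as shipment ID -> linked object ID.
    assigned_to: dict[str, str] = field(default_factory=dict)  # Shipment -> Driver
    ordered_by: dict[str, str] = field(default_factory=dict)   # Shipment -> Customer
    # One (action name, shipment ID, actor ID) entry per successful action.
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def mark_delivered(self, actor_id: str, shipment_id: str,
                       delivered_at: datetime, signature: str) -> None:
        shipment = self.shipments.get(shipment_id)
        if shipment is None:
            raise KeyError(f"no such shipment: {shipment_id}")
        # Permissioned (assumed rule): only the assigned driver may deliver.
        if self.assigned_to.get(shipment_id) != actor_id:
            raise PermissionError(f"{actor_id} is not assigned to {shipment_id}")
        # Validated: the state transition and the inputs must make sense.
        if shipment.status != "in_transit":
            raise ValueError(f"cannot deliver a shipment in state {shipment.status}")
        if not signature.strip():
            raise ValueError("signature must not be empty")
        shipment.status = "delivered"
        shipment.delivered_at = delivered_at
        shipment.signature = signature
        # Logged: record who did what to which object.
        self.audit_log.append(("markDelivered", shipment_id, actor_id))


ontology = LogisticsOntology()
ontology.shipments["S-1001"] = Shipment("S-1001")
ontology.assigned_to["S-1001"] = "D-7"   # Shipment -> assignedTo -> Driver
ontology.ordered_by["S-1001"] = "C-42"   # Shipment -> orderedBy -> Customer

ontology.mark_delivered("D-7", "S-1001", datetime(2024, 3, 1, 14, 5), "J. Doe")
```

Because `mark_delivered` is the single entry point, the permission check, the validation, and the audit entry cannot be skipped by a caller that writes to the store directly in its own way.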

When to use an ontology — and when not to

Use an ontology when:

Multiple teams build on the same domain concepts.
You need operational reads and writes, not just analytics.
Definitions disagree across teams and the disagreement costs you.
You want AI agents or no-code apps to safely operate on real data.

Skip it when:

You have one app, one team, one database. Just use the database.
The domain is throwaway — a one-off analysis, a research notebook.
You do not yet have the data integrated. Get the data flowing first.

Key terms to remember

Object type — a definition of an entity (a class). The instances are objects.
Property — a typed field on an object type (firstName: string, weightKg: double).
Link type — a typed relationship between two object types.
Action type — a typed, validated mutation to ontology state.
Function — a typed compute over ontology data.
Datasource — the underlying dataset / stream / API backing an object type.
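Two of these terms, property and function, can be illustrated with a short sketch. The schema, the `validate_properties` helper, and the `total_weight_kg` function are hypothetical names used only for this example; the property type `double` is modeled here as Python's `float`.

```python
# Property definitions for a hypothetical Shipment object type.
SHIPMENT_SCHEMA: dict[str, type] = {"shipmentId": str, "weightKg": float}


def validate_properties(schema: dict[str, type], values: dict[str, object]) -> list[str]:
    """Return a list of problems found when checking values against a schema."""
    errors = []
    for name, expected in schema.items():
        if name not in values:
            errors.append(f"missing property: {name}")
        elif not isinstance(values[name], expected):
            actual = type(values[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in values:
        if name not in schema:
            errors.append(f"unknown property: {name}")
    return errors


def total_weight_kg(shipments: list[dict[str, object]]) -> float:
    """A function in the ontology sense: a typed computation over objects."""
    return sum(float(s["weightKg"]) for s in shipments)


good = {"shipmentId": "S-1", "weightKg": 12.5}
bad = {"shipmentId": "S-2", "weightKg": "heavy"}

good_errors = validate_properties(SHIPMENT_SCHEMA, good)
bad_errors = validate_properties(SHIPMENT_SCHEMA, bad)
total = total_weight_kg([good, {"shipmentId": "S-3", "weightKg": 20.0}])
```

Note that the check is strict: an integer such as `12` would be rejected for `weightKg`, because it is not a `float`. Whether to coerce or reject in that case is a design choice the sketch leaves open.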

What’s next

Now that you know what an ontology is and why it exists, the next lesson covers the broader pattern it implements: the semantic layer.

Then we will look at the architecture that makes the three primitives work together.

Happy modeling! 🧭
