📖 Lesson ⏱️ 45 minutes

Introduction to the Ontology

Why ontologies exist, what problems they solve, and where they fit between raw data and applications

The problem ontologies solve

Most organizations have the same painful story:

  • Marketing has a definition of “active customer.”
  • Finance has a different one.
  • The data warehouse has a third.
  • The production database has a fourth, encoded implicitly in business logic across 12 microservices.

When the CEO asks “how many active customers do we have?” four people answer with four numbers, and nobody is wrong — they are each correct against a different definition.
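
To make the disagreement concrete, here is a minimal sketch. The four rules, the field names, and the sample records are all invented for illustration; real teams would each have their own variant.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Fixed "today" so the example is reproducible.
TODAY = date(2024, 6, 30)


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    last_login: date
    last_paid_invoice: Optional[date]
    has_open_contract: bool


# Invented sample data.
CUSTOMERS = [
    CustomerRecord("c1", date(2024, 6, 25), date(2024, 6, 1), True),
    CustomerRecord("c2", date(2024, 3, 1), date(2024, 5, 15), True),
    CustomerRecord("c3", date(2024, 6, 29), None, False),
    CustomerRecord("c4", date(2023, 12, 1), None, True),
    CustomerRecord("c5", date(2024, 6, 10), date(2024, 4, 20), True),
]


def days_since(d: date) -> int:
    return (TODAY - d).days


# Four hypothetical definitions of "active customer", one per team.
def marketing_active(c: CustomerRecord) -> bool:
    return days_since(c.last_login) <= 30


def finance_active(c: CustomerRecord) -> bool:
    return c.last_paid_invoice is not None and days_since(c.last_paid_invoice) <= 60


def warehouse_active(c: CustomerRecord) -> bool:
    return c.has_open_contract


def services_active(c: CustomerRecord) -> bool:
    return c.has_open_contract and days_since(c.last_login) <= 7


DEFINITIONS = {
    "marketing": marketing_active,
    "finance": finance_active,
    "warehouse": warehouse_active,
    "services": services_active,
}

# Same customers, four different answers.
counts = {
    team: sum(1 for c in CUSTOMERS if is_active(c))
    for team, is_active in DEFINITIONS.items()
}
print(counts)  # {'marketing': 3, 'finance': 2, 'warehouse': 4, 'services': 1}
```

Each function is correct against its own rule, which is exactly why the numbers never reconcile.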

The same fragmentation shows up everywhere: what is a shipment “in transit”? What counts as a “completed” order? Who is the “owner” of an account? Every team rebuilds the answers, in code, in SQL, in spreadsheets — slightly differently each time.

The ontology is the one place where these answers live.

What is an ontology?

An ontology is a typed, governed, semantic model of the real-world entities your business operates on — and the relationships and actions that connect them.

Concretely, an ontology is made of three primitives:

Primitive   | What it is                                | Example
------------|-------------------------------------------|--------------------------------------------------------
Object type | A noun in the business (an entity)        | Customer, Order, Shipment, Driver
Link type   | A verb between objects (a relationship)   | Customer → places → Order, Driver → operates → Vehicle
Action type | A typed mutation to ontology state        | markShipmentDelivered, assignDriverToRoute

Sometimes you also work with functions (compute over the ontology) and interfaces (contracts that multiple object types can satisfy), but those build on top of the three above.
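
One way to picture the three primitives is as plain data definitions held in a registry. The sketch below is an assumption-laden illustration, not any particular product's API: the class names `ObjectType`, `LinkType`, `ActionType`, and `Ontology`, the `modifies` field, and the property type names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ObjectType:
    name: str
    properties: Dict[str, str]  # property name -> type name


@dataclass(frozen=True)
class LinkType:
    name: str
    source: str  # name of the object type the link starts from
    target: str  # name of the object type the link points to


@dataclass(frozen=True)
class ActionType:
    name: str
    parameters: Dict[str, str]  # parameter name -> type name
    modifies: Tuple[str, ...]   # object types this action may change


@dataclass
class Ontology:
    object_types: Dict[str, ObjectType] = field(default_factory=dict)
    link_types: Dict[str, LinkType] = field(default_factory=dict)
    action_types: Dict[str, ActionType] = field(default_factory=dict)

    def _require_object_type(self, name: str) -> None:
        if name not in self.object_types:
            raise ValueError(f"unknown object type: {name}")

    def add_object_type(self, object_type: ObjectType) -> None:
        self.object_types[object_type.name] = object_type

    def add_link_type(self, link_type: LinkType) -> None:
        # A link may only connect object types that are already defined.
        self._require_object_type(link_type.source)
        self._require_object_type(link_type.target)
        self.link_types[link_type.name] = link_type

    def add_action_type(self, action_type: ActionType) -> None:
        # An action may only modify object types that are already defined.
        for name in action_type.modifies:
            self._require_object_type(name)
        self.action_types[action_type.name] = action_type


ontology = Ontology()
ontology.add_object_type(ObjectType("Customer", {"customerId": "string"}))
ontology.add_object_type(ObjectType("Order", {"orderId": "string"}))
ontology.add_object_type(ObjectType("Shipment", {"shipmentId": "string"}))
ontology.add_link_type(LinkType("places", source="Customer", target="Order"))
ontology.add_action_type(
    ActionType("markShipmentDelivered", {"shipmentId": "string"}, ("Shipment",))
)
```

Registering `Driver → operates → Vehicle` here would raise a `ValueError`, because neither object type has been defined yet. Link and action types only make sense relative to the object types they reference.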

Where the ontology sits

A useful way to picture it:

┌──────────────────────────────────────────┐
│   Applications, dashboards, AI agents    │
├──────────────────────────────────────────┤
│              ONTOLOGY                    │  ← typed business model
│   Object Types · Link Types · Actions    │
├──────────────────────────────────────────┤
│  Datasets · Streams · APIs · Files       │  ← raw data
└──────────────────────────────────────────┘

Below the ontology: raw data — Parquet files, Postgres tables, Kafka topics, REST APIs from SaaS tools.

Above the ontology: every consumer — operational apps, BI dashboards, ML models, AI agents — speaking a single, shared vocabulary.
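
The layering can be sketched in a few lines: differently shaped raw records are mapped into one typed object, and consumers only ever see that object. The column names (`SHIP_ID`, `STATUS_CD`, `WT_KG`), the API payload fields, and the helper functions below are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Shipment:
    """The typed business object that consumers see."""
    shipment_id: str
    status: str
    weight_kg: float


# --- Below the ontology: raw data in source-specific shapes ---------------

def shipment_from_table_row(row: Dict[str, Any]) -> Shipment:
    # Hypothetical warehouse table with terse, upper-case columns.
    return Shipment(
        shipment_id=str(row["SHIP_ID"]),
        status=row["STATUS_CD"].lower(),
        weight_kg=float(row["WT_KG"]),
    )


def shipment_from_api_payload(payload: Dict[str, Any]) -> Shipment:
    # Hypothetical REST payload that reports weight in grams.
    return Shipment(
        shipment_id=payload["id"],
        status=payload["state"],
        weight_kg=payload["weightGrams"] / 1000,
    )


# --- Above the ontology: consumers share one vocabulary -------------------

def total_weight_kg(shipments: Iterable[Shipment]) -> float:
    """What a dashboard might compute."""
    return sum(s.weight_kg for s in shipments)


def describe(shipment: Shipment) -> str:
    """What an agent or app might display."""
    return f"Shipment {shipment.shipment_id} is {shipment.status}"


row = {"SHIP_ID": 1001, "STATUS_CD": "IN_TRANSIT", "WT_KG": "12.5"}
payload = {"id": "1001", "state": "in_transit", "weightGrams": 12500}

from_table = shipment_from_table_row(row)
from_api = shipment_from_api_payload(payload)
print(from_table == from_api)  # True
print(describe(from_table))    # Shipment 1001 is in_transit
```

The unit conversion and the status normalization happen once, in the mapping layer, instead of in every consumer.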

Ontology vs. data warehouse

A common question: “isn’t this just a data warehouse?” No — they overlap, but the focus is different.

Concern         | Data warehouse                      | Ontology
----------------|-------------------------------------|--------------------------------------------
Primary purpose | Analytical queries                  | Operational model + queries
Schema style    | Star / snowflake (dimensional)      | Object-centric, typed graph
Write semantics | Batch ETL, often append-only        | Typed actions with validation
Consumed by     | BI tools, analysts                  | BI tools, apps, agents, services
Identity        | Surrogate keys                      | Stable, business-meaningful IDs
Governance      | Column-level documentation          | Object-, property-, and row-level policies

A warehouse asks “what happened?” An ontology asks “what is true right now, and how do I change it?”

A worked example

Imagine a logistics company. Their raw data:

  • shipments.csv — 200M rows, updated nightly from the operational DB
  • vehicle_telemetry — a Kafka stream, 50k events/second
  • drivers table in HR’s Workday account, synced via REST
  • customer_contracts — PDFs, parsed by an OCR pipeline

Without an ontology, every team that needs “the shipment with its current driver and the customer it belongs to” writes a join across all four sources — and each team writes it slightly differently.

With an ontology:

  • Shipment is one object type, backed by the operational DB and enriched by the stream.
  • Driver and Customer are their own object types.
  • Link types wire them together: Shipment → assignedTo → Driver, Shipment → orderedBy → Customer.
  • The action markDelivered(shipmentId, deliveredAt, signature) is the only way a shipment’s state can transition to delivered, and every call is validated, logged, and permission-checked.

Every app, dashboard, and AI agent now reads and writes through the same typed surface.
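
The logistics model above might look roughly like this in code. Treat it as a sketch under stated assumptions: the in-memory `LogisticsOntology` class, the `ALLOWED_ROLES` set, the status strings, and the audit-log shape are hypothetical, and a real system would back these objects with the datasources listed earlier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical permission rule: only these roles may mark a delivery.
ALLOWED_ROLES = {"driver", "dispatcher"}


class ActionError(Exception):
    """Raised when an action is rejected."""


@dataclass
class Driver:
    driver_id: str
    name: str


@dataclass
class Customer:
    customer_id: str
    name: str


@dataclass
class Shipment:
    shipment_id: str
    status: str
    assigned_to: str  # Driver id   (Shipment → assignedTo → Driver)
    ordered_by: str   # Customer id (Shipment → orderedBy → Customer)
    delivered_at: Optional[datetime] = None
    signature: Optional[str] = None


@dataclass
class LogisticsOntology:
    drivers: Dict[str, Driver] = field(default_factory=dict)
    customers: Dict[str, Customer] = field(default_factory=dict)
    shipments: Dict[str, Shipment] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    # Link traversal: no hand-written joins for callers.
    def driver_of(self, shipment_id: str) -> Driver:
        return self.drivers[self.shipments[shipment_id].assigned_to]

    def customer_of(self, shipment_id: str) -> Customer:
        return self.customers[self.shipments[shipment_id].ordered_by]

    def mark_delivered(self, actor_role: str, shipment_id: str,
                       delivered_at: datetime, signature: str) -> Shipment:
        # Permissioned
        if actor_role not in ALLOWED_ROLES:
            raise ActionError(f"role {actor_role!r} may not mark deliveries")
        # Validated
        shipment = self.shipments.get(shipment_id)
        if shipment is None:
            raise ActionError(f"unknown shipment: {shipment_id}")
        if shipment.status != "in_transit":
            raise ActionError(f"shipment is {shipment.status}, not in_transit")
        if not signature.strip():
            raise ActionError("signature is required")
        # Applied
        shipment.status = "delivered"
        shipment.delivered_at = delivered_at
        shipment.signature = signature
        # Logged
        self.audit_log.append({"action": "markDelivered", "by": actor_role,
                               "shipment_id": shipment_id, "at": delivered_at})
        return shipment


ontology = LogisticsOntology()
ontology.drivers["d1"] = Driver("d1", "Ada")
ontology.customers["c1"] = Customer("c1", "Acme")
ontology.shipments["s1"] = Shipment("s1", "in_transit", assigned_to="d1", ordered_by="c1")

delivered = ontology.mark_delivered(
    "driver", "s1", datetime(2024, 6, 30, 14, 0, tzinfo=timezone.utc), "J. Smith"
)
```

Calling `mark_delivered` a second time on `s1` raises `ActionError`, because the shipment is no longer in transit. The rule lives in one place, so no consumer can bypass it by accident.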

When to use an ontology — and when not to

Use an ontology when:

  • Multiple teams build on the same domain concepts.
  • You need operational reads and writes, not just analytics.
  • Definitions disagree across teams and the disagreement costs you.
  • You want AI agents or no-code apps to safely operate on real data.

Skip it when:

  • You have one app, one team, one database. Just use the database.
  • The domain is throwaway — a one-off analysis, a research notebook.
  • You do not yet have the data integrated. Get the data flowing first.

Key terms to remember

  • Object type — a definition of an entity (a class). The instances are objects.
  • Property — a typed field on an object type (firstName: string, weightKg: double).
  • Link type — a typed relationship between two object types.
  • Action type — a typed, validated mutation to ontology state.
  • Function — a typed compute over ontology data.
  • Datasource — the underlying dataset / stream / API backing an object type.
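
To tie two of these terms together, here is a small sketch of typed properties on an object type and a function computed over objects. The `Shipment` fields and the function name `in_transit_weight_by_driver` are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Shipment:
    # Each field is a property: a name plus a type.
    shipment_id: str
    status: str
    weight_kg: float
    driver_id: str


def in_transit_weight_by_driver(shipments: Iterable[Shipment]) -> Dict[str, float]:
    """A function: typed input (Shipment objects), typed output (kg per driver)."""
    totals: Dict[str, float] = defaultdict(float)
    for shipment in shipments:
        if shipment.status == "in_transit":
            totals[shipment.driver_id] += shipment.weight_kg
    return dict(totals)


shipments = [
    Shipment("s1", "in_transit", 10.0, "d1"),
    Shipment("s2", "in_transit", 20.0, "d1"),
    Shipment("s3", "delivered", 7.0, "d1"),
    Shipment("s4", "in_transit", 5.5, "d2"),
]

print(in_transit_weight_by_driver(shipments))  # {'d1': 30.0, 'd2': 5.5}
```

Because the function is written against the object type rather than a raw table, it keeps working if the datasource behind `Shipment` changes.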

What’s next

Now that you know what an ontology is and why it exists, the next lesson covers the broader pattern it implements: the semantic layer.

Then we will look at the architecture that makes the three primitives work together.


Happy modeling! 🧭