The Semantic Layer

What is a semantic layer?

A semantic layer is a single, authoritative definition of the business concepts in your data — sitting between raw storage and every consumer.

If raw data is the physical layer (rows, columns, files) and applications are the experience layer (charts, forms, agents), the semantic layer is what gives those physical bytes meaning that every consumer can rely on.

Before semantic layers became a recognized pattern, the meaning of data lived in three places at once:

In SQL views — different in every project.
In BI tool definitions — Looker LookML, Tableau calculated fields, Power BI measures.
In application code — business logic encoded across services.

Three sources of truth means no source of truth. Drift is inevitable.

What a semantic layer must guarantee

To be useful, a semantic layer must provide four things:

1. Shared definitions

active_customer means the same thing in finance, marketing, and the operational app. It is defined once, in the semantic layer, and reused everywhere.

2. Typed contracts

Order.totalAmount is a Money value with a currency, not just a number. Consumers know the shape, units, nullability, and constraints statically, not by reading docs.
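As a rough illustration of what a typed contract can look like, here is a minimal TypeScript sketch. The Currency union, the amountMinor field, the discount property, and the helper names money and addMoney are assumptions made for this example; they are not taken from any particular platform.

```typescript
// Hypothetical types for illustration only.
type Currency = "USD" | "EUR" | "GBP";

interface Money {
  readonly amountMinor: number; // integer count of minor units (e.g. cents)
  readonly currency: Currency;
}

interface Order {
  readonly orderId: string;
  readonly totalAmount: Money;      // always present
  readonly discount: Money | null;  // nullability is part of the contract
}

// Constructor that enforces the "integer minor units" constraint.
function money(amountMinor: number, currency: Currency): Money {
  if (!Number.isInteger(amountMinor)) {
    throw new RangeError("amountMinor must be an integer number of minor units");
  }
  return { amountMinor, currency };
}

// Adding two amounts is only defined when the currencies match.
function addMoney(a: Money, b: Money): Money {
  if (a.currency !== b.currency) {
    throw new Error(`Cannot add ${a.currency} to ${b.currency}`);
  }
  return money(a.amountMinor + b.amountMinor, a.currency);
}

const order: Order = {
  orderId: "o-1",
  totalAmount: money(1250, "USD"),
  discount: null,
};

// 1250 + 499 = 1749 minor units, still in USD.
const withShipping = addMoney(order.totalAmount, money(499, "USD"));
console.log(withShipping);
```

A consumer reading Order.totalAmount gets the unit and currency from the type itself, and a currency mismatch fails loudly instead of silently producing a wrong number.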

3. Composability

Definitions build on definitions. customer_lifetime_value can reference Customer, Order, and Refund without re-deriving any of them.
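The sketch below shows one way composition might look in TypeScript. The field names and the formula (gross order revenue minus refunds) are assumptions chosen to keep the example small; a real customer_lifetime_value definition would likely be more involved.

```typescript
// Hypothetical object types; field names are assumptions for this sketch.
interface Customer { readonly customerId: string; }
interface Order { readonly orderId: string; readonly customerId: string; readonly totalCents: number; }
interface Refund { readonly orderId: string; readonly amountCents: number; }

// Base definition: which orders belong to a customer.
function ordersOf(customer: Customer, orders: readonly Order[]): Order[] {
  return orders.filter((o) => o.customerId === customer.customerId);
}

// Built on ordersOf.
function grossRevenueCents(customer: Customer, orders: readonly Order[]): number {
  return ordersOf(customer, orders).reduce((sum, o) => sum + o.totalCents, 0);
}

// Also built on ordersOf: refunds are linked to a customer through their orders.
function refundedCents(
  customer: Customer,
  orders: readonly Order[],
  refunds: readonly Refund[],
): number {
  const orderIds = new Set(ordersOf(customer, orders).map((o) => o.orderId));
  return refunds
    .filter((r) => orderIds.has(r.orderId))
    .reduce((sum, r) => sum + r.amountCents, 0);
}

// Assumed formula for illustration: lifetime value = gross revenue - refunds.
function customerLifetimeValueCents(
  customer: Customer,
  orders: readonly Order[],
  refunds: readonly Refund[],
): number {
  return grossRevenueCents(customer, orders) - refundedCents(customer, orders, refunds);
}
```

If the rule for which orders belong to a customer changes, only ordersOf changes; every definition built on it picks up the new rule.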

4. Governance

Who can see which property? Who can write? What is audited? Governance is part of the model, not bolted on afterward.
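One way to picture governance living inside the model is a per-property policy declared next to the type. The following TypeScript sketch is a simplified assumption: the roles, the CustomerRecord fields, and the helpers readAs and canWrite are invented for illustration, and real platforms express this quite differently.

```typescript
// Hypothetical roles and record shape, for illustration only.
type Role = "finance" | "marketing" | "support";

interface CustomerRecord {
  customerId: string;
  email: string;
  creditLimitCents: number;
}
type CustomerProperty = keyof CustomerRecord;

interface PropertyPolicy {
  readonly readableBy: readonly Role[];
  readonly writableBy: readonly Role[];
  readonly audited: boolean; // whether reads and writes should be logged
}

// The policy is declared with the model, one entry per property.
const customerPolicy: Record<CustomerProperty, PropertyPolicy> = {
  customerId: { readableBy: ["finance", "marketing", "support"], writableBy: [], audited: false },
  email: { readableBy: ["marketing", "support"], writableBy: ["support"], audited: true },
  creditLimitCents: { readableBy: ["finance"], writableBy: ["finance"], audited: true },
};

// Returns only the properties the given role is allowed to read.
function readAs(role: Role, record: CustomerRecord): Record<string, unknown> {
  const visible: Record<string, unknown> = {};
  for (const prop of Object.keys(customerPolicy) as CustomerProperty[]) {
    if (customerPolicy[prop].readableBy.includes(role)) {
      visible[prop] = record[prop];
    }
  }
  return visible;
}

// Answers "who can write?" for a single property.
function canWrite(role: Role, prop: CustomerProperty): boolean {
  return customerPolicy[prop].writableBy.includes(role);
}
```

Because the policy table is keyed by keyof CustomerRecord, adding a property to the type without declaring its policy becomes a compile error.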

Two flavors of semantic layer

There are two broad approaches in the market:

Flavor A — The metrics layer

Tools like dbt Semantic Layer, Cube, LookML, and Malloy. The focus is definitions of metrics and dimensions that compile down to SQL.

Strengths: Excellent for analytics, easy to integrate with BI tools, leverages the existing warehouse.
Limitations: Mostly read-only and analytical. No native concept of writing back. No first-class entities: everything ultimately resolves to SQL over the warehouse.

Flavor B — The ontology

Tools like Palantir Foundry’s Ontology, Microsoft Power Platform’s Dataverse, and, increasingly, several open-source efforts. The focus is operational entities with first-class read and write.

Strengths: Supports applications, AI agents, and humans operating on the same model. Actions are first-class. Strong typing across the whole stack.
Limitations: Higher up-front modeling cost. Requires committing to a platform-level abstraction.

The ontology approach is a superset of the metrics layer: every metric is just a function over object types and links. The reverse is not true — you cannot model assignDriverToShipment in a metrics layer.
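To make the contrast concrete, here is a hedged TypeScript sketch with a metric and an action over the same object types. Only the name assignDriverToShipment comes from the text above; the Driver and Shipment fields, the validation rules, and the assignedShipmentRate metric are assumptions for illustration.

```typescript
// Hypothetical object types; fields are assumptions for this sketch.
interface Driver {
  readonly driverId: string;
  readonly available: boolean;
}

interface Shipment {
  readonly shipmentId: string;
  readonly driverId: string | null; // link to Driver, if assigned
  readonly status: "pending" | "assigned" | "delivered";
}

// A metric: a read-only function over objects and their links.
function assignedShipmentRate(shipments: readonly Shipment[]): number {
  if (shipments.length === 0) return 0;
  const assigned = shipments.filter((s) => s.driverId !== null).length;
  return assigned / shipments.length;
}

// An action: a validated write that returns the new state of both objects.
function assignDriverToShipment(
  driver: Driver,
  shipment: Shipment,
): { driver: Driver; shipment: Shipment } {
  if (!driver.available) {
    throw new Error(`Driver ${driver.driverId} is not available`);
  }
  if (shipment.status !== "pending") {
    throw new Error(`Shipment ${shipment.shipmentId} is not pending`);
  }
  return {
    driver: { ...driver, available: false },
    shipment: { ...shipment, driverId: driver.driverId, status: "assigned" },
  };
}
```

The metric only reads; the action checks preconditions and changes state. A layer that compiles everything to SELECT statements has a natural home for the first function but not for the second.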

Why “ontology” and not just “data model”?

The word ontology comes from philosophy — the study of what exists. In knowledge representation, an ontology is a formal specification of the entities, properties, and relations in a domain.

The word matters because a regular data model describes how data is stored. An ontology describes what is true about the world the data represents. The two often diverge:

Stored: users table with 14M rows, 70% of which are bots or test accounts.
Ontology: Customer — must be an active, KYC-verified human entity. Filtered, typed, governed.

The ontology is a model of reality, not a model of your tables.
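A minimal TypeScript sketch of that gap, under stated assumptions: the column names on UserRow (is_bot, is_test, kyc_status, deactivated_at) are hypothetical, as is the exact rule for what counts as active and verified.

```typescript
// Hypothetical shape of a row in the stored users table.
interface UserRow {
  id: number;
  email: string | null;
  is_bot: boolean;
  is_test: boolean;
  kyc_status: string | null;     // assumed values: "verified", "pending", ...
  deactivated_at: string | null; // ISO timestamp, or null if still active
}

// The ontology's Customer: only what is true of a real, verified customer.
interface Customer {
  readonly customerId: string;
  readonly email: string;
}

// Maps one stored row to a Customer, or null if the row is not a customer at all.
function toCustomer(row: UserRow): Customer | null {
  const isHuman = !row.is_bot && !row.is_test;
  const isVerified = row.kyc_status === "verified";
  const isActive = row.deactivated_at === null;
  if (!isHuman || !isVerified || !isActive || row.email === null) {
    return null;
  }
  return { customerId: String(row.id), email: row.email };
}

// Every consumer gets Customers through this one mapping, never raw rows.
function customersFrom(rows: readonly UserRow[]): Customer[] {
  return rows.map(toCustomer).filter((c): c is Customer => c !== null);
}
```

The filtering rule lives in one place, so a dashboard and an application cannot disagree about whether a bot account is a customer.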

What makes a good semantic layer

A few principles separate semantic layers that scale from those that collapse under their own weight:

Entity-centric, not query-centric. Model the nouns and verbs of the business, not the queries you happen to need this quarter.

Stable identity. Every object has a stable, business-meaningful primary key — customer_id that survives database migrations and reorgs, not a surrogate key that changes on every reload.

Typed everything. A Money value, a LatLong location, an EmailAddress: these are types, not strings. The type system catches unit and format mix-ups before the code runs, instead of leaving them to surface in production.

Backed by reality, not derived from reports. Object types should be backed by operational datasources where possible. If your Customer is derived from a quarterly export, you have a snapshot, not a semantic layer.

Versioned. You will get the model wrong. Plan for that — branch, evolve, deprecate, migrate.
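To illustrate the typed-everything principle from the list above, here is one possible TypeScript sketch using the branded-type idiom. The Brand helper, the deliberately loose email pattern, and the sendReceipt function are assumptions for this example, not a prescribed implementation.

```typescript
// Branded type: a string at runtime, a distinct type at compile time.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type EmailAddress = Brand<string, "EmailAddress">;

interface LatLong {
  readonly lat: number; // degrees, -90 to 90
  readonly lon: number; // degrees, -180 to 180
}

// The only way to obtain an EmailAddress is through this parser.
// The pattern is intentionally loose; it is a sketch, not a full validator.
function parseEmailAddress(raw: string): EmailAddress {
  const normalized = raw.trim().toLowerCase();
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(normalized)) {
    throw new Error(`Not an email address: ${raw}`);
  }
  return normalized as EmailAddress;
}

// Range-checked constructor for coordinates (also rejects NaN).
function latLong(lat: number, lon: number): LatLong {
  if (!(lat >= -90 && lat <= 90)) throw new RangeError(`lat out of range: ${lat}`);
  if (!(lon >= -180 && lon <= 180)) throw new RangeError(`lon out of range: ${lon}`);
  return { lat, lon };
}

// Hypothetical consumer that accepts only validated addresses.
function sendReceipt(to: EmailAddress): string {
  return `receipt queued for ${to}`;
}

// sendReceipt("not-an-email"); // rejected at compile time: string is not EmailAddress
console.log(sendReceipt(parseEmailAddress(" Ada@Example.com ")));
```

Validation happens once, at the boundary; everything downstream can rely on the type instead of re-checking the string.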

A short story

A retail company we will call Northwind had eight teams building dashboards. The number “monthly active customers” appeared on 22 dashboards with 22 different definitions: none of them wrong, all of them different.

A small team built a Customer object type in their ontology:

One primary key: customer_id from the operational DB.
One lastActiveAt property, computed from the events stream.
One function isActiveAsOf(date): boolean.
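The three pieces listed above could be sketched in TypeScript roughly as follows. The 30-day activity window and the monthlyActiveCustomers helper are assumptions; the story does not say how Northwind actually defined "active".

```typescript
// Assumption for this sketch: "active" means activity within the last 30 days.
const ACTIVE_WINDOW_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface Customer {
  readonly customerId: string;        // primary key from the operational DB
  readonly lastActiveAt: Date | null; // computed from the events stream
}

// The single shared definition of "active".
// Limitation: with only lastActiveAt available, activity recorded after
// `date` is not counted; a full as-of answer would need the event history.
function isActiveAsOf(customer: Customer, date: Date): boolean {
  if (customer.lastActiveAt === null) return false;
  const elapsedMs = date.getTime() - customer.lastActiveAt.getTime();
  return elapsedMs >= 0 && elapsedMs <= ACTIVE_WINDOW_DAYS * MS_PER_DAY;
}

// Hypothetical metric built on the shared definition.
function monthlyActiveCustomers(customers: readonly Customer[], asOf: Date): number {
  return customers.filter((c) => isActiveAsOf(c, asOf)).length;
}
```

Any dashboard, app, or agent that calls monthlyActiveCustomers gets the same number, because there is only one isActiveAsOf to disagree with.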

The 22 definitions did not disappear, but every new dashboard, app, and agent built on the ontology shared a single definition. Six months later, 19 of the 22 dashboards had been retired or migrated to the ontology definition. The drift stopped because the source of truth had moved.

That migration is what adopting a semantic layer looks like in practice.

Key takeaways

The semantic layer is a single source of business meaning, between data and consumers.
It must provide shared definitions, typed contracts, composability, and governance.
Metrics layers cover analytics; ontology-style semantic layers cover analytics and operations.
The value of the layer compounds — every new consumer that joins inherits the work already done.

What’s next

Next we look at the architecture of an ontology — how object types, link types, and action types compose at runtime, and how datasources flow up into them.

Onward. 🧱
