Course Content
Designing the Semantic Layer
Object types, link types, and actions for the customer's domain — the FDE's most leveraged design decision
The job of the semantic layer
You have data flowing in from data plumbing. You have a sketch of a domain model from domain capture. The semantic layer is the typed surface that sits between them, exposed to every app, dashboard, agent, and analyst as the single representation of the business.
For deep grounding in the primitives (object types, link types, actions, properties, datasources) the Ontology Builder course is the authoritative reference. This lesson is about the FDE-specific design calls — the decisions you make in a customer engagement, under time pressure, with imperfect data, that determine whether your semantic layer is something the customer’s team can keep building on after you leave.
Get this lesson’s calls right and the rest of the engagement compounds. Get them wrong and you ship apps on top of an unstable foundation, and your hand-off (Phase 5) is a nightmare.
Two layers, one mental model
A useful framing the customer’s team should also internalize:
┌──────────────────────────────────────┐
│ Apps · Dashboards · Agents │
├──────────────────────────────────────┤
│ SEMANTIC LAYER (ontology) │ ← typed, governed, stable
│ Object types · Links · Actions │
├──────────────────────────────────────┤
│ Data layer: tables, files, streams │ ← messy, changing, source-shaped
└──────────────────────────────────────┘

- The data layer is the world as the source systems happen to organize it. SAP’s table layouts, the GPS vendor’s JSON shape, the CSV columns the SFTP drop happens to contain today.
- The semantic layer is the world as the business talks about it. `Load`, `Stop`, `Driver`, `Hub` — with the names and types that match how operators reason, not how vendors store.
Your job as an FDE is to make sure those two layers are coupled (the semantic layer is populated from the data layer) but not identical (the semantic layer is shaped to the business, not to the source).
The single biggest mistake newer FDEs make is letting the data layer leak upward — naming object types after table names, surfacing source-system column names in the UI, exposing nullable fields to operators just because the source happens to be nullable.
The five FDE-specific design calls
Five decisions you face in week 3 that determine the health of the semantic layer six months out.
Call 1 — Identifiers
Every object type needs a primary key. The naive move is to use whatever ID the source system gives you. Sometimes that’s right. Often it isn’t.
The questions to ask:
- Is the source ID stable across the lifecycle? If a `Load` keeps the same SAP ID from creation to delivery, fine. If the ID changes when it goes from “tendered” to “accepted” (it sometimes does), you have a problem.
- Is there exactly one source of truth? If both SAP and the GPS vendor have a “load ID” and they don’t match, neither is your primary key. You either pick one as canonical and map the other, or you mint your own ID.
- Will the customer ever see this ID? If operators talk about loads by SAP ID, expose SAP ID. If they talk by their own freight-bill number, expose that. Match the conversation.
For Northbound, the calls:
| Object | Primary key | Why |
|---|---|---|
| Load | `sap_load_id` | SAP is canonical; everyone refers to loads by this number |
| Driver | `driver_employee_id` | HR system owns this and it doesn’t change |
| Tractor | `tractor_unit_number` | Painted on the side of the truck — used in every conversation |
| Stop | synthetic UUID | Stops have no business ID; we generate one |
| GPSPing | `(tractor_id, observed_at)` | Composite key; pings are immutable events |
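The three identifier strategies above can be sketched in a few lines. A minimal Python illustration, with hypothetical row shapes and helper names (not platform API): adopt the stable business ID, mint a synthetic UUID where no business ID exists, and build a composite key for immutable events.

```python
import uuid

def load_key(sap_row: dict) -> str:
    # SAP is canonical for loads; operators quote this number directly.
    return sap_row["load_id"]

def stop_key() -> str:
    # Stops have no business ID, so we mint a synthetic UUID once at creation.
    return str(uuid.uuid4())

def gps_ping_key(tractor_unit_number: str, observed_at_iso: str) -> tuple:
    # Pings are immutable events; a composite key makes re-ingest idempotent.
    return (tractor_unit_number, observed_at_iso)
```

The composite key is worth the small extra friction: replaying a day of GPS data produces the same keys, so duplicates are impossible by construction.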
Call 2 — Time and timezones
Every business runs on time. Every customer system stores it differently. Every FDE engagement has at least one time-related bug.
Decisions to make at the semantic layer:
- All timestamps stored in UTC. Always. No exceptions. Convert at ingest, convert again at display.
- One timezone is canonical for “the business day.” For Northbound it’s `America/Chicago` because Doug’s hub is in Chicago and the morning batch ritual is anchored there. Document this.
- Make “is this stale?” cheap to ask. Every object type that depends on a periodically-refreshed source should expose a `last_synced_at` field on the type itself, not buried in metadata.
Less obvious: the semantic layer should have a few canonical date properties (the business day of a delivery, not a timestamp) alongside the full timestamps. Operators ask “did we deliver on Tuesday?” — that needs to be a query, not a calculation.
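The timestamp discipline above is only a few lines to get right at ingest. A sketch using Python’s `zoneinfo`, assuming (as an illustration) that source timestamps arrive naive and local to the canonical business timezone:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

CANONICAL_TZ = ZoneInfo("America/Chicago")  # Northbound's canonical business timezone

def to_utc(local_naive: datetime) -> datetime:
    # Convert a source timestamp (naive, business-local) to UTC at ingest.
    return local_naive.replace(tzinfo=CANONICAL_TZ).astimezone(timezone.utc)

def business_day(ts_utc: datetime) -> str:
    # The canonical date property: "did we deliver on Tuesday?" queries this
    # string, not the raw timestamp.
    return ts_utc.astimezone(CANONICAL_TZ).date().isoformat()
```

Note that `business_day` is exactly the kind of canonical date property the paragraph above argues for: computed once at the boundary, stored alongside the full timestamp.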
Call 3 — Change data and history
The single biggest failure mode of customer ontologies is modeling the present and forgetting the past.
SAP stores today’s `Load.status = IN_TRANSIT`. Yesterday it was `ACCEPTED`. Last week it was `TENDERED`. Where does that history live in your ontology?
Three options:
| Option | When to use | Cost |
|---|---|---|
| Overwrite (current state only) | The history is irrelevant or lives reliably in the source | Cheap; lose history forever |
| Slowly Changing Dimension (versioned rows) | History matters but is occasional | Medium; queries get more complex |
| Event stream (every change is an event) | History is the product; you need to query at any past moment | Expensive but powerful |
For Northbound iteration 1 — overwrite is fine for most things, except `Load.status`. Status transitions are exactly the kind of audit trail the ops team and the customer’s compliance team will want. So you model status as both:
- `Load.current_status` — a denormalized property for fast querying
- `LoadStatusChange` — its own object type, an event per transition, with `from`, `to`, `at`, `by`
The denormalized property keeps app code simple. The event type keeps the audit trail honest. Both populated from the same source.
This is the kind of call senior FDEs make almost automatically. Junior ones model only the current state and learn the hard way in week 7 when the customer asks “when did this load go in transit?”
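Deriving the event type from an overwrite-style source is usually a diff between consecutive exports. A minimal Python sketch, with illustrative dict shapes and field names:

```python
from datetime import datetime, timezone

def diff_statuses(previous: dict, current: dict) -> list:
    """Derive LoadStatusChange-style events from two consecutive exports.
    `previous` and `current` map sap_load_id -> status (illustrative shape)."""
    changes = []
    for load_id, new_status in current.items():
        old_status = previous.get(load_id)
        if old_status != new_status:
            changes.append({
                "load": load_id,
                "from_status": old_status,  # None for a newly seen load
                "to_status": new_status,
                "changed_at": datetime.now(timezone.utc).isoformat(),
                "changed_by": "system",
            })
    return changes
```

Run this at every ingest and the audit trail accumulates for free, even though the source only ever stores the present.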
Call 4 — Derived state vs. stored state
Some properties are derived from other data: ETA delta, on-time rate, capacity utilization. Two choices for each:
- Compute on read — a function on the ontology. Always fresh; can be expensive.
- Materialize as a property — refresh on a schedule. Cheap to read; can be stale.
The FDE call is usually: operator-facing derived values get materialized, with the source-time and freshness exposed. Analyst-facing derived values stay as functions. This way dashboards stay snappy, exploratory queries stay flexible.
For Northbound:
- `Load.eta_delta_minutes` — materialized, refreshed every 5 min, alongside `eta_calculated_at`. Maria sees this in her morning view; she needs it instantly.
- `OnTimeDeliveryRate(week)` — computed as a function. Run on demand from a dashboard or report.
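The two patterns side by side, as an illustrative Python sketch (class and function names are hypothetical, not platform API):

```python
import time

class LoadView:
    """Materialized pattern: cheap reads, explicit freshness."""
    def __init__(self):
        self.eta_delta_minutes = None  # materialized: can be stale
        self.eta_calculated_at = None  # freshness exposed next to the value

    def refresh_eta(self, compute_eta_delta):
        # Runs on a schedule (e.g. every 5 minutes), not on every read.
        self.eta_delta_minutes = compute_eta_delta()
        self.eta_calculated_at = time.time()

def on_time_delivery_rate(deliveries: list) -> float:
    # Compute-on-read pattern: always fresh, recomputed per query.
    # `deliveries` is a list of (planned_at, actual_at) epoch pairs.
    if not deliveries:
        return 0.0
    on_time = sum(1 for planned, actual in deliveries if actual <= planned)
    return on_time / len(deliveries)
```

The design point is the pairing: the materialized value never ships without its `eta_calculated_at`, so every consumer can judge staleness.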
Call 5 — Action design
Actions are typed mutations. Designing them well is where FDEs differentiate themselves.
Three rules:
Actions are verbs, not CRUD
`markStopComplete` is an action. `updateStop` is not. An action should describe a business event, not a generic mutation. The customer’s compliance and audit teams will thank you for this.
Actions are idempotent where possible
Re-running the action with the same parameters either does nothing (because it already happened) or surfaces the conflict explicitly. This is how you survive double-clicks, network retries, and webhook replays.
Actions are typed end-to-end
Every parameter has a declared type. Every effect is enumerated. Every validation is named. No “we’ll handle the edge case later” — actions are how state changes, and bad actions corrupt the model permanently.
A Northbound action might look like:
```
action assignDriverToStops({
  driver_id: Driver.id,
  stop_ids: Stop.id[],
  tractor_id: Tractor.id,
  trailer_id: Trailer.id,
  assignment_starts_at: Timestamp,
  reason: string,  // free-text, surfaced in audit
}) {
  // Preconditions
  require(driver.is_active, "Driver must be active");
  require(stops.all(s => s.unassigned), "Stops must not have an existing assignment");
  require(tractor.is_available_at(assignment_starts_at), "Tractor must be free at start time");
  require(trailer.is_available_at(assignment_starts_at), "Trailer must be free at start time");

  // Effect
  create DriverAssignment {
    driver, tractor, trailer,
    assignment_starts_at,
    stops: stop_ids,
    reason,
    assigned_by: current_user,
    assigned_at: now,
  }

  // Audit
  emit StopAssigned for each stop_id
}
```

What this gives you that an ad-hoc table update would not:
- A typed parameter set every caller (app, agent, script) must satisfy
- Validations named, ordered, and testable
- A single, atomic effect with an audit trail
- Compose-ability — this action is callable from the dispatcher’s app, the AI assistant, and a CLI replay script, identically
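The idempotency rule above can also be made concrete. A minimal Python sketch of an action guard, with a hypothetical in-memory dict standing in for the ontology store:

```python
def mark_stop_complete(stops: dict, stop_id: str, actual_at: str) -> str:
    """Idempotent action guard (illustrative store and names).
    Re-running with the same parameters does nothing; a conflicting
    re-run surfaces the conflict instead of silently overwriting."""
    stop = stops[stop_id]
    if stop.get("actual_at") == actual_at:
        return "no-op"  # already happened: safe to retry
    if stop.get("actual_at") is not None:
        raise ValueError(f"Stop {stop_id} already completed at {stop['actual_at']}")
    stop["actual_at"] = actual_at
    return "completed"
```

This three-branch shape — already done, conflict, apply — is what survives double-clicks, network retries, and webhook replays.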
Wiring the layer to plumbing
The semantic layer reads from datasources you defined in the plumbing lesson. The mapping is where the abstraction earns its keep.
For each object type, you specify:
```
object_type: Load
primary_source: sap_loads_daily        # from the SFTP drop
property_mappings:
  sap_load_id: SOURCE.load_id
  customer_id: SOURCE.cust_id          # rename to match domain
  origin_hub_id: SOURCE.orig_terminal  # rename and re-key
  current_status: SOURCE.status (enum LoadStatus)
  planned_pickup_at: parse_utc(SOURCE.planned_pickup, tz="America/Chicago")
  weight_kg: SOURCE.weight_lbs * 0.453592   # unit-convert at the boundary
enrichments:
  - join: tractor_position
    source: gps_pings_latest
    on: load.assigned_tractor_id = source.tractor_unit_number
    extract:
      live_position: point(source.lat, source.lon)
      last_position_at: source.observed_at
```

Three principles encoded above:
- Rename at the boundary. Source names stay at the source. Domain names live in the ontology. The mapping is the only place both appear.
- Convert units at the boundary. Pounds → kilograms, local time → UTC, status string → enum. The semantic layer is canonicalized; the data layer is not.
- Enrich, don’t denormalize blindly. Joining GPS data into `Load` is an enrichment; rolling up all of customer-history into `Load` is denormalization gone wrong. Enrichments stay narrow.
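The mapping declaration can be read as a plain function. A Python sketch of the Load mapping, with source column names taken from the example and the timestamp format assumed for illustration:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LBS_TO_KG = 0.453592

def map_load_row(source: dict) -> dict:
    """Rename, re-key, and unit-convert at the boundary.
    Column names mirror the hypothetical SFTP drop above."""
    return {
        "sap_load_id": source["load_id"],
        "customer_id": source["cust_id"],          # rename to match domain
        "origin_hub_id": source["orig_terminal"],  # rename and re-key
        "weight_kg": round(source["weight_lbs"] * LBS_TO_KG, 2),
        "planned_pickup_at": datetime.strptime(
            source["planned_pickup"], "%Y-%m-%d %H:%M"  # assumed source format
        ).replace(tzinfo=ZoneInfo("America/Chicago")).astimezone(timezone.utc),
    }
```

Everything source-shaped stays on the right-hand side; everything domain-shaped is on the left. That single-file boundary is what makes a later source swap tractable.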
The freshness problem
Every property of every object has an implied freshness. In the data layer this is muddled. In the semantic layer you make it explicit.
Two patterns to deploy:
Per-property freshness
Every property carries (or can be queried for) the timestamp of its underlying source pull.
```
Load.current_status     freshness: 8h (last SAP export)
Load.live_position      freshness: 4m (last GPS poll)
Load.planned_pickup_at  freshness: 8h (last SAP export)
Load.weight_kg          freshness: 8h (last SAP export)
```

When an app displays a value, it can ask the ontology how fresh that value is and surface it (more on this in Phase 4).
Per-object freshness budget
Each object type declares an expected freshness budget. When the budget is exceeded, queries against the object emit a warning or refuse to serve operational use cases.
```
Load:
  expected_freshness: 6 hours (1.5x the daily SAP export cadence)
  alert_threshold: 12 hours
  refuse_threshold: 24 hours

GPSPing:
  expected_freshness: 10 minutes
  alert_threshold: 30 minutes
  refuse_threshold: 4 hours
```

This is one of those design decisions that costs nothing to add now and saves the customer hours of incident debugging in production.
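A budget check is only a few lines once the thresholds are declared. An illustrative Python sketch (the budget encoding is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Illustrative encoding of the budgets above as timedeltas.
BUDGETS = {
    "Load":    {"alert": timedelta(hours=12), "refuse": timedelta(hours=24)},
    "GPSPing": {"alert": timedelta(minutes=30), "refuse": timedelta(hours=4)},
}

def freshness_status(object_type: str, last_synced_at: datetime,
                     now: datetime) -> str:
    budget = BUDGETS[object_type]
    age = now - last_synced_at
    if age > budget["refuse"]:
        return "refuse"  # too stale to serve operational use cases
    if age > budget["alert"]:
        return "alert"
    return "ok"
```

Apps call this before rendering an operational view; a "refuse" answer becomes an explicit staleness banner instead of a silently wrong number.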
The customer-team handoff design
A subtle but crucial design call: design the semantic layer so the customer’s team can extend it after you leave.
This means:
- Use clear, business-aligned names that the customer’s team would use even without your prompt.
- Group object types by domain area (`dispatch.*`, `finance.*`, `hr.*`) so adding a new area is obvious.
- Document every type with its glossary entry, source datasource, and ownership.
- Avoid clever abstractions that only you understand. The customer’s senior engineer should be able to add a new property in an hour without consulting you.
A signal you are doing this well: when the customer’s engineer asks “where would I add `Load.is_hazmat`?” they can answer the question themselves after looking at the ontology for 30 seconds.
A v1 Northbound semantic layer
Here’s the actual committed semantic layer for the iteration-1 MVP. Spare, deliberate, and built to extend.
```
NAMESPACE: dispatch

Object types:

  Hub
    properties: hub_code (PK), name, address, geo_point, time_zone
    source: hubs_static (manual CSV)
    refresh: quarterly

  Customer
    properties: cust_id (PK), name, account_type
    source: sap_customers_daily
    refresh: daily 02:00

  Load
    properties:
      sap_load_id (PK), customer (→Customer),
      origin_hub (→Hub), destination_hub (→Hub),
      planned_pickup_at, planned_delivery_at, planned_weight_kg,
      current_status (enum LoadStatus),
      live_eta_delta_min (derived, materialized 5min),
      eta_calculated_at,
      last_synced_at
    source: sap_loads_daily + gps_pings_latest (enrichment)
    refresh: daily for SAP, 5min for GPS

  Stop
    properties:
      stop_id (PK, synthetic UUID), load (→Load), hub (→Hub),
      sequence_in_load (int), stop_type (enum PICKUP|DELIVERY),
      planned_at, actual_at (nullable until complete)
    source: derived from sap_loads_daily
    refresh: daily

  Driver
    properties: driver_employee_id (PK), name, home_hub (→Hub), is_active, license_expires_on
    source: hr_drivers_hourly
    refresh: hourly

  Tractor
    properties: tractor_unit_number (PK), model, year, home_hub (→Hub), is_active
    source: hr_assets_hourly
    refresh: hourly

  Trailer
    properties: trailer_unit_number (PK), capacity_kg, home_hub (→Hub), is_active
    source: hr_assets_hourly
    refresh: hourly

  DriverAssignment
    properties:
      assignment_id (PK, synthetic), driver (→Driver), tractor (→Tractor), trailer (→Trailer),
      assignment_starts_at, assignment_ends_at (nullable),
      stops (→Stop[]), reason, assigned_by, assigned_at
    source: ontology actions (no external source)

  GPSPing
    properties:
      composite_id (tractor_unit_number, observed_at),
      tractor (→Tractor), lat, lon, speed_kph, observed_at
    source: gps_poll_5min
    refresh: 5 min
    retention: 30 days

  LoadStatusChange
    properties:
      change_id (PK, synthetic), load (→Load),
      from_status, to_status, changed_at, changed_by (system|user)
    source: derived from sap_loads_daily diff + ontology actions

Link types:
  Customer places Load (1:N)
  Load has Stop (1:N, ordered by sequence_in_load)
  Stop occurs_at Hub (N:1)
  DriverAssignment covers Stop (N:M)
  DriverAssignment uses_tractor Tractor (N:1)
  DriverAssignment uses_trailer Trailer (N:1)
  Driver has_assignment DriverAssignment (1:N)
  Tractor emits GPSPing (1:N)
  Load undergoes LoadStatusChange (1:N)

Action types:
  assignDriverToStops(driver_id, stop_ids[], tractor_id, trailer_id, starts_at, reason)
  markStopComplete(stop_id, actual_at, signature?, notes?)
  reassignStop(stop_id, new_driver_id, reason)
  cancelLoad(load_id, reason)
  rerouteShipment(load_id, new_stops[], reason)

Functions (compute on read):
  onTimeDeliveryRate(window_days) → float
  capacityUtilization(hub_id, window_days) → float
  loadSlippingNow() → Load[]   ← used by Maria's morning view

Freshness budgets:
  Load: expected 6h, alert 12h, refuse 24h
  GPSPing: expected 10m, alert 30m, refuse 4h
  Driver: expected 2h, alert 6h, refuse 24h
```

A few features of this committed layer worth noticing:
- Every object has a documented source and refresh cadence.
- Critical derived values (`Load.live_eta_delta_min`) are materialized with their `eta_calculated_at` next to them — so any app showing them can show their freshness.
- The audit trail (`LoadStatusChange`) is a first-class object, not an afterthought.
- Actions are scoped to business events, not generic CRUD.
- Functions exist for analyst-facing derived values that don’t need to be precomputed.
What’s deliberately not in v1
A reminder of the MVP scoping discipline applied to the model:
- No `Route` (the homonym problem from Phase 2 — still unresolved enough that we leave it out)
- No `FreightBill` / billing (finance system; not iteration 1)
- No `MaintenanceRecord` (interesting for iteration 4+)
- No customer-facing tracking (a Phase 5+ consideration)
- No multi-region (Northbound is single-region for now)
- No HAZMAT or specialized cargo handling (later)
Each omission is a deliberate scoping choice, documented in the model rationale.
The migration plan from day one
Every committed semantic layer becomes a series of versioned migrations. Even iteration 1. The discipline:
- Every change to the layer is a named migration with a date
- Renames preserve the old name as an alias for one version
- Type widenings (string → larger string) are silent; type narrowings or splits get migration plans
- Removed object types or properties go through a deprecation window with logged warnings
You will introduce this discipline gently. By the end of iteration 2 you should have a `migrations/` directory with two or three entries, even if each one is small. By iteration 6 the customer’s engineers should be writing migrations themselves.
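The alias rule for renames can be sketched simply. An illustrative Python fragment with a hypothetical migration entry (the structure is an assumption, not a platform format):

```python
# Each migration entry is named and dated; a rename keeps the old name
# resolvable as an alias during its deprecation window.
MIGRATIONS = [
    {
        "name": "2024-03-01_rename_orig_terminal_to_origin_hub_id",
        "renames": {"orig_terminal": "origin_hub_id"},
    },
]

def resolve_property(name: str) -> str:
    # Old names resolve through the alias table; callers keep working
    # for one version while they migrate.
    for migration in MIGRATIONS:
        if name in migration["renames"]:
            return migration["renames"][name]
    return name
```

Pairing the alias lookup with a logged deprecation warning gives you the data to know when the old name is finally safe to drop.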
Common failure modes
A short list of design mistakes that have wrecked otherwise-good engagements:
- Source-shaped object types — `SapLoadTable` is not an object type; `Load` is. Always rename at the boundary.
- Surrogate IDs nobody can recognize — using a UUID where a business ID exists; operators can’t read the system.
- Modeling only current state — losing the audit trail; week-7 incident has no debug surface.
- Untyped actions — “we’ll validate in the app.” No. Validate at the action.
- Hidden derived values — `Load.eta` exists as a property; nobody knows how it’s computed. Always name and document derivations.
- One namespace for everything — `dispatch.*`, `finance.*`, `hr.*` keeps the model navigable; one flat namespace becomes unreadable at 200 types.
- Materialized everything — every property is precomputed. Now refresh latency dominates and incidents are about staleness, not correctness.
Key terms to remember
- Semantic layer — typed surface between data and apps
- Identifier strategy — pick PK based on stability, source-of-truth, and operator familiarity
- Materialized vs computed — choose per property based on read pattern
- Freshness budget — declared expected, alert, refuse thresholds per object type
- Mapping — the boundary between source columns and ontology properties (rename, retype, convert)
- Audit-as-object — make the audit trail a first-class object type, not metadata
What’s next
You have data flowing and a semantic layer to receive it. Real customers have a long tail of other systems your layer must talk to: REST APIs, vendor webhooks, SOAP endpoints, queues, identity providers. The next lesson covers the integration patterns an FDE reaches for to wire those in without the integrations becoming the engagement.