Domain Capture: Turning Conversations into a Model

Why this is the most leveraged thing you do

Once an ontology is in place, every downstream decision becomes easier: which apps to build, which datasets to integrate, which actions need permissions, which workflows can be automated, which questions can be answered. The ontology is a force-multiplier — every hour spent modeling well saves a week of confusion later.

The reverse is also true. Teams build on top of a bad ontology anyway, and the cost of fixing it compounds with everything that depends on it. Senior FDEs differentiate themselves not by writing more code, but by getting the model right the first time, or close enough that the second iteration is cheap.

This lesson is the craft. Some of it is technique. Some of it is taste.

What a domain model is

In this course we use ontology as shorthand for a typed domain model. Concretely, that means three primitives:

Object types — the nouns of the business (Shipment, Driver, Hub, Customer)
Link types — the verbs between objects (Driver operates Vehicle; Shipment assigned_to Driver)
Action types — the typed mutations of state (markDelivered, reassignShipment)
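To make the three primitives concrete, here is a minimal Python sketch. The class names ObjectType, LinkType, and ActionType and their fields are assumptions made for this illustration; they are not the API of any particular platform.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObjectType:
    """A noun of the business, e.g. Shipment or Driver."""
    name: str
    properties: Tuple[str, ...] = ()


@dataclass(frozen=True)
class LinkType:
    """A verb between two object types."""
    source: str
    verb: str
    target: str


@dataclass(frozen=True)
class ActionType:
    """A typed mutation of state."""
    name: str
    parameters: Tuple[str, ...] = ()


# Hypothetical instances built from the examples in the list above.
# The property and parameter names are invented for illustration.
shipment = ObjectType("Shipment", ("shipment_id", "status"))
driver = ObjectType("Driver", ("driver_id", "name"))
assigned_to = LinkType("Shipment", "assigned_to", "Driver")
mark_delivered = ActionType("markDelivered", ("shipment_id",))
```

Keeping links and actions as separate declarations, rather than as fields buried inside an object, mirrors the way they are treated as first-class primitives in the list above.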

For a deeper grounding in the primitives themselves, the Ontology Builder course covers them in depth. For this lesson we focus on the FDE-specific question: how do you extract a good model from interview notes and customer conversations?

The capture workflow

Domain capture is not a single meeting. It is a workflow that runs in parallel with discovery, typically over the first two weeks of an engagement. The stages:

Listen — collect raw nouns and verbs from interviews
Cluster — group synonyms, separate homonyms
Sketch — draw the first model on a whiteboard
Validate — walk it back through users and sponsors
Type — commit it to the platform with real types

You will run all five stages multiple times. The first sketch is wrong. So is the second. By the fourth, you have something defensible.

Stage 1 — Listen for nouns and verbs

Open each interview write-up. Highlight every noun that refers to a real-world entity, and every verb that describes a state change. Do not filter yet. Do not consolidate yet. Just list.

From the Northbound interviews you’d extract something like:

Nouns: load, shipment, route, leg, stop, hub, terminal, depot, dock, driver, vehicle, truck, trailer, tractor, customer, account, contract, lane, freight bill, BOL, ETA, GPS event, dispatcher, hub manager, slot, appointment…
Verbs: pick up, drop off, tender, accept, reject, dispatch, depart, arrive, check in, check out, deliver, sign for, reroute, cancel, hold, release, weigh, fuel…

This list looks like noise. That is intentional. The richness of the list — including the synonyms and the contradictions — is exactly what you need to model from.

Stage 2 — Cluster, then split

Two operations:

Clustering synonyms

Different people use different words for the same thing. Load, shipment, freight, and order are all the same entity at Northbound, in different mouths. Pick one canonical name. Write down the synonyms next to it so future you can search them.

Decision rule for the canonical name: prefer the term the operators use, not the executives. The operators will use the system; the executives won’t. If operators consistently call it a load, your object type is Load — not Shipment, even if Shipment sounds more enterprise-ish.
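One way to apply this decision rule mechanically is to tally who says which word. The sketch below assumes you have counted mentions per audience; the counts are invented, and pick_canonical is a hypothetical helper, not part of any tool.

```python
from collections import Counter

# Hypothetical tallies of how often each term came up in interviews,
# split by who said it. The numbers are made up for illustration.
operator_mentions = Counter({"load": 14, "shipment": 3, "freight": 2, "order": 1})
executive_mentions = Counter({"shipment": 9, "order": 4, "load": 1})


def pick_canonical(operators: Counter, executives: Counter) -> str:
    """Prefer the operators' most-used term; use executives' only if operators gave none."""
    source = operators if operators else executives
    term, _count = source.most_common(1)[0]
    return term


canonical = pick_canonical(operator_mentions, executive_mentions)

# Glossary: every synonym points at the canonical object type name,
# so a later search for "shipment" still lands on Load.
all_terms = set(operator_mentions) | set(executive_mentions)
glossary = {term: canonical.capitalize() for term in sorted(all_terms)}
```

With these counts the canonical term is "load", so all four synonyms map to the object type Load even though executives said "shipment" more often.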

Splitting homonyms

The harder operation. Different people use the same word for different things. This is the most common source of bad ontologies.

A Northbound example: everyone says “route.” It turns out there are three meanings:

Speaker	“Route” means…
Dispatcher	the sequence of stops a single driver will make today
Operations VP	the contracted lane between two cities (e.g. Chicago → Cleveland)
Driver	the road they will physically drive (I-90 vs. I-80)

If you collapse all three into one Route object type, your model fails: no single set of properties fits a day's sequence of stops, a contracted lane, and a stretch of highway at once. The right answer is three separate object types (call them DriverRouteAssignment, Lane, and RoadPath). Downstream conversations about “route” then become unambiguous, because the speaker has to pick which kind they mean.

The rule of thumb: when in doubt, split. Merging two object types later is cheap. Splitting them later, after both apps and code have committed to the combined version, is expensive.
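Homonyms are easier to spot if interview notes record who used a word and what they meant by it. The following is a rough sketch under that assumption; the triple layout and the find_homonyms helper are hypothetical, and the "hub" rows are added only to show a term that does not need splitting.

```python
from collections import defaultdict

# (speaker, term, meaning) triples as they might be jotted down in interviews.
usages = [
    ("Dispatcher", "route", "sequence of stops a single driver makes today"),
    ("Operations VP", "route", "contracted lane between two cities"),
    ("Driver", "route", "road physically driven"),
    ("Dispatcher", "hub", "Northbound terminal"),
    ("Driver", "hub", "Northbound terminal"),
]


def find_homonyms(rows):
    """Return terms that carry more than one distinct meaning."""
    meanings = defaultdict(set)
    for _speaker, term, meaning in rows:
        meanings[term].add(meaning)
    return {term: sorted(ms) for term, ms in meanings.items() if len(ms) > 1}


homonyms = find_homonyms(usages)
```

Here only "route" is flagged, with three meanings, which is the cue to split it into three object types. In practice the meanings will not be worded identically across speakers, so treat a check like this as a prompt for judgment rather than a detector.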

Stage 3 — Sketch on a whiteboard

You now have a canonical list of object types and a rough sense of how they relate. Time to sketch.

How to sketch

Standing whiteboard, two-color marker, no laptop. Sketches drawn on a laptop look authoritative before they deserve to. Sketches drawn on a whiteboard, with a customer in the room, invite correction.

Your first pass:

Write each object type as a labeled box.
Draw arrows between boxes for the relationships, with verb labels on the arrows.
Note the cardinality on each arrow (one-to-many, many-to-many).
Mark primary key candidates on each object — what uniquely identifies this thing in the customer’s world?

A draft Northbound sketch:

   Customer ──places──▶ Load
                          │
                          │ has-many
                          ▼
                       Stop ◀──occurs-at── Hub
                          ▲
                          │ visited-during
                          │
                  DriverRouteAssignment ──operates──▶ Vehicle
                          │
                          │ assigned-to
                          ▼
                       Driver

Imperfect. That is fine. The sketch is a prompt for the next conversation, not a deliverable.

What to mark on the sketch

For each object type, jot down 3-5 fields. For each link, jot down cardinality and whether it’s mandatory. For each suspicious area (anything that confused you in interviews), put a question mark.

Example:

   Load
   ─────
   • load_id (PK)            ← what's the source-of-truth ID? SAP? Our own?
   • customer_id (FK)
   • origin_hub_id (FK)
   • dest_hub_id (FK)
   • planned_pickup_at
   • planned_delivery_at
   • status                  ← what are the legal values?
   • ??? cargo type
   • ??? weight

Your question marks are your next interview questions.
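If you transcribe the whiteboard afterwards, the question marks can be pulled out into an interview list automatically. This is a minimal sketch; the FieldNote structure and the wording of the fallback question are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldNote:
    name: str
    question: Optional[str] = None  # an unresolved question written next to the field
    confirmed: bool = True          # False for fields marked "???" on the board


load_fields = [
    FieldNote("load_id", question="What's the source-of-truth ID? SAP? Our own?"),
    FieldNote("customer_id"),
    FieldNote("origin_hub_id"),
    FieldNote("dest_hub_id"),
    FieldNote("planned_pickup_at"),
    FieldNote("planned_delivery_at"),
    FieldNote("status", question="What are the legal values?"),
    FieldNote("cargo_type", confirmed=False),
    FieldNote("weight", confirmed=False),
]


def interview_questions(object_name, fields):
    """Turn every question mark on the sketch into a question to ask."""
    questions = []
    for f in fields:
        if f.question:
            questions.append(f"{object_name}.{f.name}: {f.question}")
        elif not f.confirmed:
            questions.append(f"{object_name}.{f.name}: does this exist, and where does it live?")
    return questions


questions = interview_questions("Load", load_fields)
```

For the Load example this yields four questions: two annotated ones (load_id, status) and two for the unconfirmed fields (cargo type, weight).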

Stage 4 — Validate

Take your sketch back to the people who fed it to you.

Validate with operators on the workflow

Show Maria the sketch. Walk her through: “So when Doug tells you to reassign a load, that’s this arrow — a new DriverRouteAssignment. Does that match what you do?”

You will hear things like:

“No, we don’t reassign the load. We reassign the stop. The load can have stops on two drivers.”
“There’s actually no such thing as a Vehicle. We have tractors and trailers separately, and they swap around.”
“That arrow is one-to-many, but only if you’re a regional. National accounts work differently.”

Each correction is gold. Update the sketch on the spot, in front of them. Wet ink is more persuasive than dry ink — when the customer sees you redraw their boxes mid-conversation, they trust you to listen.

Validate with sponsors on outcomes

Different question, same sketch. The VP does not care about DriverRouteAssignment. She cares whether her on-time delivery number is a query you can run on this model. Walk her through:

“On-time delivery is a count over Load where actual_delivery_at <= planned_delivery_at.”
“Capacity utilization is a sum over DriverRouteAssignment divided by sum of Vehicle.capacity per day.”

If her questions cannot be answered against the sketch, you are missing object types. If they can, you have a model worth committing to.
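A quick way to test whether a sponsor's metric is answerable is to write it against a handful of fake records shaped like the sketch. The example below does this for on-time delivery. The records are invented, and excluding loads that have not been delivered yet is an assumption the sponsor would need to confirm.

```python
from datetime import datetime

# Hypothetical Load records; only the two timestamps matter here.
loads = [
    {"load_id": "L-1", "planned_delivery_at": datetime(2024, 3, 1, 17),
     "actual_delivery_at": datetime(2024, 3, 1, 16)},
    {"load_id": "L-2", "planned_delivery_at": datetime(2024, 3, 1, 17),
     "actual_delivery_at": datetime(2024, 3, 1, 19)},
    {"load_id": "L-3", "planned_delivery_at": datetime(2024, 3, 2, 9),
     "actual_delivery_at": datetime(2024, 3, 2, 9)},
    {"load_id": "L-4", "planned_delivery_at": datetime(2024, 3, 2, 12),
     "actual_delivery_at": None},  # not delivered yet
]


def on_time_delivery(rows):
    """Return (on_time_count, delivered_count) over delivered loads."""
    delivered = [r for r in rows if r["actual_delivery_at"] is not None]
    on_time = [r for r in delivered
               if r["actual_delivery_at"] <= r["planned_delivery_at"]]
    return len(on_time), len(delivered)


on_time, delivered = on_time_delivery(loads)
rate = on_time / delivered if delivered else 0.0
```

Two of the three delivered loads are on time here. Writing the query also surfaces a modeling question: the metric needs an actual_delivery_at property somewhere, so if the sketch has no place for it, the sketch is incomplete.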

Validate with gatekeepers on feasibility

Show IT the sketch. The question is not “does it match the business?” — it’s “can we source the data?”

“We can get Load and Customer from SAP nightly.”
“Driver and Vehicle are in HR’s system, but they don’t sync — we’d need a daily Excel export.”
“Stops aren’t in any system. The dispatchers track them in the spreadsheet.”

Every box that doesn’t have a feasible datasource is a risk. Mark it.
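Marking risk can be as simple as a lookup from each box to what IT said about it. The sketch below encodes the three quotes above; the "feasible" flags are a judgment call made for this illustration, and boxes IT never mentioned are treated as having no source.

```python
# Data source per box, as reported by IT in the quotes above.
# The "feasible" flag is a judgment call made for this example.
sources = {
    "Load": {"system": "SAP", "cadence": "nightly", "feasible": True},
    "Customer": {"system": "SAP", "cadence": "nightly", "feasible": True},
    "Driver": {"system": "HR system", "cadence": "daily Excel export", "feasible": True},
    "Vehicle": {"system": "HR system", "cadence": "daily Excel export", "feasible": True},
    "Stop": {"system": "dispatcher spreadsheet", "cadence": None, "feasible": False},
}

# Boxes from the draft whiteboard sketch.
sketch_boxes = ["Customer", "Load", "Stop", "Hub",
                "DriverRouteAssignment", "Vehicle", "Driver"]


def at_risk(boxes, known_sources):
    """Boxes with no known source, or a source judged infeasible."""
    return [b for b in boxes
            if not known_sources.get(b, {}).get("feasible", False)]


risks = at_risk(sketch_boxes, sources)
```

Under these assumptions the risky boxes are Stop, Hub, and DriverRouteAssignment: one lives only in a spreadsheet, and the other two have no source named at all.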

Stage 5 — Type it

Once the sketch survives two rounds of validation, commit it to the platform with real types. This means:

Every property has a typed primitive (string, integer, geo_point, enum, attachment)
Every link has a defined cardinality and a backing implementation (a foreign key, an intersection table, a join)
Every enum has its legal values written down (LoadStatus = OPEN | TENDERED | ACCEPTED | IN_TRANSIT | DELIVERED | CANCELLED)
Primary keys are stable and business-meaningful where possible

This is the first version of your ontology. Check it into version control. From here on, changes are tracked as migrations.
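As an illustration of what "real types" buys you, here is Load expressed with a closed enum and typed fields in plain Python. The platform's own type system will look different; this is only a sketch, and the subset of fields is chosen for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class LoadStatus(Enum):
    OPEN = "OPEN"
    TENDERED = "TENDERED"
    ACCEPTED = "ACCEPTED"
    IN_TRANSIT = "IN_TRANSIT"
    DELIVERED = "DELIVERED"
    CANCELLED = "CANCELLED"


@dataclass(frozen=True)
class Load:
    load_id: str       # primary key; its source of truth is still an open question
    customer_id: str   # foreign key backing the Customer-to-Load link
    planned_pickup_at: datetime
    planned_delivery_at: datetime
    status: LoadStatus

    def __post_init__(self):
        # Reject raw strings so only the written-down legal values get in.
        if not isinstance(self.status, LoadStatus):
            raise TypeError(f"status must be a LoadStatus, got {self.status!r}")


load = Load("L-1", "C-7", datetime(2024, 3, 1, 8),
            datetime(2024, 3, 1, 17), LoadStatus("OPEN"))
```

Parsing incoming values through LoadStatus(...) means an unexpected status from a source system raises a ValueError at load time instead of quietly becoming a seventh state.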

The four common modeling traps

In a decade of FDE work, you will see the same modeling mistakes again and again. Watch for these.

Trap 1 — Modeling the screens, not the world

The customer shows you their SAP screen. You make an object type that mirrors the screen field-for-field. Now your model is shaped like a UI, not like the business — and when they change the SAP screen, your model breaks. Model the world the screen is trying to represent.

Trap 2 — Modeling the org chart, not the work

Every department has a system, so you create one object type per department. You end up with OperationsCustomer, FinanceCustomer, MarketingCustomer — three views of the same person, none of which can talk to each other. Model the underlying entity once and let departments have permissioned views.

Trap 3 — Modeling at the wrong granularity

Too coarse: Activity is a single object type that covers anything that ever happens. You can never query usefully against it. Too fine: LoadPickupTimestamp, LoadDropoffTimestamp, and LoadHandoffTimestamp are each their own object type instead of being properties of Load or events on it. The right grain is the grain at which the operator naturally talks about it.

Trap 4 — Modeling state that does not exist yet

The VP describes a future workflow. You model the entities for it. But the data does not exist yet, and may never. Model what the customer’s data says exists today. Add the new entities when there is data to back them.

The ‘two whiteboards’ technique

A pattern that helps when the model is genuinely hard:

Set up two whiteboards.

Whiteboard A — “Today”: the world as it exists, with all the data sources and entities that are actually populated.
Whiteboard B — “Tomorrow”: the world as the customer wants it to look in 6 months, with the new entities, new relationships, new actions.

Sketch both. Then trace which arrows from A → B can be drawn in the next 6 weeks. Those become your model. Everything else on B is roadmap material — important, but not your immediate problem.
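The two-whiteboard split reduces to set arithmetic once the boxes are listed. In the sketch below, which entities sit on each board and which are reachable in six weeks are all invented for illustration.

```python
# Entities on each whiteboard. Board contents are hypothetical.
today = {"Customer", "Load", "Stop", "Hub", "Driver", "Tractor", "Trailer"}
tomorrow = today | {"GPSPing", "Appointment", "MaintenanceRecord"}

# New items the team believes it can back with real data within six weeks.
reachable_in_six_weeks = {"GPSPing"}

new_on_b = tomorrow - today                       # only exists on "Tomorrow"
model_scope = today | (new_on_b & reachable_in_six_weeks)
roadmap = new_on_b - reachable_in_six_weeks       # important, but later
```

Everything in model_scope is what you commit to now; everything in roadmap stays on Whiteboard B until there is data to back it.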

A worked example — the Northbound first model

After two weeks at Northbound, here’s the first model an FDE might commit to. Note the deliberate restraint — many things the VP mentioned in week 1 are intentionally absent.

Object types (v1):
  • Customer            — buys freight services
  • Load                — a single contracted move from origin to destination
  • Stop                — a pickup or dropoff at a single location
  • Hub                 — a Northbound terminal where loads transit
  • Driver              — a person who operates vehicles
  • Tractor             — the cab
  • Trailer             — the box
  • DriverAssignment    — a driver's planned work for a shift
  • GPSPing             — a single positional event for a vehicle

Link types (v1):
  • Customer  places          → Load            (1-to-many)
  • Load      has             → Stop            (1-to-many)
  • Stop      occurs-at       → Hub             (many-to-1)
  • Driver    has-assignment  → DriverAssignment (1-to-many)
  • DriverAssignment covers   → Stop            (many-to-many)
  • DriverAssignment uses-tractor → Tractor     (many-to-1)
  • DriverAssignment uses-trailer → Trailer     (many-to-1)
  • Tractor   emits           → GPSPing         (1-to-many)

Action types (v1):
  • createLoad(customer_id, stops[]) → Load
  • assignDriverToStops(driver_id, stop_ids[], tractor_id, trailer_id) → DriverAssignment
  • recordGPSPing(tractor_id, lat, lon, observed_at) → GPSPing
  • markStopComplete(stop_id, actual_time, notes?) → void
  • reassignStop(stop_id, new_driver_id, reason) → void

Notice what is not there: no Route (deliberately, given the homonym problem). No FreightBill (the finance team has it, but it does not affect the dispatch workflow we are building first). No MaintenanceRecord (interesting later, irrelevant now).

Restraint is the FDE’s superpower in domain capture. A small model that fits the work beats a comprehensive model that is hard to evolve.
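A small model is also easy to sanity-check. The sketch below transcribes the v1 object and link lists and checks two things: that every link endpoint is a defined object type, and that no object type is left unconnected. The helper names are hypothetical.

```python
object_types = {
    "Customer", "Load", "Stop", "Hub", "Driver",
    "Tractor", "Trailer", "DriverAssignment", "GPSPing",
}

# (source, verb, target, cardinality) copied from the v1 link list.
link_types = [
    ("Customer", "places", "Load", "1-to-many"),
    ("Load", "has", "Stop", "1-to-many"),
    ("Stop", "occurs-at", "Hub", "many-to-1"),
    ("Driver", "has-assignment", "DriverAssignment", "1-to-many"),
    ("DriverAssignment", "covers", "Stop", "many-to-many"),
    ("DriverAssignment", "uses-tractor", "Tractor", "many-to-1"),
    ("DriverAssignment", "uses-trailer", "Trailer", "many-to-1"),
    ("Tractor", "emits", "GPSPing", "1-to-many"),
]


def dangling_endpoints(objects, links):
    """Link endpoints that name an object type the model does not define."""
    return sorted({end for s, _v, t, _c in links
                   for end in (s, t) if end not in objects})


def unlinked_objects(objects, links):
    """Object types that no link touches."""
    touched = {end for s, _v, t, _c in links for end in (s, t)}
    return sorted(objects - touched)


problems = dangling_endpoints(object_types, link_types)
orphans = unlinked_objects(object_types, link_types)
```

Both checks come back empty for the v1 model. If someone later adds a link that points at Route, the first check flags it, which is a useful reminder that Route was left out on purpose.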

By end of week 2, you should produce three short artifacts:

The sketch — a clean version of the whiteboard, in a diagram tool (Excalidraw, draw.io, or even Figma). Single page.
The glossary — every object type and every synonym, in a one-page table.
The model rationale — a one-pager explaining what you included, what you deliberately excluded, and why.

These three artifacts are what you walk into the week-2 review with. The sponsors approve the rationale; the users approve the glossary; the gatekeepers approve the feasibility implied by the sketch.

Living with the model

Your ontology will not be right on the first try. That is expected. What you owe the customer (and yourself) is:

A documented model at every point in time
Versioned changes as you learn more
A clear story for each change (“we split Route into Lane and DriverRouteAssignment in week 3 after Maria explained the homonym”)
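The "clear story for each change" can live next to the model as a simple changelog. This is a minimal sketch; the ModelChange fields are assumptions, the first entry is invented, and the second restates the example quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChange:
    version: int
    week: int
    summary: str
    reason: str


changelog = [
    # Hypothetical first entry.
    ModelChange(1, 2, "Initial model committed", "First version after validation"),
    # The example change quoted in the list above.
    ModelChange(2, 3, "Split Route into Lane and DriverRouteAssignment",
                "Maria explained the homonym"),
]


def story(changes):
    """Render one line per change, oldest version first."""
    ordered = sorted(changes, key=lambda c: c.version)
    return [f"v{c.version} (week {c.week}): {c.summary}. Why: {c.reason}"
            for c in ordered]


lines = story(changelog)
```

Recording the reason alongside the change is the part that pays off: six months later, the summary tells you what moved, but only the reason tells you whether it is safe to move it back.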

Senior FDEs do not pretend their first model was right. They are proud of the trail of improvements they made and what each one taught the team.

Key terms to remember

Object type / link type / action type — the three primitives
Synonym cluster — different names for the same entity
Homonym split — the same name for different entities (most common modeling trap)
Cardinality — the multiplicity on a link (1:1, 1:N, N:M)
Two whiteboards — separating today’s model from tomorrow’s roadmap
Glossary — the one-page mapping of business words to object types
Model rationale — the written justification for inclusions and exclusions

What’s next

You have a problem statement (from discovery) and a model (from capture). The next question, and the one engineers most consistently get wrong, is what to build first. That is MVP scoping under ambiguity, the final lesson of Phase 2.
