📖 Lesson ⏱️ 75 minutes

Property Types and Data Types

Strings, integers, geo-points, arrays, structs, attachments — and how typing enables validation and reuse

Why typing matters

Loose typing is one of the most expensive mistakes in data modeling. A weight field that is sometimes pounds, sometimes kilograms, and sometimes empty can cost more in production incidents over five years than the entire ontology effort costs to set up.

The ontology pushes types as far down as it can: every property has a declared type, every consumer reads it knowing the shape, and the type system catches misuse before runtime.

Primitive types

Every platform supports the basics. The names vary; the meaning does not:

Type | Use for | Example
--- | --- | ---
string | Free text, IDs, codes | firstName, customerId
int / long | Whole-number counts | seatCount, retryCount
double / float | Continuous measurements | weightKg, priceUsd
boolean | True/false flags | isActive, requiresSignature
timestamp | Instant in time (UTC) | createdAt, deliveredAt
date | Calendar date with no time | birthDate, policyStartDate
bytes | Raw binary | signatureBlob

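Use timestamp for events that happen at a specific instant, and date for calendar days that have no time of day or time zone. Mixing them is a common source of bugs, typically off-by-one-day errors around midnight.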
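A minimal Python sketch of the distinction, using only the standard library. The variable names and the sample values are illustrative and are not part of any platform API.

```python
from datetime import date, datetime, timedelta, timezone

# A timestamp is an instant: it carries a UTC offset and compares unambiguously.
delivered_at = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)

# A date is a calendar day: no time of day, no time zone.
policy_start_date = date(2024, 3, 1)

# The same instant falls on different calendar days depending on the zone,
# so deriving a date from a timestamp always needs an explicit zone.
tokyo = timezone(timedelta(hours=9))
day_in_utc = delivered_at.date()
day_in_tokyo = delivered_at.astimezone(tokyo).date()

print(day_in_utc)    # 2024-03-01
print(day_in_tokyo)  # 2024-03-02
```

Storing deliveredAt as a date would silently pick one of those two days; storing birthDate as a timestamp would invent a time of day that nobody recorded.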

Avoid string for things that are not strings. A phone number is not a string — it has a format. An email is not a string — it has structure. Reach for semantic types where the platform supports them.

Semantic types

Semantic types are primitives with meaning attached. They are checked, formatted, and rendered specially:

Semantic type | Backed by | Why it matters
--- | --- | ---
Money | double + currency | Currency mismatches caught at compile time
LatLng / GeoPoint | double, double | Map rendering, distance queries, geo indexes
Polygon / GeoShape | GeoJSON | Spatial joins (“which region contains this point?”)
EmailAddress | string with RFC-5322 validation | Catches malformed emails at ingestion
PhoneNumber | string + region | E.164 normalization
URL | validated string | UI can render as a link
Duration | long (milliseconds) | Math without unit confusion

Prefer the semantic type whenever your platform supports it. Money with an explicit currency is safer than priceUsd: double, which records the currency only in the field name.
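To make the Money case concrete, here is a small Python sketch of a semantic type that refuses to mix currencies. The Money class below is hypothetical and is not any platform's built-in type. It uses Decimal rather than a double to keep the arithmetic exact in the example, and because Python is dynamically typed the mismatch surfaces as a runtime error rather than at compile time.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical semantic type: an amount plus an ISO 4217 currency code."""
    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if self.currency != other.currency:
            raise ValueError(f"cannot add {self.currency} to {other.currency}")
        return Money(self.amount + other.amount, self.currency)


subtotal = Money(Decimal("19.99"), "USD") + Money(Decimal("5.01"), "USD")
print(subtotal)  # Money(amount=Decimal('25.00'), currency='USD')

try:
    Money(Decimal("19.99"), "USD") + Money(Decimal("5.00"), "EUR")
except ValueError as err:
    print(err)  # cannot add USD to EUR
```

With two bare doubles the second addition would succeed and produce a meaningless number.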

Enums

Enums turn unbounded strings into a closed set of allowed values. Use them whenever the set is small, known, and stable-ish.

ShipmentStatus:
  values:
    - created
    - in_transit
    - out_for_delivery
    - delivered
    - exception
    - cancelled

Design rules:

  1. Lowercase, snake_case values — they appear in URLs, logs, and indexes.
  2. Stable values. Renaming an enum value is a breaking change. The display label can change freely; the API value cannot.
  3. Reserve unknown when the closed-world assumption is risky. It is better to record a stray value as unknown than to force it into a member that does not fit.
  4. Document the lifecycle — what transitions are legal? That belongs near the enum definition.
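Rules 3 and 4 can be sketched in Python as follows. The enum mirrors the ShipmentStatus values above plus a reserved unknown member. The transition table is an assumption made up for illustration, since the lesson does not specify a lifecycle, and parse_status and can_transition are hypothetical helper names.

```python
from enum import Enum


class ShipmentStatus(Enum):
    CREATED = "created"
    IN_TRANSIT = "in_transit"
    OUT_FOR_DELIVERY = "out_for_delivery"
    DELIVERED = "delivered"
    EXCEPTION = "exception"
    CANCELLED = "cancelled"
    UNKNOWN = "unknown"  # reserved for values this closed set does not recognise


# Assumed lifecycle, for illustration only.
LEGAL_TRANSITIONS = {
    ShipmentStatus.CREATED: {ShipmentStatus.IN_TRANSIT, ShipmentStatus.CANCELLED},
    ShipmentStatus.IN_TRANSIT: {ShipmentStatus.OUT_FOR_DELIVERY, ShipmentStatus.EXCEPTION},
    ShipmentStatus.OUT_FOR_DELIVERY: {ShipmentStatus.DELIVERED, ShipmentStatus.EXCEPTION},
}


def parse_status(raw: str) -> ShipmentStatus:
    """Map a stored API value to the enum, falling back to UNKNOWN."""
    try:
        return ShipmentStatus(raw)
    except ValueError:
        return ShipmentStatus.UNKNOWN


def can_transition(current: ShipmentStatus, target: ShipmentStatus) -> bool:
    """States missing from the table (terminal, unknown) have no outgoing transitions."""
    return target in LEGAL_TRANSITIONS.get(current, set())


print(parse_status("in_transit"))  # ShipmentStatus.IN_TRANSIT
print(parse_status("teleported"))  # ShipmentStatus.UNKNOWN
print(can_transition(ShipmentStatus.CREATED, ShipmentStatus.DELIVERED))  # False
```

Keeping the transition table next to the enum means a new status cannot be added without someone deciding where it sits in the lifecycle.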

Anti-pattern: stuffing two concepts into one enum.

# BAD - mixes status and exception reason
ShipmentStatus:
  values: [created, in_transit, delivered, lost, damaged, customs_held]

# GOOD - separate concerns
ShipmentStatus:
  values: [created, in_transit, delivered, exception]
ExceptionReason:
  values: [lost, damaged, customs_held, address_invalid]

Shipment.status and Shipment.exceptionReason can now evolve independently.
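Splitting the enum usually calls for a cross-field check so the two properties stay consistent. The sketch below assumes a rule the lesson does not state explicitly: exceptionReason is set if and only if status is exception. The function name is hypothetical.

```python
from typing import Optional

EXCEPTION_REASONS = {"lost", "damaged", "customs_held", "address_invalid"}


def check_exception_fields(status: str, exception_reason: Optional[str]) -> list[str]:
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if exception_reason is not None and exception_reason not in EXCEPTION_REASONS:
        problems.append(f"unknown exceptionReason: {exception_reason}")
    # Assumed rule: a reason is required for exceptions and forbidden otherwise.
    if status == "exception" and exception_reason is None:
        problems.append("status is exception but exceptionReason is missing")
    if status != "exception" and exception_reason is not None:
        problems.append("exceptionReason set on a non-exception shipment")
    return problems


print(check_exception_fields("exception", "lost"))  # []
print(check_exception_fields("delivered", "lost"))
# ['exceptionReason set on a non-exception shipment']
```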

Structs

A struct is a property whose value is itself a typed record:

properties:
  - name: dimensions
    type: struct
    fields:
      - { name: length, type: double }
      - { name: width,  type: double }
      - { name: height, type: double }
      - { name: unit,   type: enum<LengthUnit> }

Use structs when fields belong together and have no meaning apart from each other. Splitting dimensions into length, width, height at the top level invites bugs: someone reads three of them, forgets the unit, mixes inches with centimeters.
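A Python sketch of the same idea: because the unit travels inside the struct, any computation can normalize before doing arithmetic. The members of LengthUnit (cm, in) and the volume_cm3 helper are assumptions for illustration; the lesson only names the enum.

```python
from dataclasses import dataclass
from enum import Enum


class LengthUnit(Enum):
    CM = "cm"
    IN = "in"


# Conversion factors to centimeters (1 inch is exactly 2.54 cm).
CM_PER_UNIT = {LengthUnit.CM: 1.0, LengthUnit.IN: 2.54}


@dataclass(frozen=True)
class Dimensions:
    length: float
    width: float
    height: float
    unit: LengthUnit

    def volume_cm3(self) -> float:
        """Volume in cubic centimeters, whatever unit the record was stored in."""
        factor = CM_PER_UNIT[self.unit]
        return (self.length * factor) * (self.width * factor) * (self.height * factor)


box = Dimensions(length=10, width=10, height=10, unit=LengthUnit.IN)
print(round(box.volume_cm3(), 2))  # 16387.06
```

A Dimensions value cannot be constructed without a unit, which is exactly the guarantee three loose top-level doubles fail to give.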

Prefer a separate object type over a struct when the value has its own identity, can be referenced from multiple places, or needs to be queried independently. Address is a borderline case: small businesses keep it as a struct; large ones promote it to an object type so they can deduplicate.

Arrays

A property can be a typed array:

- name: tags
  type: array<string>
- name: stopOverHubIds
  type: array<string>
- name: hazardCodes
  type: array<enum<HazardCode>>

Watch for:

  • Arrays of mutable items — hard to update one entry without rewriting the whole array. If you find yourself wanting that, model the items as a linked object type.
  • Order-significance — does the order matter? Document it. route: array<string> (origin → stop → destination) is order-significant. tags: array<string> is not.
  • Unbounded growth — an array property growing into the millions is a sign you need a link to a separate object type.
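Order-significance changes what equality means, which is why it is worth documenting. A short Python illustration, with hypothetical hub IDs and tags:

```python
# Order-significant: a route is a sequence, so position carries meaning.
route_a = ("hub_nyc", "hub_chi", "hub_lax")
route_b = ("hub_lax", "hub_chi", "hub_nyc")
print(route_a == route_b)  # False: same hubs, opposite direction

# Order-insignificant: tags behave like a set, so compare them as one.
tags_a = ["fragile", "priority"]
tags_b = ["priority", "fragile"]
print(tags_a == tags_b)            # False: list comparison is positional
print(set(tags_a) == set(tags_b))  # True: set comparison ignores order
```

A consumer that diffs tags positionally will report a change where none happened, so the property description should say which comparison is intended.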

Attachments

Attachments are a special property type for files referenced by an object:

- name: proofOfDeliveryPhoto
  type: attachment
  contentTypes: ["image/jpeg", "image/png"]
  maxSizeBytes: 5242880   # 5 MiB (5 * 1024 * 1024)

The file content lives in object storage; the property holds a reference. The ontology layer takes care of:

  • Streaming the file when requested.
  • Enforcing content type and size limits.
  • Applying the same security policies as any other property.

Use attachments for proofs, contracts, photos, generated PDFs — anything binary that belongs to a specific object instance.
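The content-type and size enforcement for proofOfDeliveryPhoto could look roughly like this in Python. It is a sketch of the checks only, not of how any particular ontology layer implements them, and check_attachment is a hypothetical name.

```python
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png"}
MAX_SIZE_BYTES = 5 * 1024 * 1024  # 5242880, the limit declared on the property


def check_attachment(content_type: str, size_bytes: int) -> list[str]:
    """Return the reasons an upload would be rejected; empty means accepted."""
    errors = []
    if content_type not in ALLOWED_CONTENT_TYPES:
        errors.append(f"content type not allowed: {content_type}")
    if size_bytes > MAX_SIZE_BYTES:
        errors.append(f"file too large: {size_bytes} > {MAX_SIZE_BYTES}")
    return errors


print(check_attachment("image/png", 1_000_000))        # []
print(check_attachment("application/pdf", 1_000_000))  # ['content type not allowed: application/pdf']
print(check_attachment("image/jpeg", 6_000_000))       # ['file too large: 6000000 > 5242880']
```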

Nullability

Every property is nullable or non-nullable. Choose deliberately.

  • Non-nullable is the default to aim for. It is a contract: every consumer can assume the value is present.
  • Nullable is for properties that legitimately may not have a value yet (Shipment.deliveredAt before delivery) or do not apply to every instance.

The mistake to avoid: making everything nullable “just in case.” That pushes null-handling onto every consumer and erodes the value of typing.

If a field is nullable, the reason must be obvious from context or stated in the description.
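In consumer code, the difference shows up as where a None branch is needed. A minimal Python sketch, with a hypothetical Shipment class that has one non-nullable and one nullable field:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Shipment:
    shipment_id: str                         # non-nullable: always present
    delivered_at: Optional[datetime] = None  # nullable: unset until delivery


def delivery_label(shipment: Shipment) -> str:
    # Only the nullable field needs a None branch.
    if shipment.delivered_at is None:
        return "not delivered"
    return f"delivered {shipment.delivered_at.isoformat()}"


pending = Shipment("shp_1")
done = Shipment("shp_2", datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc))
print(delivery_label(pending))  # not delivered
print(delivery_label(done))     # delivered 2024-03-01T12:00:00+00:00
```

If shipment_id were also declared Optional, every caller would need a second branch that never fires in practice, which is the cost of nullable "just in case".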

Validation

Properties can carry validation rules beyond their type. Common ones:

- name: customerId
  type: string
  validation:
    pattern: "^cust_[a-zA-Z0-9]{8,16}$"

- name: weightKg
  type: double
  validation:
    min: 0
    max: 50000          # nothing heavier than 50 metric tons (50,000 kg) in our system

- name: emailAddress
  type: EmailAddress    # built-in semantic type does this for you

- name: scheduledAt
  type: timestamp
  validation:
    notInPast: true

Validation at the property level is enforced on every write — through actions, through ingestion, through APIs. Catching bad data at the boundary keeps the rest of the model honest.
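The rules above translate to small predicate functions. This Python sketch assumes that min and max are inclusive and that notInPast compares against a clock supplied by the caller; neither detail is specified in the lesson, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

CUSTOMER_ID_PATTERN = re.compile(r"^cust_[a-zA-Z0-9]{8,16}$")


def valid_customer_id(value: str) -> bool:
    """pattern rule: 'cust_' followed by 8 to 16 alphanumeric characters."""
    return CUSTOMER_ID_PATTERN.fullmatch(value) is not None


def valid_weight_kg(value: float) -> bool:
    """min/max rule, assumed inclusive on both ends."""
    return 0 <= value <= 50000


def valid_scheduled_at(value: datetime, now: datetime) -> bool:
    """notInPast rule: the clock is passed in so the check is testable."""
    return value >= now


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(valid_customer_id("cust_abc12345"))                  # True
print(valid_customer_id("cust_short"))                     # False
print(valid_weight_kg(-1.0))                               # False
print(valid_scheduled_at(now - timedelta(hours=1), now))   # False
```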

Property descriptions — write them

A property without a description is a debt. The cost shows up the third or fourth time someone asks “is this in seconds or milliseconds?”

Good descriptions answer:

  • Units, if not obvious from the name.
  • Source — where does this value come from?
  • Nullability rationale — when and why is this null?
  • Edge cases — known exceptions.

Example:

- name: deliveredAt
  type: timestamp
  nullable: true
  description: >
    UTC timestamp when the recipient signed for the shipment.
    Null until delivery is confirmed through the markDelivered action.
    For exceptions (lost, damaged) this remains null even when the
    shipment lifecycle has ended.

A future engineer reading this will not have to guess what null means.
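One way to keep description debt visible is a small lint over the property definitions. The sketch below assumes the definitions have already been loaded into a list of dictionaries with name and description keys; that shape and the function name are assumptions, not a platform feature.

```python
def properties_missing_description(properties: list[dict]) -> list[str]:
    """Return the names of properties with no description or a blank one."""
    return [p["name"] for p in properties if not p.get("description", "").strip()]


properties = [
    {"name": "customerId", "type": "string",
     "description": "Stable identifier minted at signup."},
    {"name": "companyName", "type": "string"},
    {"name": "deliveredAt", "type": "timestamp", "nullable": True,
     "description": "UTC timestamp when the recipient signed for the shipment."},
]

print(properties_missing_description(properties))  # ['companyName']
```

Run as part of review or CI, a check like this turns "write descriptions" from a convention into something that fails loudly.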

A worked example — Customer revisited

Putting the typing lessons together:

objectType: Customer
properties:
  - name: customerId
    type: string
    validation: { pattern: "^cust_[a-zA-Z0-9]{8,16}$" }
    description: Stable identifier minted at signup.

  - name: companyName
    type: string

  - name: primaryEmail
    type: EmailAddress

  - name: billingAddress
    type: struct
    fields:
      - { name: line1, type: string }
      - { name: line2, type: string, nullable: true }
      - { name: city,  type: string }
      - { name: region, type: string }
      - { name: postalCode, type: string }
      - { name: country, type: enum<CountryCode> }

  - name: headquartersLocation
    type: GeoPoint
    nullable: true

  - name: tier
    type: enum<CustomerTier>     # bronze | silver | gold | platinum

  - name: tags
    type: array<string>

  - name: signedAt
    type: timestamp

  - name: lifetimeValueUsd
    type: Money
    derived: true
    description: Computed by function customerLifetimeValue(customer)

Every property is typed, and the ones that need it carry validation, an explicit nullability choice, or a description.

Key takeaways

  • The ontology’s type system is your single biggest leverage point for data quality.
  • Prefer semantic types (Money, EmailAddress, GeoPoint) over raw primitives.
  • Use enums for closed sets; reserve unknown if the world is not closed.
  • Structs group fields that belong together; arrays model collections; attachments model files.
  • Choose nullability deliberately, and always describe properties so future readers do not guess.

What’s next

You now have well-typed objects sitting in isolation. The next lesson connects them: link types — how object types relate to each other, and how to model relationships that hold up over time.

