📖 Lesson ⏱️ 90 minutes

Feature Stores: What They Are and When to Use Them

Introduction to feature stores and their role in ML pipelines

The Duplicated Feature Problem

Your data science team has grown. There are three ML models in production:

  • Churn Model (Team A): Uses user_7day_purchase_count — computed as the number of distinct purchase events in the last 7 days
  • Recommendation Model (Team B): Uses user_weekly_purchases — computed as the sum of purchase amounts in the last 7 days
  • Fraud Model (Team C): Uses user_recent_purchase_count — computed as the number of purchases in the last 168 hours

These are all meant to capture the same signal, yet it is defined three different ways, computed by three different ETL jobs, and stored in three different tables, with subtly different semantics (count vs sum, 7 days vs 168 hours, distinct events vs all events).
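
To make the divergence concrete, here is a minimal sketch that applies the three definitions to one small purchase log. The data, the fixed `now`, and the choice to read "last 7 days" as calendar days for Teams A and B are all assumptions made for illustration, not details from the teams' real ETL jobs.

```python
import pandas as pd

# Hypothetical purchase log for a single user; `now` is fixed for reproducibility.
now = pd.Timestamp("2023-01-15 12:00:00")
purchases = pd.DataFrame({
    "event_id": ["e0", "e1", "e2", "e2", "e3"],  # e2 was logged twice (a retry)
    "amount": [15.0, 20.0, 35.0, 35.0, 10.0],
    "ts": pd.to_datetime([
        "2023-01-08 03:00:00",
        "2023-01-08 09:00:00",
        "2023-01-10 14:00:00",
        "2023-01-10 14:00:00",
        "2023-01-14 18:00:00",
    ]),
})

# Assumption: Teams A and B count back 7 calendar days from today's midnight.
calendar_start = now.normalize() - pd.Timedelta(days=7)
in_calendar = purchases[purchases["ts"] >= calendar_start]

# Assumption: Team C uses a rolling window of exactly 168 hours before `now`.
rolling_start = now - pd.Timedelta(hours=168)
in_rolling = purchases[purchases["ts"] > rolling_start]

team_a = in_calendar["event_id"].nunique()  # distinct purchase events
team_b = in_calendar["amount"].sum()        # sum of purchase amounts
team_c = len(in_rolling)                    # all rows, duplicates included

print(team_a, team_b, team_c)  # 4 115.0 3
```

The same user and the same raw events produce 4, 115.0, and 3, which is the "three different answers" situation described above.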

When Team B’s ETL job breaks, their model silently starts using stale data. Nobody knows for three days because the feature name is different. When a business analyst asks “how many purchases did this user make this week?”, they get three different answers depending on which model’s feature table they query.

This is the feature duplication problem, and it’s what feature stores solve.


What is a Feature Store?

A feature store is a centralized system for creating, storing, and serving ML features. It has three jobs:

  1. Define features once: The feature user_7day_purchase_count is defined in one place with one formula. All models use the same definition.

  2. Serve features in two modes:

    • Offline (for training): Retrieve historical feature values for a set of entity IDs and timestamps. “Give me the user_7day_purchase_count for users 1001, 1002, 1003 as of January 1st, 2023.”
    • Online (for inference): Retrieve the latest feature values in milliseconds. “Give me the current user_7day_purchase_count for user 1001.”
  3. Prevent training-serving skew: The same feature computation runs for both training and inference. No more subtle differences between offline and online feature calculation.


The Training-Serving Skew Problem

Even without duplication, there’s a more subtle problem: training-serving skew.

You compute features one way during training (from a batch SQL job over historical data) and a different way during serving (from a real-time Redis lookup). The results should be identical, but they often aren’t due to:

  • Different time windows (training uses UTC midnight, serving uses “last 7 days from now”)
  • Different null handling
  • Different aggregation logic

A model trained on features computed one way will perform differently when deployed with features computed another way — even if the team intended them to be the same. This is one of the hardest bugs to diagnose in production ML.
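
As an illustration of the null-handling case, the sketch below computes one "average purchase amount" feature two ways. Both function names are hypothetical: one mimics a batch job that ignores missing values (as SQL AVG does), the other mimics serving code that treats missing values as zero.

```python
from typing import Optional, Sequence


def avg_amount_training(amounts: Sequence[Optional[float]]) -> Optional[float]:
    """Batch-style average: missing values are ignored, like SQL AVG()."""
    present = [a for a in amounts if a is not None]
    return sum(present) / len(present) if present else None


def avg_amount_serving(amounts: Sequence[Optional[float]]) -> float:
    """Serving-style average: missing values are silently treated as 0.0."""
    if not amounts:
        return 0.0
    return sum(a if a is not None else 0.0 for a in amounts) / len(amounts)


# Same raw data, one missing amount.
amounts = [30.0, None, 60.0]

training_value = avg_amount_training(amounts)
serving_value = avg_amount_serving(amounts)

print(training_value, serving_value)  # 45.0 30.0
```

Neither function is wrong in isolation. The skew comes from the model being trained on one and served with the other.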

A feature store solves this by having a single feature definition that powers both the offline store (for training data) and the online store (for real-time serving).


Online Store vs Offline Store

Offline Store

The offline store is a data warehouse or data lake (Redshift, BigQuery, Snowflake, S3 + Parquet) that stores historical feature values. You use it to:

  • Build training datasets: retrieve features as they existed at specific timestamps for historical events
  • Run batch inference: score all customers overnight
  • Conduct point-in-time joins (more on this below)

Key property: point-in-time correctness. When building a training dataset for a churn event on January 15th, you want the feature values as they existed on January 15th, not the values as they exist today. Without point-in-time correctness, you accidentally use “future” information in training, producing a model that scores well offline but underperforms in production, where that information is not available.
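
A point-in-time join can be sketched with plain pandas using `merge_asof`. The feature history and label event below are made-up values, and this is only an illustration of the idea, not how any particular feature store implements it.

```python
import pandas as pd

# Hypothetical history of one feature for one customer.
feature_history = pd.DataFrame({
    "customer_id": [1001, 1001, 1001],
    "feature_timestamp": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-01-20"]),
    "user_7day_purchase_count": [2, 5, 0],
})

# The labeled event we want to train on.
label_events = pd.DataFrame({
    "customer_id": [1001],
    "event_timestamp": pd.to_datetime(["2023-01-15"]),
    "churn": [1],
})

# Point-in-time join: for each event, take the latest feature row
# whose timestamp is at or before the event timestamp.
training = pd.merge_asof(
    label_events.sort_values("event_timestamp"),
    feature_history.sort_values("feature_timestamp"),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    by="customer_id",
    direction="backward",
)
pit_value = int(training.loc[0, "user_7day_purchase_count"])

# Naive join: always take the newest feature row, which here is from
# January 20th, five days AFTER the event (leakage).
newest = feature_history.sort_values("feature_timestamp").groupby("customer_id").tail(1)
leaky_value = int(newest["user_7day_purchase_count"].iloc[0])

print(pit_value, leaky_value)  # 5 0
```

The point-in-time value (5) is what the model would have seen on January 15th. The naive value (0) reflects behavior that had not happened yet.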

Online Store

The online store is a low-latency key-value store (Redis, DynamoDB, Cassandra) that holds the latest feature values for each entity. You use it for real-time inference.

Typical read latency: 1-10ms. This is what makes sub-100ms ML inference APIs possible.

Training request:
  User 1001, as of 2023-01-15 → [offline store] → historical feature values

Serving request:
  User 1001, right now → [online store] → current feature values (1-10ms)
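
The relationship between the two stores can be sketched in a few lines: the offline store keeps every historical row, and the online store keeps only the newest row per entity. In this sketch a Python dict stands in for Redis or DynamoDB, and the data is invented for illustration.

```python
import pandas as pd

# Offline store: full history, many rows per customer (hypothetical values).
offline = pd.DataFrame({
    "customer_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2023-01-10", "2023-01-14", "2023-01-12"]),
    "user_7day_purchase_count": [5, 3, 1],
})

# "Materialize": keep only the most recent row for each customer.
latest = offline.sort_values("event_timestamp").groupby("customer_id").tail(1)

# Online store stand-in: one key per entity, latest values only.
online_store = {
    int(row.customer_id): {
        "user_7day_purchase_count": int(row.user_7day_purchase_count)
    }
    for row in latest.itertuples(index=False)
}

# A serving request is a single key lookup.
print(online_store[1001])  # {'user_7day_purchase_count': 3}
```

A key lookup like this is what keeps online reads fast, and it is also why the online store cannot answer "as of" questions: the history has been discarded.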

When Do You Need a Feature Store?

Be honest about whether you need one. A feature store adds complexity and operational overhead. It pays off when:

Signal | Explanation
Multiple models using the same features | Without a store, each team recomputes features independently
Multiple teams | Without a store, features are defined inconsistently across teams
Real-time features | Without a store, serving latency requirements force you to rebuild online feature infrastructure anyway
Training-serving skew issues | If you’ve been burned by this, a feature store fixes it structurally
Regulatory compliance | A feature store gives you a central record of which feature definitions each model uses, which helps when predictions must be audited

You probably don’t need one when:

  • You have one model and one team
  • All predictions are batch (overnight scoring, no real-time API)
  • Features change rarely and are simple to compute
  • Your team has fewer than 5 data scientists

For the churn model we’ve been building in this course: if it’s a single model and you control all the feature computation, you don’t need a feature store yet. Add it when you have a second model that shares features.


Feast: An Open-Source Feature Store

Feast (Feature Store) is the most widely adopted open-source feature store. It supports:

  • Local development (Parquet files for the offline store, SQLite for the online store)
  • AWS (S3 + Redshift offline, DynamoDB online)
  • GCP (BigQuery offline, Datastore online)

Installation

pip install feast

Define Your Feature Repository

# feature_repo/features.py
from datetime import timedelta
from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
)
from feast.types import Float32, Int32

# An entity is the thing you're making predictions about
customer = Entity(
    name="customer",
    description="A customer identified by customer_id",
    join_keys=["customer_id"],
)

# Where the raw feature data lives (parquet files for this example)
customer_stats_source = FileSource(
    path="data/feature_store/customer_stats.parquet",
    timestamp_field="event_timestamp",
)

# A feature view defines a group of features computed from a source
customer_stats = FeatureView(
    name="customer_stats",
    entities=[customer],
    ttl=timedelta(days=7),  # Features older than 7 days are stale in the online store
    schema=[
        Field(name="tenure_months", dtype=Int32),
        Field(name="monthly_charges", dtype=Float32),
        Field(name="total_charges", dtype=Float32),
        Field(name="num_products", dtype=Int32),
        Field(name="user_7day_purchase_count", dtype=Int32),
        Field(name="days_since_last_purchase", dtype=Int32),
    ],
    source=customer_stats_source,
    tags={"team": "data-science", "model": "churn"},
)

# A feature service bundles features for a specific model
churn_model_features = FeatureService(
    name="churn_model_v1",
    features=[customer_stats],
    tags={"model_version": "1.0.0"},
)

Initialize and Materialize

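# Create a new feature repository
# (an underscore name is used because some Feast versions reject hyphens in project names)
feast init churn_features
cd churn_features/feature_repo  # recent Feast versions scaffold the repo in this subdirectory
feast apply  # Registers entities and feature views; creates the registry and the local online store

# Materialize features into the online store
# (copies recent features from offline to online for fast serving)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")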

Getting Features for Training

# src/data/get_training_data.py
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# Entity DataFrame: the events you want to train on
# event_timestamp is when the label occurred
training_events = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004],
    "event_timestamp": pd.to_datetime([
        "2023-01-15", "2023-01-20", "2023-02-01", "2023-02-10"
    ]),
    "churn": [1, 0, 1, 0],  # Labels
})

# Retrieve point-in-time correct features
# For each row, Feast finds the most recent feature values recorded
# at or before that row's event_timestamp (and within the feature view's ttl)
training_df = store.get_historical_features(
    entity_df=training_events,
    features=[
        "customer_stats:tenure_months",
        "customer_stats:monthly_charges",
        "customer_stats:total_charges",
        "customer_stats:num_products",
        "customer_stats:user_7day_purchase_count",
    ],
).to_df()

print(training_df.head())
# customer_id | event_timestamp | churn | tenure_months | monthly_charges | ...

Getting Features for Real-Time Serving

# In your FastAPI serving code
# (`app`, `ChurnRequest`, and `state.model` are assumed to be defined elsewhere in the service)
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# Feature order must match the column order the model was trained on
FEATURE_NAMES = [
    "tenure_months",
    "monthly_charges",
    "total_charges",
    "num_products",
    "user_7day_purchase_count",
]

def get_online_features(customer_id: int) -> dict:
    """Retrieve latest features for a customer from the online store."""
    feature_vector = store.get_online_features(
        features=[f"customer_stats:{name}" for name in FEATURE_NAMES],
        entity_rows=[{"customer_id": customer_id}],
    ).to_dict()

    # to_dict() also returns the entity key (customer_id), so select features explicitly
    return {name: feature_vector[name][0] for name in FEATURE_NAMES}

# Usage in FastAPI endpoint
@app.post("/predict")
async def predict(request: ChurnRequest):
    features = get_online_features(request.customer_id)
    # Build the input row in a fixed order rather than relying on dict ordering
    row = [features[name] for name in FEATURE_NAMES]
    prob = state.model.predict_proba([row])[0][1]
    return {"churn_probability": float(prob)}

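The key insight: the same feature definition powers both training and serving. This removes the main structural cause of training-serving skew, although stale materialization or changes in the upstream data can still make online and offline values diverge.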
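
A cheap way to confirm that training and serving values actually agree is a parity check: fetch the same entity from both paths and compare. The helper below is a hypothetical sketch over plain dicts (it is not part of Feast), and the sample values are invented.

```python
import math
from typing import Dict, List, Optional

Number = Optional[float]


def find_skewed_features(
    offline_row: Dict[str, Number],
    online_row: Dict[str, Number],
    rel_tol: float = 1e-6,
) -> List[str]:
    """Return the names of features whose offline and online values differ."""
    skewed = []
    for name, offline_value in offline_row.items():
        online_value = online_row.get(name)
        if offline_value is None or online_value is None:
            # A value missing on only one side counts as skew.
            if offline_value is not online_value:
                skewed.append(name)
        elif not math.isclose(offline_value, online_value, rel_tol=rel_tol):
            skewed.append(name)
    return skewed


# Hypothetical values for one customer, fetched from each path.
offline_row = {"tenure_months": 12, "monthly_charges": 70.5, "user_7day_purchase_count": 4}
online_row = {"tenure_months": 12, "monthly_charges": 70.5, "user_7day_purchase_count": 3}

skewed = find_skewed_features(offline_row, online_row)
print(skewed)  # ['user_7day_purchase_count']
```

Running a check like this on a sample of entities after each materialization is one way to catch divergence before it affects predictions.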


Feature Engineering Best Practices

Whether or not you use a feature store, these practices apply:

Document feature semantics: user_7day_purchase_count: "Number of distinct purchase events (status='completed') in the 7 calendar days ending at event_timestamp, UTC". Ambiguity is the enemy.

Version features: When you change a feature definition, create a new version (customer_stats_v2). Old models still use the old definition. New models use the new one. Don’t break backward compatibility.

Avoid feature leakage: Never include information that wouldn’t be available at prediction time. days_until_churn would be a perfect predictor in training, but you will never have it at serving time.

Monitor feature freshness: If a feature hasn’t been updated in 6 hours and it should update every hour, something is broken. Alert on it.
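
A freshness check can be as simple as comparing a feature's last update time with its expected update interval. The function name and the `tolerance` parameter below are assumptions for this sketch, and wiring the result into an alerting system is left out.

```python
from datetime import datetime, timedelta, timezone


def is_stale(
    last_updated: datetime,
    expected_interval: timedelta,
    now: datetime,
    tolerance: float = 2.0,
) -> bool:
    """Return True when the feature is older than `tolerance` update intervals."""
    return (now - last_updated) > expected_interval * tolerance


# Fixed "now" so the example is reproducible.
now = datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)

# Updated 6 hours ago but expected hourly: stale.
stale = is_stale(now - timedelta(hours=6), hourly, now)

# Updated 30 minutes ago: fine.
fresh = is_stale(now - timedelta(minutes=30), hourly, now)

print(stale, fresh)  # True False
```

The tolerance factor avoids alerting on a single slightly late run while still catching a job that has stopped updating.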


Summary

Feature stores solve the organizational scaling problems of ML:

  • Feature duplication: One definition, used by all models
  • Training-serving skew: Same computation for offline training and online inference
  • Discoverability: New team members can browse available features instead of reinventing them

You probably don’t need one for your first model. You definitely need one by your fifth. The next lesson covers managing the models themselves: once you have multiple model versions, how do you promote, roll back, and track which one is in production?