What is MLOps and Why It Matters
From notebook to production — the MLOps lifecycle and why most models never ship
The Problem: A Tale of Two Notebooks
Picture this. You spend three weeks building a customer churn prediction model. The accuracy is 91%. Your manager is impressed. You zip the notebook and email it to a colleague on the data engineering team. They spend two days trying to run it. It fails. It turns out you were using pandas 1.3.5 and they have 2.0. Your notebook reads a CSV from ~/Downloads/final_data_use_this_v3.csv. The model weights are stored locally and were never committed to Git.
Six months later, the model is somehow in production — someone manually uploaded it to a server. Nobody knows how. Predictions start drifting. Customers are complaining. You open your laptop, look at the original notebook, and realize you can’t even remember which preprocessing steps you ran, in which order, or on what version of the data.
This is the story of ML in most organizations, and it is a large part of why, by one widely cited estimate, 85% of ML projects never reach production (Gartner, 2022).
MLOps exists to solve exactly this.
What is MLOps?
MLOps (Machine Learning Operations) is the set of practices, tools, and culture that takes a machine learning model from an experiment on your laptop to a reliable, monitored, continuously improving system in production.
It borrows heavily from DevOps (the software engineering discipline for reliable software delivery) and applies those principles to the unique challenges of ML: data dependencies, model decay, non-deterministic behavior, and the separation between the people who build models and the people who operate systems.
Think of MLOps as the difference between:
- Notebook ML: A data scientist trains a model, saves a pickle file, emails it around, and hopes for the best.
- Production ML: A versioned dataset feeds an automated training pipeline. Every run is tracked. The model passes automated quality gates before deployment. Once live, it is continuously monitored and retrained when performance degrades.
Notebook ML vs Production ML
| Dimension | Notebook ML | Production ML |
|---|---|---|
| Data | Local CSV file | Versioned, tracked datasets |
| Experiments | 40 notebooks named final_v2_REAL | Tracked runs with logged params/metrics |
| Deployment | Manual upload | CI/CD pipeline |
| Environment | “Works on my machine” | Containerized, reproducible |
| Monitoring | None (you find out from users) | Automated drift and performance alerts |
| Retraining | Ad hoc, when someone remembers | Triggered by data drift or schedule |
The core insight is that software engineering has solved these problems already — version control, automated testing, continuous deployment, observability. MLOps applies those solutions to the ML workflow.
The MLOps Lifecycle
Here is the full lifecycle of a production ML system. Each arrow marks a handoff where things tend to break without MLOps.
Raw Data
|
v
[Data Versioning] <-- DVC, Delta Lake, LakeFS
|
v
[Feature Engineering] <-- Feature Store (Feast, Tecton)
|
v
[Experiment Tracking] <-- MLflow, Weights & Biases
|
v
[Model Training]
|
v
[Automated Testing / CI] <-- GitHub Actions, Jenkins
|
v
[Model Registry] <-- MLflow Registry, SageMaker Model Registry
|
v
[Model Serving] <-- FastAPI, Triton, BentoML, SageMaker
|
v
[Monitoring & Drift Detection] <-- Evidently, Grafana, Prometheus
|
v
[Retraining Trigger] <-- back to top
Let’s walk through each stage.
1. Data Versioning
Your model is only as good as the data it was trained on. If you cannot reproduce the exact dataset used for training, you cannot debug issues, compare experiments, or audit model behavior.
Tools like DVC (Data Version Control) solve this by treating datasets like code — you commit a pointer to the data in Git, and the actual data lives in blob storage (S3, GCS). Anyone with access can check out the exact dataset used in any historical run.
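The pointer idea can be illustrated without DVC itself. The following is a minimal sketch, assuming a hypothetical pointer format (a small JSON file holding a content hash and file size). DVC's real `.dvc` files and cache layout differ, and the names `write_pointer` and `verify_pointer` are invented for this example.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small pointer file next to the dataset (hypothetical format).

    The pointer is what you would commit to Git; the data itself would
    live in blob storage.
    """
    pointer = {
        "path": data_path.name,
        "sha256": file_digest(data_path),
        "size_bytes": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def verify_pointer(pointer_path: Path) -> bool:
    """Check that the dataset on disk still matches the committed pointer."""
    pointer = json.loads(pointer_path.read_text())
    data_path = pointer_path.with_name(pointer["path"])
    return file_digest(data_path) == pointer["sha256"]
```

Because the hash changes whenever the data changes, a Git commit that contains the pointer identifies one exact version of the dataset, even though the dataset itself never enters the repository.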
2. Experiment Tracking
Training a model involves dozens of decisions: learning rate, regularization, feature selection, architecture. Without tracking, you will lose track of what you tried. You’ll end up re-running experiments you already ran, or deploying a model without knowing which hyperparameters produced it.
MLflow records each run’s parameters, metrics, artifacts, and the code version that produced them, either through explicit logging calls or through its autologging integrations. Its UI lets you compare runs side by side.
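To make concrete what a tracker records, here is a minimal sketch that uses only the Python standard library. It is not MLflow's API: the `log_run` and `best_run` functions and the JSON-lines file layout are assumptions made for illustration.

```python
import json
import time
import uuid
from pathlib import Path


def log_run(log_path: Path, params: dict, metrics: dict, code_version: str) -> str:
    """Append one training run to a JSON-lines log and return its run ID."""
    run_id = uuid.uuid4().hex[:8]
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "code_version": code_version,  # e.g. a Git commit hash
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a") as handle:
        handle.write(json.dumps(record) + "\n")
    return run_id


def best_run(log_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value of `metric`."""
    with log_path.open() as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

A dedicated tool adds what this sketch lacks: artifact storage, a comparison UI, and a shared server so the whole team sees the same run history.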
3. CI/CD for ML
In software engineering, every code change triggers an automated pipeline: build, test, deploy. ML needs the same thing. When a data scientist pushes new training code, the CI pipeline should:
- Validate the input data schema
- Run unit tests on preprocessing logic
- Train the model
- Evaluate it against a quality threshold
- Register it if it passes
If the model misses the quality threshold (say, 85% accuracy), the pipeline fails and the PR is not merged. This catches regressions before they reach users.
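The evaluation step above can be sketched as a small gate script. This is an illustrative sketch, not a prescribed implementation: the function names and the shape of the `metrics` dictionary are assumptions, and the 0.85 threshold mirrors the example figure in the text.

```python
import sys


def quality_gate(metrics: dict, thresholds: dict) -> list:
    """Return a list of human-readable failures; empty means the gate passed."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} is below the required {minimum:.3f}")
    return failures


def main(metrics: dict) -> int:
    """Return a process exit code: 0 if the gate passes, 1 otherwise."""
    failures = quality_gate(metrics, thresholds={"accuracy": 0.85})
    for failure in failures:
        print(f"QUALITY GATE FAILED - {failure}", file=sys.stderr)
    return 1 if failures else 0
```

In a CI job, the script would end with `sys.exit(main(metrics))` after loading the evaluation metrics. A non-zero exit code is what makes the CI step, and therefore the pipeline, fail.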
4. Model Serving
Once a model passes CI, it needs to be exposed as an API. The serving layer handles incoming requests, applies the same preprocessing that was used during training, runs inference, and returns predictions.
FastAPI is a popular choice for simple cases. For high-throughput production use, teams use NVIDIA Triton, TorchServe, or managed services like AWS SageMaker.
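The request path described above (preprocess, infer, respond) can be sketched in plain Python. Everything concrete here is hypothetical: the `monthly_spend` feature, the saved training statistics, and the toy logistic-regression weights are invented to show how training-time preprocessing is reused at serving time.

```python
import math

# Hypothetical feature statistics saved at training time.
TRAINING_STATS = {"monthly_spend": {"mean": 50.0, "std": 20.0}}
# Hypothetical weights of a toy logistic-regression model.
WEIGHTS = {"monthly_spend": -0.8}
BIAS = 0.2


def preprocess(raw: dict) -> dict:
    """Standardize features with the statistics saved at training time."""
    return {
        name: (float(raw[name]) - stats["mean"]) / stats["std"]
        for name, stats in TRAINING_STATS.items()
    }


def predict(raw: dict) -> dict:
    """Handle one request: preprocess, run inference, return a prediction."""
    features = preprocess(raw)
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-score))
    return {"churn_probability": round(probability, 4)}
```

In a web framework such as FastAPI, `predict` would sit behind a POST route. The design point is that `preprocess` is a single function shared by training and serving code, so the two cannot silently diverge.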
5. Monitoring and Drift Detection
This is where most teams cut corners — and where silent failures happen.
Data drift: The distribution of inputs changes over time. Your fraud model was trained when most transactions were under $500. Now the average is $1,200. The model has never seen these values.
Concept drift: The relationship between inputs and the target changes. A recommendation model trained pre-COVID recommends travel content. Post-COVID, users want home-improvement content. The input features look the same; the correct outputs are completely different.
You need automated checks that compare the distribution of live traffic against your training data, and alert you when divergence exceeds a threshold.
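One way to implement such a check for a single numeric feature is the Population Stability Index (PSI). The sketch below is a minimal standard-library version and is not how any particular monitoring tool computes drift. The bin count and the epsilon for empty bins are assumptions, and the cutoff of roughly 0.25 for significant drift is a common rule of thumb, not a universal standard.

```python
import math


def population_stability_index(expected, actual, bins: int = 10) -> float:
    """Compare two samples of one numeric feature using PSI.

    Bin edges come from the expected (training) sample; a small epsilon
    keeps empty bins from producing log(0).
    """
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for value in values:
            # Clamp out-of-range live values into the first or last bin.
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        return [max(count / len(values), 1e-6) for count in counts]

    expected_share = proportions(expected)
    actual_share = proportions(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(expected_share, actual_share)
    )


def drift_alert(expected, actual, threshold: float = 0.25) -> bool:
    """Return True when PSI exceeds the alert threshold."""
    return population_stability_index(expected, actual) > threshold
```

Applied to the fraud example, `expected` would hold transaction amounts from the training set and `actual` a recent window of live amounts. A shift from mostly sub-$500 values to an average of $1,200 pushes most live traffic into the top bin and drives the PSI well past the threshold.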
6. Automated Retraining
The final piece: when monitoring detects drift or performance degradation, the pipeline automatically kicks off a new training run on fresh data, evaluates the new model, and (if it passes quality gates) promotes it to production. The cycle restarts.
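The decision logic of this loop can be sketched as follows. This is an illustration under stated assumptions: `MonitoringReport`, the thresholds, and the `train_fn` and `evaluate_fn` callables are hypothetical stand-ins for the real monitoring output, training pipeline, and evaluation step.

```python
from dataclasses import dataclass


@dataclass
class MonitoringReport:
    drift_score: float    # e.g. a PSI value from the monitoring stage
    live_accuracy: float  # accuracy measured on recently labeled traffic


def should_retrain(report, drift_threshold=0.25, min_accuracy=0.85) -> bool:
    """Trigger retraining on drift or on degraded live performance."""
    return report.drift_score > drift_threshold or report.live_accuracy < min_accuracy


def retraining_cycle(report, train_fn, evaluate_fn, current_model, min_accuracy=0.85):
    """Return the model that should serve traffic after this cycle."""
    if not should_retrain(report):
        return current_model
    candidate = train_fn()  # new training run on fresh data
    if evaluate_fn(candidate) >= min_accuracy:
        return candidate  # passed the quality gate: promote
    return current_model  # failed the gate: keep the existing model
```

The important property is the last branch: a retrained model that fails the quality gate is never promoted, so an automated loop cannot make production worse than it already is.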
Why This Matters: The Real Cost of Ignoring MLOps
Consider what happens when you skip MLOps:
- Reproducibility failures: “We can’t replicate the model we deployed six months ago.”
- Shadow models: Nobody knows which model is in production. There are three different pickle files on the server.
- Silent degradation: Your model’s accuracy has been dropping for two months. You find out when a major client churns.
- Deployment bottlenecks: Every model deployment requires a two-week manual process involving three teams.
- Regulatory risk: In finance and healthcare, you must be able to explain and reproduce any prediction your model made. Without data and experiment versioning, this is impossible.
The 85% statistic isn’t just about technical difficulty. It’s about the operational infrastructure that most teams never build. MLOps is that infrastructure.
Who Does What in an MLOps Team?
| Role | Responsibilities |
|---|---|
| Data Scientist | Experiments, feature engineering, model architecture |
| ML Engineer | Training pipelines, feature stores, experiment tracking |
| Platform Engineer | Kubernetes, CI/CD, infrastructure, monitoring |
| Data Engineer | Data pipelines, data quality, storage |
In smaller teams, one or two people wear all these hats. That’s fine — but you still need the practices, even if you’re the only person on the team.
Your MLOps Maturity Level
Teams typically move through these levels:
Level 0 — Manual: Notebooks, manual deployment, no tracking. Most data science projects start here.
Level 1 — ML Pipeline Automation: Training is automated and triggered. Experiments are tracked. Data is versioned.
Level 2 — CI/CD Pipeline Automation: Every code change triggers a full pipeline. Models are automatically evaluated and deployed if they pass gates. Monitoring is in place.
For a small project, getting from Level 0 to Level 1 can be a weekend of work. Getting from Level 1 to Level 2 typically takes a few months. Both are worthwhile.
What You Will Build in This Course
Over the next ten lessons, you will build a complete MLOps pipeline from scratch. By the end, you will have:
- A structured, reproducible ML project with DVC
- Experiment tracking with MLflow
- A GitHub Actions CI/CD pipeline that trains and evaluates automatically
- A Dockerized model serving API with FastAPI
- Drift monitoring with Evidently
- A model registry with staging, production, and rollback workflows
Each lesson builds on the previous one. By the capstone, these pieces connect into a single end-to-end system.
The next lesson covers the foundation everything else depends on: how to structure your ML project and version your data.
