AI Engineering · 4 min read
📋 Prerequisites
- Familiarity with LLM APIs
- Basic understanding of tokens and context windows
🎯 What You'll Learn
- Compare Opus, Sonnet, and Haiku across cost, latency, and capability
- Apply the model selection decision framework to real scenarios
- Understand when extended thinking improves results and when it doesn't
- Estimate cost impact of model choice at production scale
Domain 1 Overview
Model selection is the first architectural decision in any Claude application. Domain 1 tests whether you understand the capability differences between tiers deeply enough to make correct trade-off decisions — not just which models exist.
Exam weight: ~15% (approximately 9 questions)
The Claude Model Tiers
Each Claude generation ships in three tiers. For the Claude 4.x generation:
| Model | Positioning | Context Window | Best For |
|---|---|---|---|
| Claude Opus 4 | Maximum intelligence | 200K tokens | Complex reasoning, research, long-doc analysis, multi-step problems |
| Claude Sonnet 4 | Balanced performance | 200K tokens | Coding, customer-facing apps, summarization, most production workloads |
| Claude Haiku 4 | Speed and efficiency | 200K tokens | High-volume classification, routing, extraction, sub-200ms tasks |
All current models share the same 200K token context window. Model selection is therefore driven by capability, cost, and latency — not context size.
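In the Messages API, the tier is nothing more than the `model` parameter; the rest of the request is identical across tiers. A minimal sketch, reusing the model IDs that appear in the examples later in this guide:

```python
import anthropic

client = anthropic.Anthropic()

# The request shape is the same for every tier; only the model ID changes
response = client.messages.create(
    model="claude-haiku-4-5-20251001",  # swap for claude-sonnet-4-6 or claude-opus-4-7
    max_tokens=1024,
    messages=[{"role": "user", "content": "Classify this ticket: 'My invoice total is wrong.'"}],
)
print(response.content[0].text)
```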
The Cost and Latency Gradient
Across tiers (approximate relative values):
| Metric | Opus | Sonnet | Haiku |
|---|---|---|---|
| Cost per token | Highest (~10x Haiku) | Medium (~4x Haiku) | Lowest (baseline) |
| Time to first token | Slowest | Medium | Fastest (~3x faster than Opus) |
| Reasoning depth | Maximum | High | Good for simple tasks |
The architectural principle the exam tests: Use the cheapest model that reliably solves the task. Sending a classification task to Opus is architecturally incorrect — it is wasteful even if it works.
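If you want hard numbers for the latency row rather than relative rankings, time to first token is straightforward to measure with the streaming API. A minimal sketch (model IDs reused from the examples below):

```python
import time
import anthropic

client = anthropic.Anthropic()

def time_to_first_token(model: str, prompt: str) -> float:
    """Measure seconds until the first streamed text chunk arrives."""
    start = time.monotonic()
    with client.messages.stream(
        model=model,
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for _ in stream.text_stream:
            return time.monotonic() - start  # first chunk received
    return time.monotonic() - start  # model produced no text

for model in ["claude-haiku-4-5-20251001", "claude-sonnet-4-6", "claude-opus-4-7"]:
    print(model, round(time_to_first_token(model, "Say hello."), 2), "s")
```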
The Model Selection Decision Framework
Work through these questions in order (a code sketch of the framework follows the list):
1. What is the task complexity?
- Simple classification, routing, extraction → Haiku
- Coding, summarization, customer interaction → Sonnet
- Multi-step reasoning, research synthesis, long-doc analysis → Opus
2. What is the latency budget?
- Sub-200ms → Haiku only
- Sub-2s acceptable → Sonnet
- Latency not critical → consider Opus for hard tasks
3. What is the volume?
- High volume (thousands/hour) → minimize model tier; cost compounds
- Low volume (occasional) → can afford Opus even for moderate tasks
4. What is the accuracy requirement?
- Near-perfect required, complex domain → Opus + extended thinking
- High but not perfect → Sonnet
- Good enough for classification/routing → Haiku
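The framework mechanizes cleanly. Here is a minimal sketch of the four questions as a selection function; the `TaskProfile` fields and the volume threshold are illustrative assumptions, not exam-specified values:

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    complexity: str           # "simple" | "medium" | "complex"
    latency_budget_ms: int    # hypothetical field: max acceptable latency
    requests_per_hour: int
    needs_near_perfect: bool

def select_model(task: TaskProfile) -> str:
    # Q2: latency budget is a hard constraint — sub-200ms means Haiku only
    if task.latency_budget_ms < 200:
        return "claude-haiku-4-5-20251001"
    # Q4: near-perfect accuracy on a complex task justifies Opus
    # (pair with extended thinking, covered in the next section)
    if task.needs_near_perfect and task.complexity == "complex":
        return "claude-opus-4-7"
    # Q1 + Q3: map complexity to tier, biased downward at high volume
    if task.complexity == "simple":
        return "claude-haiku-4-5-20251001"
    if task.complexity == "complex" and task.requests_per_hour < 1000:
        return "claude-opus-4-7"
    return "claude-sonnet-4-6"
```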
Extended Thinking
Extended thinking allows Claude to perform internal reasoning before producing a response. It is only available on Opus models.
When extended thinking improves results:
- Multi-step logical deduction
- Security vulnerability analysis
- Complex architecture review
- Problems where auditing the reasoning chain matters
When extended thinking does NOT help (and should not be used):
- Simple Q&A
- Classification
- Content generation
- Any task where latency matters
- Tasks Sonnet already handles reliably
Extended thinking increases both cost (additional reasoning tokens) and latency. The exam tests whether you know the cases where it’s wrong to use it, not just when it’s available.
```python
import anthropic

client = anthropic.Anthropic()

# Extended thinking — use only for genuinely hard reasoning tasks
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[{
        "role": "user",
        "content": "Review this system design for security vulnerabilities and single points of failure."
    }],
)

# The response interleaves thinking blocks with the final answer
for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
```
Model Routing Pattern
A common production pattern is routing requests to the cheapest model that can handle them:
```python
def classify_complexity(user_message: str) -> str:
    """Use a cheap Haiku call to classify request complexity."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=10,
        system="Classify the complexity of the user's request. Respond with exactly one word: simple, medium, or complex.",
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text.strip().lower()

def route_request(user_message: str) -> str:
    complexity = classify_complexity(user_message)
    model_map = {
        "simple": "claude-haiku-4-5-20251001",
        "medium": "claude-sonnet-4-6",
        "complex": "claude-opus-4-7",
    }
    # Default to Sonnet if the classifier returns an unexpected label
    model = model_map.get(complexity, "claude-sonnet-4-6")
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text
```
The exam may ask you to critique this pattern — the main risk is misclassification (a complex task routed to Haiku). Always have a fallback or confidence threshold; one approach is sketched below.
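One way to blunt the misclassification risk is an escalation fallback: if the cheap model's answer fails a lightweight quality check, retry one tier up. A minimal sketch that reuses `client` from the code above; `looks_unreliable` is a hypothetical heuristic, not part of the API:

```python
# Each tier escalates to the next one up; Opus has nowhere to go
ESCALATION = {
    "claude-haiku-4-5-20251001": "claude-sonnet-4-6",
    "claude-sonnet-4-6": "claude-opus-4-7",
}

def looks_unreliable(text: str) -> bool:
    """Hypothetical quality check: empty or self-flagged uncertain output."""
    return not text.strip() or "not sure" in text.lower()

def route_with_fallback(user_message: str, model: str = "claude-haiku-4-5-20251001") -> str:
    """Try the routed model; escalate one tier if the answer fails the check."""
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": user_message}],
    )
    text = response.content[0].text
    if looks_unreliable(text) and model in ESCALATION:
        return route_with_fallback(user_message, ESCALATION[model])
    return text
```

In production the quality check might be a confidence score from the classifier itself, a schema validation, or a refusal detector; the escalation chain is the part the exam cares about.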
Cost Estimation at Scale
The exam includes questions about cost impact of model choice. Practice this mental math:
Scenario: 50,000 requests/day, average 1,000 input tokens + 500 output tokens per request.
At approximate prices (Sonnet: $3 input / $15 output per million tokens):
- Daily input cost: 50,000 × 1,000 × $0.000003 = $150/day
- Daily output cost: 50,000 × 500 × $0.000015 = $375/day
- Total: $525/day on Sonnet
Switching to Haiku (~8x cheaper input, ~10x cheaper output):
- Input: $150 / 8 = ~$19/day
- Output: $375 / 10 = ~$37/day
- Total: ~$56/day
For a classification workload, Haiku saves ~$469/day. The exam expects you to recognize this magnitude of difference and recommend accordingly.
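The arithmetic is worth scripting once so you can rerun it for any workload. A sketch using the approximate prices above (the Haiku figures are derived from the 8x/10x ratios in this scenario, not published list prices):

```python
def daily_cost(requests_per_day: int, in_tokens: int, out_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> float:
    """Daily cost in dollars given per-request token counts and $/M-token prices."""
    input_cost = requests_per_day * in_tokens * in_price_per_m / 1_000_000
    output_cost = requests_per_day * out_tokens * out_price_per_m / 1_000_000
    return input_cost + output_cost

sonnet = daily_cost(50_000, 1_000, 500, 3.00, 15.00)           # $525.00
haiku = daily_cost(50_000, 1_000, 500, 3.00 / 8, 15.00 / 10)   # ~$56.25
print(f"Sonnet: ${sonnet:,.2f}/day  Haiku: ${haiku:,.2f}/day  Savings: ${sonnet - haiku:,.2f}/day")
```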
Domain 1 Key Facts to Memorize
- All Claude 4.x models: 200K token context window
- Haiku: fastest, cheapest; Opus: most capable, most expensive
- Extended thinking: Opus only, increases cost and latency
- Model selection principle: cheapest model that reliably solves the task
- Routing pattern risk: misclassification sends hard tasks to a weak model
Continue to the Domain 1 Practice Questions to test your knowledge.