Memory Systems for Agents
Short-term, long-term, episodic, and semantic memory — when to use each
Why Memory Matters
Every time you start a new conversation with a stateless LLM, it has no idea who you are. It doesn’t know that you prefer Python over Java, that your project deadline is next Friday, or that last month you discussed an architecture decision and chose microservices. Every session starts from zero.
For a chatbot, this is mildly annoying. For an agent, it’s crippling. An agent that helps you manage a software project needs to know the project’s current state, your team’s preferences, past decisions, and what was discussed previously. Without memory, it can’t do its job.
The good news is that agent memory is largely a solved problem in practice, even if not a perfectly solved one. This lesson covers four distinct memory types, each suited to different use cases. Understanding them lets you build agents that feel genuinely helpful across sessions, not just within a single conversation.
The Four Memory Types
1. In-Context Memory (Working Memory)
Analogy: RAM on a computer
In-context memory is everything currently in the model’s context window — the conversation history, tool results, system prompt, and intermediate reasoning from the current session. It’s immediate, zero-latency, and fully available to the model without any retrieval step.
The limitation is size. Many current models have context windows of roughly 128K to 1M tokens, which is impressive but finite. A long conversation, a large document, or many tool results can fill it up. And everything in it disappears when the session ends.
```python
# In-context memory is just the messages list
messages = [
    {"role": "user", "content": "My name is Alex and I'm building a Django app."},
    {"role": "assistant", "content": "Great! I'll remember that. What do you need help with?"},
    {"role": "user", "content": "How should I structure my models?"},
    # The model "remembers" Alex and Django from earlier in this same context
]
```
Use in-context memory for: the current conversation, immediate task state, temporary reasoning steps.
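Because the context window is finite, long conversations eventually need trimming. The following is a minimal sketch of one way to do that, not a definitive implementation. The helper names `estimate_tokens` and `trim_history` are hypothetical, and the four-characters-per-token estimate is a rough assumption. A real agent would use the model provider's tokenizer or token-counting endpoint.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: list, max_tokens: int) -> list:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are considered first
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    # Assumption: the chat API expects the history to open with a user turn
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # ~100 estimated tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 estimated tokens
    {"role": "user", "content": "c" * 400},       # ~100 estimated tokens
]
trimmed = trim_history(history, max_tokens=250)
```

Dropping the oldest turns is the simplest policy. It loses early details, which is one reason the persistent memory types below exist.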
2. External Memory (Vector Memory)
Analogy: A searchable filing cabinet
External memory is a vector database that stores information as embeddings. When the agent needs to recall something, it generates an embedding of the query and retrieves the most semantically similar stored content.
Storage is effectively unbounded and persists across sessions. The trade-offs: recall requires an explicit query, each lookup adds latency, and retrieval can miss relevant information if the query is poorly formed.
```python
import chromadb
from chromadb.utils import embedding_functions

# Initialize ChromaDB
client = chromadb.PersistentClient(path="./agent_memory")
embedder = embedding_functions.DefaultEmbeddingFunction()
collection = client.get_or_create_collection(
    name="agent_memories",
    embedding_function=embedder
)

# Store a memory
collection.add(
    documents=["User is building a Django app with PostgreSQL. Prefers class-based views."],
    ids=["memory_001"],
    metadatas=[{"user_id": "alex", "timestamp": "2026-06-01", "type": "preference"}]
)

# Retrieve relevant memories
results = collection.query(
    query_texts=["What framework is the user using?"],
    n_results=3,
    where={"user_id": "alex"}
)
retrieved_context = "\n".join(results["documents"][0])
```
Use external memory for: user preferences, project knowledge, past decisions, reference documents.
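To make "most semantically similar" concrete, here is a toy sketch of similarity ranking. It is only an illustration under loud assumptions: the `embed` function is a bag-of-words counter rather than a learned embedding model, and `cosine_similarity` and `top_k` are hypothetical helper names. A vector database does roughly this ranking step over dense model-generated vectors, usually with an approximate index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real systems use a learned embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list, k: int = 1) -> list:
    """Rank documents by similarity to the query and return the best k."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


memories = [
    "user prefers class-based views in django",
    "project deadline is next friday",
    "product images are stored in s3",
]
best = top_k("which django views does the user like", memories, k=1)
```

The word-count version only matches exact words, which is why a poorly phrased query can miss relevant memories. Learned embeddings reduce that problem but do not remove it.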
3. Episodic Memory
Analogy: A personal diary
Episodic memory stores records of specific past interactions — what was discussed, what decisions were made, what the agent did. Unlike semantic memory (which stores distilled facts), episodic memory preserves the narrative arc of past sessions.
```python
import json
from datetime import datetime


class EpisodicMemory:
    def __init__(self, storage_path: str):
        self.storage_path = storage_path
        self.episodes = self._load()

    def _load(self) -> list:
        try:
            with open(self.storage_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return []

    def save_episode(self, user_id: str, summary: str, key_decisions: list):
        episode = {
            "user_id": user_id,
            "timestamp": datetime.now().isoformat(),
            "summary": summary,
            "key_decisions": key_decisions
        }
        self.episodes.append(episode)
        with open(self.storage_path, "w") as f:
            json.dump(self.episodes, f, indent=2)

    def get_recent_episodes(self, user_id: str, n: int = 5) -> list:
        user_episodes = [e for e in self.episodes if e["user_id"] == user_id]
        return sorted(user_episodes, key=lambda x: x["timestamp"], reverse=True)[:n]


# After a session ends, save an episode summary
memory = EpisodicMemory("./episodes.json")
memory.save_episode(
    user_id="alex",
    summary="Discussed Django model structure for an e-commerce app. Covered Product, Order, and User models.",
    key_decisions=[
        "Using UUID primary keys instead of integers",
        "Storing product images in S3, not local filesystem",
        "Decided against soft deletes for simplicity"
    ]
)
```
At the start of a new session, load recent episodes and inject them into the system prompt:
```python
episodes = memory.get_recent_episodes("alex", n=3)
episode_context = "\n".join([
    f"Session {e['timestamp'][:10]}: {e['summary']}"
    for e in episodes
])

system_prompt = f"""You are a personal coding assistant for Alex.

Recent session history:
{episode_context}

Continue from where you left off, referencing past decisions when relevant."""
```
Use episodic memory for: picking up where you left off, referencing past decisions, building continuity across sessions.
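Because each episode carries an ISO timestamp, episodes can also be looked up by date rather than only by recency. The sketch below assumes the episode dictionary shape used above. The function name `episodes_between` and the sample episodes are hypothetical and exist only for illustration.

```python
from datetime import date


def episodes_between(episodes: list, start: date, end: date) -> list:
    """Return episodes whose date falls within [start, end], oldest first."""
    selected = [
        e for e in episodes
        # The first 10 characters of an ISO timestamp are YYYY-MM-DD
        if start <= date.fromisoformat(e["timestamp"][:10]) <= end
    ]
    return sorted(selected, key=lambda e: e["timestamp"])


# Hypothetical sample data in the same shape save_episode() writes
sample_episodes = [
    {"user_id": "alex", "timestamp": "2026-06-02T10:15:00",
     "summary": "Designed Order model", "key_decisions": []},
    {"user_id": "alex", "timestamp": "2026-05-20T09:00:00",
     "summary": "Chose PostgreSQL", "key_decisions": []},
    {"user_id": "alex", "timestamp": "2026-06-09T14:30:00",
     "summary": "Set up S3 uploads", "key_decisions": []},
]
first_week_of_june = episodes_between(
    sample_episodes, date(2026, 6, 1), date(2026, 6, 7)
)
```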
4. Semantic Memory
Analogy: A cheat sheet of known facts
Semantic memory stores distilled, stable facts about the world or the user — not the raw record of what was said, but the extracted truth. “User prefers Python over Java” is semantic memory. “In session #47, the user said they find Java verbose” is episodic memory.
Semantic memory is extracted from interactions and updated over time. It’s compact, always-available, and immediately useful.
```python
import json


class SemanticMemory:
    """Stores user preferences and facts as key-value pairs."""

    def __init__(self, storage_path: str):
        self.storage_path = storage_path
        self.facts = self._load()

    def _load(self) -> dict:
        try:
            with open(self.storage_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def update_fact(self, user_id: str, key: str, value: str):
        # Writing to an existing key overwrites the old value
        if user_id not in self.facts:
            self.facts[user_id] = {}
        self.facts[user_id][key] = value
        with open(self.storage_path, "w") as f:
            json.dump(self.facts, f, indent=2)

    def get_user_facts(self, user_id: str) -> dict:
        return self.facts.get(user_id, {})


semantic = SemanticMemory("./semantic.json")
semantic.update_fact("alex", "preferred_language", "Python")
semantic.update_fact("alex", "project_type", "Django e-commerce app")
semantic.update_fact("alex", "database", "PostgreSQL")
semantic.update_fact("alex", "deployment", "AWS ECS with Docker")
semantic.update_fact("alex", "deadline", "June 20, 2026")

# Later sessions inject this as context
user_facts = semantic.get_user_facts("alex")
facts_text = "\n".join([f"- {k}: {v}" for k, v in user_facts.items()])
```
Use semantic memory for: persistent user preferences, known constraints, project metadata.
Building a Personal Assistant Agent with Memory
Here’s a complete agent that combines all four memory types to give a persistent, context-aware experience:
```python
import uuid

import anthropic
import chromadb
from chromadb.utils import embedding_functions

# SemanticMemory and EpisodicMemory are the classes defined in the sections above
client = anthropic.Anthropic()


class PersonalAssistantAgent:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.conversation_history = []  # in-context memory

        # External vector memory
        chroma = chromadb.PersistentClient(path="./agent_memory_db")
        self.vector_memory = chroma.get_or_create_collection(
            name="memories",
            embedding_function=embedding_functions.DefaultEmbeddingFunction()
        )

        # Semantic and episodic memory (JSON-backed)
        self.semantic = SemanticMemory("./semantic.json")
        self.episodic = EpisodicMemory("./episodes.json")

    def _build_system_prompt(self) -> str:
        # Load semantic facts
        facts = self.semantic.get_user_facts(self.user_id)
        facts_text = "\n".join([f"- {k}: {v}" for k, v in facts.items()]) if facts else "None yet."

        # Load recent episodes
        episodes = self.episodic.get_recent_episodes(self.user_id, n=3)
        episodes_text = "\n".join([
            f"- {e['timestamp'][:10]}: {e['summary']}"
            for e in episodes
        ]) if episodes else "No previous sessions."

        return f"""You are a personal assistant for user {self.user_id}.

Known facts about this user:
{facts_text}

Recent sessions:
{episodes_text}

Use this context to give personalized, consistent help. Reference past work when relevant.
Update your mental model of the user as you learn new things about them."""

    def _retrieve_relevant_memories(self, query: str) -> str:
        """Search vector memory for relevant past content."""
        try:
            results = self.vector_memory.query(
                query_texts=[query],
                n_results=3,
                where={"user_id": self.user_id}
            )
            if results["documents"][0]:
                return "Relevant memories:\n" + "\n".join(results["documents"][0])
        except Exception:
            # Retrieval is best-effort: on any failure, continue without memories
            pass
        return ""

    def store_memory(self, content: str, memory_type: str = "general"):
        """Store something in vector memory for later retrieval."""
        self.vector_memory.add(
            documents=[content],
            ids=[str(uuid.uuid4())],
            metadatas=[{"user_id": self.user_id, "type": memory_type}]
        )

    def chat(self, user_message: str) -> str:
        # Retrieve relevant vector memories
        relevant_memories = self._retrieve_relevant_memories(user_message)

        # Add user message (optionally prepend retrieved memories)
        if relevant_memories:
            augmented_message = f"{relevant_memories}\n\nUser: {user_message}"
        else:
            augmented_message = user_message

        # Note: the augmented text, including retrieved memories, stays in the history
        self.conversation_history.append({
            "role": "user",
            "content": augmented_message
        })

        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            system=self._build_system_prompt(),
            messages=self.conversation_history
        )

        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        return assistant_message

    def end_session(self, session_summary: str, decisions: list):
        """Called when a session ends; stores episodic memory."""
        self.episodic.save_episode(self.user_id, session_summary, decisions)
        # Also store in vector memory for semantic search
        self.store_memory(session_summary, memory_type="episode_summary")


# Usage
agent = PersonalAssistantAgent(user_id="alex")

# First interaction in a new session
print(agent.chat("What were we working on last time?"))
print(agent.chat("Right, let me continue with the Product model. What fields should it have?"))

# End of session
agent.end_session(
    session_summary="Completed Product model with UUID pk, name, price, stock_quantity, images (S3 URLs)",
    decisions=["Used DecimalField for price", "Added slug field for SEO-friendly URLs"]
)
```
Choosing the Right Memory Type
| Scenario | Memory Type |
|---|---|
| Remember what was said 5 messages ago | In-context |
| Remember user’s language preference across sessions | Semantic |
| Remember what was discussed last Tuesday | Episodic |
| Find notes about a specific topic from months ago | External/Vector |
| Store the entire project codebase for reference | External/Vector |
Most production agents use a combination. A common pattern:
- Always: in-context for the current conversation
- Always: semantic for known user facts injected into the system prompt
- On session start: load last 3 episodic summaries into system prompt
- On query: vector search for relevant past content to prepend to the message
Memory Pitfalls to Avoid
Over-stuffing context: Loading every memory into every request wastes tokens and buries the relevant in the irrelevant. Use selective retrieval — only load what’s relevant to the current query.
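One way to retrieve selectively is to keep only matches that are close enough, rather than everything the search returns. The sketch below assumes you have parallel lists of documents and distances, such as the `documents` and `distances` entries in a ChromaDB query result. The function name `select_relevant` and the `0.5` threshold are assumptions; a sensible cutoff depends on the embedding model and distance metric.

```python
def select_relevant(documents: list, distances: list,
                    max_distance: float = 0.5, max_items: int = 3) -> list:
    """Keep only close matches, nearest first, capped at max_items."""
    # Pair each document with its distance and sort by distance (smaller = closer)
    pairs = sorted(zip(distances, documents))
    close_enough = [doc for dist, doc in pairs if dist <= max_distance]
    return close_enough[:max_items]


docs = ["deployment notes", "django model decisions", "s3 upload setup"]
dists = [0.9, 0.2, 0.4]
kept = select_relevant(docs, dists)
```

With these made-up distances, the far match is dropped and the two close ones are returned nearest first.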
Stale semantic facts: If a user changes preferences, semantic memory needs to be updated. Build in an update mechanism, not just a write mechanism.
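An update mechanism can be as simple as overwriting a fact while logging what changed and when. This is a sketch under assumptions, not part of the classes above: `UpdatableFacts` is a hypothetical in-memory store, and the example values are illustrative.

```python
from datetime import datetime


class UpdatableFacts:
    """In-memory fact store that overwrites changed values and logs each change."""

    def __init__(self):
        self.current = {}  # key -> latest value
        self.changes = []  # log of (key, old_value, new_value, changed_at)

    def update_fact(self, key: str, value: str, changed_at: datetime) -> bool:
        """Set a fact; return True only if the stored value actually changed."""
        old_value = self.current.get(key)
        if old_value == value:
            return False
        self.current[key] = value
        self.changes.append((key, old_value, value, changed_at))
        return True


facts = UpdatableFacts()
facts.update_fact("preferred_language", "Java", datetime(2026, 1, 5))
changed = facts.update_fact("preferred_language", "Python", datetime(2026, 6, 1))
```

Keeping the change log lets the agent notice that a preference has shifted instead of silently replacing it.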
No memory consolidation: After 100 sessions, episodic memory has 100 entries. Summarize and compress old episodes periodically to keep the relevant history manageable.
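Consolidation can be sketched as folding all but the newest episodes into a single combined entry. The code below assumes the episode dictionary shape used earlier and a list that belongs to one user. The names `join_summaries` and `consolidate_episodes` are hypothetical, and the joiner is a placeholder for what would more likely be an LLM summarization call.

```python
def join_summaries(summaries: list) -> str:
    """Placeholder summarizer; a real system might call an LLM here instead."""
    return " | ".join(summaries)


def consolidate_episodes(episodes: list, keep_recent: int = 10,
                         summarize=join_summaries) -> list:
    """Merge all but the newest keep_recent episodes into one combined entry."""
    ordered = sorted(episodes, key=lambda e: e["timestamp"])
    split = max(0, len(ordered) - keep_recent)
    old, recent = ordered[:split], ordered[split:]
    if not old:
        return ordered  # nothing old enough to merge
    merged = {
        "user_id": old[0]["user_id"],        # assumes a single user's episodes
        "timestamp": old[-1]["timestamp"],   # date of the newest merged episode
        "summary": summarize([e["summary"] for e in old]),
        "key_decisions": [d for e in old for d in e["key_decisions"]],
    }
    return [merged] + recent


# Hypothetical sample data: five sessions on consecutive days
episodes = [
    {"user_id": "alex", "timestamp": f"2026-01-0{i}T09:00:00",
     "summary": f"Session {i}", "key_decisions": [f"decision {i}"]}
    for i in range(1, 6)
]
compact = consolidate_episodes(episodes, keep_recent=2)
```

Here the three oldest sessions collapse into one entry while the two newest stay intact, so recent detail is preserved and old decisions remain on record.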
Trusting memory blindly: Memory can be wrong or outdated. Build agents that confirm before acting on potentially stale information: “Last time you mentioned deploying to AWS ECS — is that still the plan?”
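One lightweight way to apply this is to phrase old memories as questions and recent ones as statements. The sketch below is an assumption-laden illustration: `phrase_memory` is a hypothetical helper, and the 30-day cutoff is an arbitrary default.

```python
from datetime import datetime, timedelta


def phrase_memory(fact: str, last_confirmed: datetime, now: datetime,
                  max_age: timedelta = timedelta(days=30)) -> str:
    """Phrase a remembered fact as a question if it is old, else as a statement."""
    if now - last_confirmed > max_age:
        return f"Last time you mentioned {fact}. Is that still the plan?"
    return f"You mentioned {fact}."


today = datetime(2026, 6, 1)
old = phrase_memory("deploying to AWS ECS", datetime(2026, 1, 10), today)
fresh = phrase_memory("deploying to AWS ECS", datetime(2026, 5, 25), today)
```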
Summary
- In-context memory is fast but limited to the current session — like RAM
- External vector memory is persistent and searchable — like a filing cabinet
- Episodic memory stores narrative records of past sessions — like a diary
- Semantic memory stores distilled stable facts — like a cheat sheet
- The personal assistant agent combines all four for a truly persistent experience
- Always retrieve selectively — load only what’s relevant to the current query
Next: Building Your First Agent with LangChain — putting tools, memory, and the ReAct pattern together in a real web research agent.
