Memory Systems for Agents

Why Memory Matters

Every time you start a new conversation with a stateless LLM, it has no idea who you are. It doesn’t know that you prefer Python over Java, that your project deadline is next Friday, or that last month you discussed an architecture decision and chose microservices. Every session starts from zero.

For a chatbot, this is mildly annoying. For an agent, it’s crippling. An agent that helps you manage a software project needs to know the project’s current state, your team’s preferences, past decisions, and what was discussed previously. Without memory, it can’t do its job.

The good news is that memory in AI systems is a practically solved problem: the available techniques are imperfect, but they work well enough to build on. This lesson covers four memory types, each suited to different use cases. Understanding them lets you build agents that stay helpful across sessions, not just within a single conversation.

The Four Memory Types

1. In-Context Memory (Working Memory)

Analogy: RAM on a computer

In-context memory is everything currently in the model’s context window: the conversation history, tool results, system prompt, and intermediate reasoning from the current session. It’s immediately available to the model, with no retrieval step and no lookup latency.

The limitation is size. Many current models have context windows in the range of 128K to 1M tokens, which is impressive but finite. A long conversation, a large document, or many tool results can fill it up. And unless you save it somewhere else, it disappears entirely when the session ends.

# In-context memory is just the messages list
messages = [
    {"role": "user", "content": "My name is Alex and I'm building a Django app."},
    {"role": "assistant", "content": "Great! I'll remember that. What do you need help with?"},
    {"role": "user", "content": "How should I structure my models?"},
    # The model "remembers" Alex and Django from earlier in this same context
]

Use in-context memory for: the current conversation, immediate task state, temporary reasoning steps.
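
Because the window is finite, long conversations eventually need trimming. The sketch below is one simple way to do that, not a definitive approach: it keeps the newest messages that fit a token budget. The function names, the budget, and the 4-characters-per-token estimate are all assumptions for illustration; a real system would use the model’s own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose estimated total fits in max_tokens."""
    kept = []
    total = 0
    # Walk backwards so the newest messages are considered first
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    # Some chat APIs expect the first message to be a user turn,
    # so drop any leading assistant messages left over after trimming
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


messages = [
    {"role": "user", "content": "My name is Alex and I'm building a Django app."},
    {"role": "assistant", "content": "Great! I'll remember that. What do you need help with?"},
    {"role": "user", "content": "How should I structure my models?"},
]

trimmed = trim_history(messages, max_tokens=25)
print(trimmed)
```

With this small budget only the latest user message survives, so the model no longer sees Alex’s name or the Django detail. That loss is exactly what the other three memory types are meant to cover.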

2. External Memory (Vector Memory)

Analogy: A searchable filing cabinet

External memory is a vector database that stores information as embeddings. When the agent needs to recall something, it generates an embedding of the query and retrieves the most semantically similar stored content.

External memory is effectively unlimited in size and persists across sessions. The trade-offs: every recall requires a retrieval query, which adds latency, and retrieval can miss relevant information if the query is poorly formed.

import chromadb
from chromadb.utils import embedding_functions

# Initialize ChromaDB
client = chromadb.PersistentClient(path="./agent_memory")
embedder = embedding_functions.DefaultEmbeddingFunction()

collection = client.get_or_create_collection(
    name="agent_memories",
    embedding_function=embedder
)

# Store a memory
collection.add(
    documents=["User is building a Django app with PostgreSQL. Prefers class-based views."],
    ids=["memory_001"],
    metadatas=[{"user_id": "alex", "timestamp": "2026-06-01", "type": "preference"}]
)

# Retrieve relevant memories
results = collection.query(
    query_texts=["What framework is the user using?"],
    n_results=3,
    where={"user_id": "alex"}
)

retrieved_context = "\n".join(results["documents"][0])

Use external memory for: user preferences, project knowledge, past decisions, reference documents.
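
To make the “most semantically similar” step concrete, here is a toy sketch of similarity-based retrieval. It swaps real embeddings for plain word counts, which is an assumption made purely so the example runs without a model; `embed`, `cosine_similarity`, and `retrieve` are illustrative names, not part of ChromaDB.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real systems use a learned model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, memories: list[str], n_results: int = 1) -> list[str]:
    """Rank stored memories by similarity to the query and return the top ones."""
    query_vec = embed(query)
    ranked = sorted(
        memories,
        key=lambda m: cosine_similarity(query_vec, embed(m)),
        reverse=True,
    )
    return ranked[:n_results]


memories = [
    "User is building a Django app with PostgreSQL.",
    "Project deadline is next Friday.",
    "Team chose microservices for the architecture.",
]

top = retrieve("which database does the django app use", memories)
print(top)
```

The query matches the first memory only through the shared words “django” and “app”. Word counts cannot tell that “database” and “PostgreSQL” are related, which is the gap learned embeddings are designed to close.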

3. Episodic Memory

Analogy: A personal diary

Episodic memory stores records of specific past interactions: what was discussed, what decisions were made, and what the agent did. Unlike semantic memory (covered next), which stores distilled facts, episodic memory preserves the narrative arc of past sessions.

import json
from datetime import datetime

class EpisodicMemory:
    def __init__(self, storage_path: str):
        self.storage_path = storage_path
        self.episodes = self._load()
    
    def _load(self) -> list:
        try:
            with open(self.storage_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return []
    
    def save_episode(self, user_id: str, summary: str, key_decisions: list):
        episode = {
            "user_id": user_id,
            "timestamp": datetime.now().isoformat(),
            "summary": summary,
            "key_decisions": key_decisions
        }
        self.episodes.append(episode)
        with open(self.storage_path, "w") as f:
            json.dump(self.episodes, f, indent=2)
    
    def get_recent_episodes(self, user_id: str, n: int = 5) -> list:
        user_episodes = [e for e in self.episodes if e["user_id"] == user_id]
        return sorted(user_episodes, key=lambda x: x["timestamp"], reverse=True)[:n]


# After a session ends, save an episode summary
memory = EpisodicMemory("./episodes.json")
memory.save_episode(
    user_id="alex",
    summary="Discussed Django model structure for an e-commerce app. Covered Product, Order, and User models.",
    key_decisions=[
        "Using UUID primary keys instead of integers",
        "Storing product images in S3, not local filesystem",
        "Decided against soft deletes for simplicity"
    ]
)

At the start of a new session, load recent episodes and inject them into the system prompt:

episodes = memory.get_recent_episodes("alex", n=3)
episode_context = "\n".join([
    f"Session {e['timestamp'][:10]}: {e['summary']}"
    for e in episodes
])

system_prompt = f"""You are a personal coding assistant for Alex.

Recent session history:
{episode_context}

Continue from where you left off, referencing past decisions when relevant."""

Use episodic memory for: picking up where you left off, referencing past decisions, building continuity across sessions.
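
The prompt above asks the model to reference past decisions, so it can help to include each episode’s `key_decisions` alongside its summary. A minimal sketch, assuming episodes have the same dictionary shape that `save_episode` writes; `format_episode` is a hypothetical helper, not part of the class above.

```python
def format_episode(episode: dict) -> str:
    """Render one episode as a summary line followed by its decisions."""
    lines = [f"Session {episode['timestamp'][:10]}: {episode['summary']}"]
    # Indent decisions under the session they belong to
    lines += [f"  - Decision: {d}" for d in episode.get("key_decisions", [])]
    return "\n".join(lines)


episode = {
    "user_id": "alex",
    "timestamp": "2026-06-01T14:30:00",
    "summary": "Discussed Django model structure for an e-commerce app.",
    "key_decisions": [
        "Using UUID primary keys instead of integers",
        "Storing product images in S3, not local filesystem",
    ],
}

episode_text = format_episode(episode)
print(episode_text)
```

The trade-off is prompt length: decisions add tokens to every request, so this fits best when only a few recent episodes are loaded.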

4. Semantic Memory

Analogy: A cheat sheet of known facts

Semantic memory stores distilled, stable facts about the world or the user — not the raw record of what was said, but the extracted truth. “User prefers Python over Java” is semantic memory. “In session #47, the user said they find Java verbose” is episodic memory.

Semantic memory is extracted from interactions and updated over time. It’s compact, always-available, and immediately useful.

import json


class SemanticMemory:
    """Stores user preferences and facts as key-value pairs."""
    
    def __init__(self, storage_path: str):
        self.storage_path = storage_path
        self.facts = self._load()
    
    def _load(self) -> dict:
        try:
            with open(self.storage_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}
    
    def update_fact(self, user_id: str, key: str, value: str):
        if user_id not in self.facts:
            self.facts[user_id] = {}
        self.facts[user_id][key] = value
        with open(self.storage_path, "w") as f:
            json.dump(self.facts, f, indent=2)
    
    def get_user_facts(self, user_id: str) -> dict:
        return self.facts.get(user_id, {})


semantic = SemanticMemory("./semantic.json")
semantic.update_fact("alex", "preferred_language", "Python")
semantic.update_fact("alex", "project_type", "Django e-commerce app")
semantic.update_fact("alex", "database", "PostgreSQL")
semantic.update_fact("alex", "deployment", "AWS ECS with Docker")
semantic.update_fact("alex", "deadline", "June 20, 2026")

# Later sessions inject this as context
user_facts = semantic.get_user_facts("alex")
facts_text = "\n".join([f"- {k}: {v}" for k, v in user_facts.items()])

Use semantic memory for: persistent user preferences, known constraints, project metadata.
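
The text says semantic facts are extracted from interactions, but the example above writes them by hand. The sketch below shows one very naive extraction step using regular expressions. The patterns and fact keys are hypothetical and easy to fool; in practice this step is more often delegated to an LLM call that returns structured facts.

```python
import re

# Hypothetical patterns mapping phrasing to fact keys; purely illustrative
FACT_PATTERNS = {
    "preferred_language": re.compile(r"\bI prefer (\w+)", re.IGNORECASE),
    "database": re.compile(
        r"\b(?:we use|I use|using) (PostgreSQL|MySQL|SQLite)\b", re.IGNORECASE
    ),
}


def extract_facts(message: str) -> dict[str, str]:
    """Return any facts whose pattern matches the message."""
    facts = {}
    for key, pattern in FACT_PATTERNS.items():
        match = pattern.search(message)
        if match:
            facts[key] = match.group(1)
    return facts


def apply_facts(store: dict, user_id: str, new_facts: dict[str, str]) -> None:
    """Merge new facts into the store; newer values overwrite older ones."""
    store.setdefault(user_id, {}).update(new_facts)


store = {}
apply_facts(store, "alex", extract_facts("I prefer Python over Java, and we use PostgreSQL."))
apply_facts(store, "alex", extract_facts("Actually, I prefer Go these days."))
print(store)
```

Because later values overwrite earlier ones, the second message updates the language preference while the database fact is left untouched.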

Building a Personal Assistant Agent with Memory

Here’s a complete agent that combines all four memory types to give a persistent, context-aware experience:

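import anthropic
import chromadb
from chromadb.utils import embedding_functions

# Assumes the EpisodicMemory and SemanticMemory classes defined earlier
# in this lesson are in scope (same file or imported).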

client = anthropic.Anthropic()

class PersonalAssistantAgent:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.conversation_history = []  # in-context memory
        
        # External vector memory
        chroma = chromadb.PersistentClient(path="./agent_memory_db")
        self.vector_memory = chroma.get_or_create_collection(
            name="memories",
            embedding_function=embedding_functions.DefaultEmbeddingFunction()
        )
        
        # Semantic and episodic memory (the JSON-backed classes defined above)
        self.semantic = SemanticMemory("./semantic.json")
        self.episodic = EpisodicMemory("./episodes.json")
    
    def _build_system_prompt(self) -> str:
        # Load semantic facts
        facts = self.semantic.get_user_facts(self.user_id)
        facts_text = "\n".join([f"- {k}: {v}" for k, v in facts.items()]) if facts else "None yet."
        
        # Load recent episodes
        episodes = self.episodic.get_recent_episodes(self.user_id, n=3)
        episodes_text = "\n".join([
            f"- {e['timestamp'][:10]}: {e['summary']}"
            for e in episodes
        ]) if episodes else "No previous sessions."
        
        return f"""You are a personal assistant for user {self.user_id}.

Known facts about this user:
{facts_text}

Recent sessions:
{episodes_text}

Use this context to give personalized, consistent help. Reference past work when relevant.
Update your mental model of the user as you learn new things about them."""
    
    def _retrieve_relevant_memories(self, query: str) -> str:
        """Search vector memory for relevant past content."""
        try:
            results = self.vector_memory.query(
                query_texts=[query],
                n_results=3,
                where={"user_id": self.user_id}
            )
            if results["documents"][0]:
                return "Relevant memories:\n" + "\n".join(results["documents"][0])
        except Exception:
            # Best-effort retrieval: on any query error, fall through and return no memories
            pass
        return ""
    
    def store_memory(self, content: str, memory_type: str = "general"):
        """Store something in vector memory for later retrieval."""
        import uuid
        self.vector_memory.add(
            documents=[content],
            ids=[str(uuid.uuid4())],
            metadatas=[{"user_id": self.user_id, "type": memory_type}]
        )
    
    def chat(self, user_message: str) -> str:
        # Retrieve relevant vector memories
        relevant_memories = self._retrieve_relevant_memories(user_message)
        
        # Add user message (optionally prepend retrieved memories)
        if relevant_memories:
            augmented_message = f"{relevant_memories}\n\nUser: {user_message}"
        else:
            augmented_message = user_message
        
        self.conversation_history.append({
            "role": "user",
            "content": augmented_message
        })
        
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            system=self._build_system_prompt(),
            messages=self.conversation_history
        )
        
        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message
    
    def end_session(self, session_summary: str, decisions: list):
        """Called when a session ends — store episodic memory."""
        self.episodic.save_episode(self.user_id, session_summary, decisions)
        # Also store in vector memory for semantic search
        self.store_memory(session_summary, memory_type="episode_summary")


# Usage
agent = PersonalAssistantAgent(user_id="alex")

# First interaction in a new session
print(agent.chat("What were we working on last time?"))
print(agent.chat("Right — let me continue with the Product model. What fields should it have?"))

# End of session
agent.end_session(
    session_summary="Completed Product model with UUID pk, name, price, stock_quantity, images (S3 URLs)",
    decisions=["Used DecimalField for price", "Added slug field for SEO-friendly URLs"]
)

Choosing the Right Memory Type

Scenario                                               Memory Type
Remember what was said 5 messages ago                  In-context
Remember user’s language preference across sessions    Semantic
Remember what was discussed last Tuesday               Episodic
Find notes about a specific topic from months ago      External/Vector
Store the entire project codebase for reference        External/Vector

Most production agents use a combination. A common pattern:

Always: in-context for the current conversation
Always: semantic for known user facts injected into the system prompt
On session start: load last 3 episodic summaries into system prompt
On query: vector search for relevant past content to prepend to the message
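
The pattern above can be sketched as a single pure function that assembles one request from all four sources. This is an illustration under assumptions: `build_request` is a hypothetical helper, the retrieved memories are passed in as plain strings, and the episodes are assumed to be sorted newest first.

```python
def build_request(
    facts: dict,
    episodes: list[dict],
    retrieved: list[str],
    history: list[dict],
    user_message: str,
) -> tuple[str, list[dict]]:
    """Combine semantic, episodic, vector, and in-context memory into one request."""
    # Always: semantic facts go into the system prompt
    facts_text = "\n".join(f"- {k}: {v}" for k, v in facts.items()) or "None yet."
    # On session start: the last 3 episode summaries
    episodes_text = "\n".join(
        f"- {e['timestamp'][:10]}: {e['summary']}" for e in episodes[:3]
    ) or "No previous sessions."
    system_prompt = (
        "Known facts about this user:\n" + facts_text
        + "\n\nRecent sessions:\n" + episodes_text
    )
    # On query: prepend vector search hits to the current message only
    content = user_message
    if retrieved:
        content = "Relevant memories:\n" + "\n".join(retrieved) + "\n\nUser: " + user_message
    # Always: in-context history, left unmodified
    messages = history + [{"role": "user", "content": content}]
    return system_prompt, messages


history = []
system_prompt, messages = build_request(
    facts={"database": "PostgreSQL"},
    episodes=[],
    retrieved=["Prefers class-based views."],
    history=history,
    user_message="How should I structure my views?",
)
print(system_prompt)
print(messages)
```

Building the augmented message without mutating `history` lets the caller store only the raw user message afterwards, so retrieved memories from earlier turns do not pile up in the conversation.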

Memory Pitfalls to Avoid

Over-stuffing context: Loading every memory into every request wastes tokens and buries the relevant in the irrelevant. Use selective retrieval — only load what’s relevant to the current query.
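
One way to retrieve selectively is to drop search hits that are too far from the query. The sketch below filters a result dictionary shaped like what ChromaDB’s `query` returns by default (parallel `documents` and `distances` lists, where a lower distance means more similar). The sample values and the 0.8 cutoff are made up; a sensible threshold depends on the embedding model and distance metric.

```python
def filter_by_distance(results: dict, max_distance: float) -> list[str]:
    """Keep only documents whose distance to the query is within max_distance."""
    documents = results["documents"][0]  # hits for the first (only) query
    distances = results["distances"][0]
    return [doc for doc, dist in zip(documents, distances) if dist <= max_distance]


# Hand-written sample in the query-result shape; the numbers are illustrative
results = {
    "documents": [[
        "User prefers class-based views.",
        "Deadline is June 20, 2026.",
        "Team lunch is on Thursdays.",
    ]],
    "distances": [[0.31, 0.74, 1.42]],
}

relevant = filter_by_distance(results, max_distance=0.8)
print(relevant)
```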

Stale semantic facts: If a user changes preferences, semantic memory needs to be updated. Build in an update mechanism, not just a write mechanism.

No memory consolidation: After 100 sessions, episodic memory has 100 entries. Summarize and compress old episodes periodically to keep the relevant history manageable.
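
A minimal consolidation sketch, assuming all episodes belong to one user and use the dictionary shape from the episodic memory section. It folds older episodes into a single archive entry that keeps their decisions and drops their summaries. `consolidate_episodes` and `keep_recent` are hypothetical names; a production system would more likely ask an LLM to write the condensed summary.

```python
def consolidate_episodes(episodes: list[dict], keep_recent: int = 10) -> list[dict]:
    """Collapse all but the newest keep_recent episodes into one archive entry."""
    ordered = sorted(episodes, key=lambda e: e["timestamp"])  # oldest first
    cut = len(ordered) - keep_recent
    if cut <= 0:
        return ordered  # nothing old enough to consolidate
    old, recent = ordered[:cut], ordered[cut:]
    first_date = old[0]["timestamp"][:10]
    last_date = old[-1]["timestamp"][:10]
    archive = {
        "user_id": old[0]["user_id"],
        "timestamp": old[-1]["timestamp"],
        "summary": f"{len(old)} earlier sessions from {first_date} to {last_date} (details archived)",
        # Decisions are kept in full because they tend to stay relevant
        "key_decisions": [d for e in old for d in e["key_decisions"]],
    }
    return [archive] + recent


episodes = [
    {
        "user_id": "alex",
        "timestamp": f"2026-05-0{day}T10:00:00",
        "summary": f"Session {day}",
        "key_decisions": [f"Decision {day}"],
    }
    for day in range(1, 6)
]

consolidated = consolidate_episodes(episodes, keep_recent=2)
print(consolidated)
```

Five episodes become three entries: one archive covering the first three sessions, followed by the two most recent sessions unchanged.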

Trusting memory blindly: Memory can be wrong or outdated. Build agents that confirm before acting on potentially stale information: “Last time you mentioned deploying to AWS ECS — is that still the plan?”
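
One way to decide when to confirm is to record when each fact was last updated and ask about anything older than a cutoff. This sketch assumes facts are stored with a hypothetical `updated_at` field, which the `SemanticMemory` class above does not record, and the 30-day cutoff is arbitrary.

```python
from datetime import datetime, timedelta


def needs_confirmation(fact: dict, now: datetime, max_age_days: int = 30) -> bool:
    """True if the fact has not been updated within max_age_days."""
    updated_at = datetime.fromisoformat(fact["updated_at"])
    return now - updated_at > timedelta(days=max_age_days)


def confirmation_prompt(key: str, fact: dict) -> str:
    """Build a question the agent can ask before acting on a stale fact."""
    return f"Last time you mentioned {key} was {fact['value']}. Is that still the plan?"


facts = {
    "deployment": {"value": "AWS ECS with Docker", "updated_at": "2026-03-01T09:00:00"},
    "database": {"value": "PostgreSQL", "updated_at": "2026-05-28T09:00:00"},
}

now = datetime(2026, 6, 1)
questions = [
    confirmation_prompt(key, fact)
    for key, fact in facts.items()
    if needs_confirmation(fact, now)
]
print(questions)
```

Only the deployment fact is older than 30 days here, so the agent would ask about it and use the database fact as is.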

Summary

In-context memory is fast but limited to the current session — like RAM
External vector memory is persistent and searchable — like a filing cabinet
Episodic memory stores narrative records of past sessions — like a diary
Semantic memory stores distilled stable facts — like a cheat sheet
The personal assistant agent combines all four for a truly persistent experience
Always retrieve selectively — load only what’s relevant to the current query

Next: Building Your First Agent with LangChain — putting tools, memory, and the ReAct pattern together in a real web research agent.
