Multi-Agent Systems: Introduction

Why One Agent Isn’t Always Enough

A single agent with access to every tool sounds like the most powerful design. In practice, it’s often the worst one.

Imagine a single agent responsible for: researching a topic, writing a draft, editing for tone, checking facts, optimizing for SEO, formatting for publication, and posting to your blog. This agent needs 10+ tools, a system prompt covering every role, and must context-switch between radically different tasks in a single conversation.

The problems compound quickly:

Context window pressure: Long research outputs eat into the space available for writing
Role confusion: A system prompt that says “you are both a researcher and an editor” leads to blurry, mediocre output for both
No parallelism: Tasks that could run simultaneously are forced to run sequentially
Hard to debug: When something fails, which “hat” was the agent wearing?

Multi-agent systems solve these problems by splitting work across specialized agents that each do one thing well.

The Content Production Pipeline

Let’s use a concrete scenario: a content production system for a tech blog.

Input: A topic (“The rise of mixture-of-experts models”)
Output: A published blog post with headline, body, metadata, and source citations

The pipeline has five agents, one orchestrator and four workers:

Orchestrator
    |
    ├── Research Agent   (searches web, finds sources, extracts key facts)
    |
    ├── Writer Agent     (drafts the blog post from research notes)
    |
    ├── Editor Agent     (refines tone, fixes structure, cuts fluff)
    |
    └── Publisher Agent  (formats as markdown, adds metadata, posts to CMS)

Each agent has a focused system prompt, a minimal set of tools for its role, and a clear input/output contract. The Orchestrator coordinates the flow and handles failures.
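One way to make "focused prompt, minimal tools, clear contract" concrete is to write each agent down as data. The sketch below is an illustration only: `AgentSpec`, `check_handoffs`, and the tool names `web_search` and `cms_post` are hypothetical and do not appear elsewhere in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one worker: prompt, tools, and I/O contract."""
    name: str
    system_prompt: str
    tools: tuple = ()       # names of the tools this agent is allowed to call
    input_desc: str = ""    # what the agent expects to receive
    output_desc: str = ""   # what the agent promises to return


# Assumed tool names, for illustration only.
PIPELINE = [
    AgentSpec("research", "You are a research specialist.", ("web_search",), "topic", "research notes"),
    AgentSpec("writer", "You are a technical content writer.", (), "research notes", "draft"),
    AgentSpec("editor", "You are a senior technical editor.", (), "draft", "final draft"),
    AgentSpec("publisher", "You format content for publication.", ("cms_post",), "final draft", "post JSON"),
]


def check_handoffs(specs):
    """Return the adjacent agent pairs whose output and input contracts do not match."""
    return [(a.name, b.name) for a, b in zip(specs, specs[1:]) if a.output_desc != b.input_desc]


print(check_handoffs(PIPELINE))  # an empty list means every handoff lines up
```

Declaring the contract up front makes a mismatched handoff visible before any API call is made.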

The Orchestrator-Worker Pattern

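The orchestrator manages the other agents (the workers). It doesn’t do the work itself; it directs, monitors, and synthesizes. An orchestrator can itself be an LLM, but in the code below it is a plain Python function, which is enough for a fixed, linear pipeline.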

import anthropic
from dataclasses import dataclass
from typing import Optional

client = anthropic.Anthropic()

@dataclass
class AgentResult:
    agent_name: str
    success: bool
    output: str
    error: Optional[str] = None


def run_worker_agent(
    agent_name: str,
    system_prompt: str,
    task: str,
    tools: Optional[list] = None
) -> AgentResult:
    """Run a single worker agent on a specific task."""
    
    try:
        kwargs = {
            "model": "claude-opus-4-5",
            "max_tokens": 2048,
            "system": system_prompt,
            "messages": [{"role": "user", "content": task}]
        }
        if tools:
            kwargs["tools"] = tools
        
        response = client.messages.create(**kwargs)
        
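        # Join every text block. With tools enabled, content[0] may be a
        # tool_use block that has no .text attribute.
        text = "".join(
            block.text for block in response.content if block.type == "text"
        )

        return AgentResult(
            agent_name=agent_name,
            success=True,
            output=text
        )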
    except Exception as e:
        return AgentResult(
            agent_name=agent_name,
            success=False,
            output="",
            error=str(e)
        )

Implementing Each Worker Agent

# Agent system prompts — focused, single-purpose

RESEARCH_SYSTEM = """You are a research specialist. Your job is to find accurate, 
relevant information about a topic. Focus on:
- Key technical concepts and definitions
- Recent developments and trends  
- Credible sources with specific facts and data
- Common misconceptions to address

Output: structured research notes with clear headings and source attributions.
Be thorough but focus on what's most relevant for a technical blog post."""

WRITER_SYSTEM = """You are a technical content writer for a developer audience.
Given research notes, write an engaging blog post that:
- Opens with a hook that establishes why this matters
- Explains technical concepts clearly without oversimplifying
- Uses concrete examples and analogies
- Flows naturally from introduction to conclusion

Target: 800-1200 words. Write for senior engineers who are busy and skeptical.
Do not pad. Do not repeat yourself."""

EDITOR_SYSTEM = """You are a senior technical editor. Your job is to improve a draft without 
changing its substance. Focus on:
- Cutting sentences that don't add information
- Strengthening the opening paragraph
- Ensuring the conclusion delivers a clear takeaway
- Fixing any awkward phrasing or unclear explanations
- Ensuring consistent technical terminology

Return the improved draft. Note any significant changes you made at the end."""

PUBLISHER_SYSTEM = """You are responsible for formatting content for publication.
Given a final draft, produce:
1. A compelling title (under 70 characters)
2. A meta description (150-160 characters)
3. 3-5 relevant tags
4. The post body formatted as clean markdown
5. A "published at" timestamp placeholder

Output valid JSON with keys: title, meta_description, tags, body, published_at."""
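
The publisher prompt states a contract (five keys, a title under 70 characters, a 150-160 character meta description, 3-5 tags), but a prompt alone does not guarantee the model follows it. A minimal sketch of a checker, assuming the publisher's reply is available as a string; `validate_post` and `REQUIRED_KEYS` are hypothetical names introduced here:

```python
import json
from typing import List

REQUIRED_KEYS = {"title", "meta_description", "tags", "body", "published_at"}


def validate_post(raw: str) -> List[str]:
    """Return a list of contract violations; an empty list means the post passes."""
    try:
        post = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(post, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(post))]

    title = post.get("title")
    if isinstance(title, str) and len(title) >= 70:
        problems.append("title is 70 characters or longer")

    meta = post.get("meta_description")
    if isinstance(meta, str) and not 150 <= len(meta) <= 160:
        problems.append("meta description is not 150-160 characters")

    tags = post.get("tags")
    if isinstance(tags, list) and not 3 <= len(tags) <= 5:
        problems.append("expected 3-5 tags")

    return problems
```

An orchestrator could feed the returned problems back to the Publisher Agent as a correction request, or log them and publish anyway, depending on how strict the pipeline needs to be.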

The Orchestrator Logic

def run_content_pipeline(topic: str) -> dict:
    """
    Orchestrate the full content production pipeline.
    Returns the final published post or an error report.
    """
    
    print(f"\n{'='*60}")
    print(f"Content Pipeline: {topic}")
    print('='*60)
    
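    # Step 1: Research (no tools are passed here, so the notes come from the
    # model's own knowledge; pass a search tool via `tools` for live sources)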
    print("\n[1/4] Research Agent working...")
    research_result = run_worker_agent(
        agent_name="Research Agent",
        system_prompt=RESEARCH_SYSTEM,
        task=f"Research this topic for a technical blog post: {topic}"
    )
    
    if not research_result.success:
        return {"error": f"Research failed: {research_result.error}"}
    
    print(f"Research complete ({len(research_result.output)} chars)")
    
    # Step 2: Write (uses research output)
    print("\n[2/4] Writer Agent working...")
    write_result = run_worker_agent(
        agent_name="Writer Agent",
        system_prompt=WRITER_SYSTEM,
        task=f"""Write a blog post about: {topic}

Based on these research notes:
{research_result.output}"""
    )
    
    if not write_result.success:
        return {"error": f"Writing failed: {write_result.error}"}
    
    print(f"Draft complete ({len(write_result.output)} chars)")
    
    # Step 3: Edit (uses draft)
    print("\n[3/4] Editor Agent working...")
    edit_result = run_worker_agent(
        agent_name="Editor Agent",
        system_prompt=EDITOR_SYSTEM,
        task=f"Edit and improve this draft:\n\n{write_result.output}"
    )
    
    if not edit_result.success:
        # Non-critical failure — fall back to unedited draft
        print(f"Edit failed: {edit_result.error}. Using unedited draft.")
        final_draft = write_result.output
    else:
        final_draft = edit_result.output
        print(f"Edit complete ({len(final_draft)} chars)")
    
    # Step 4: Publish (format for CMS)
    print("\n[4/4] Publisher Agent working...")
    publish_result = run_worker_agent(
        agent_name="Publisher Agent",
        system_prompt=PUBLISHER_SYSTEM,
        task=f"Format this post for publication:\n\n{final_draft}"
    )
    
    if not publish_result.success:
        return {"error": f"Publishing failed: {publish_result.error}", "draft": final_draft}
    
    # Parse the JSON output from the publisher
    import json
    try:
        published_post = json.loads(publish_result.output)
        print("\nPipeline complete!")
        print(f"Title: {published_post.get('title', 'N/A')}")
        print(f"Tags: {published_post.get('tags', [])}")
        return published_post
    except json.JSONDecodeError:
        return {"error": "Publisher output was not valid JSON", "raw": publish_result.output}


# Run the pipeline
result = run_content_pipeline("The rise of mixture-of-experts models in LLMs")

How Agents Communicate

In this pipeline, agents communicate through shared state — each agent’s output becomes the next agent’s input, passed as text in the message. This is the simplest pattern and works well for linear pipelines.
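
Stripped of the printing and error handling, the handoff described above is a loop that feeds each stage's output into the next. The sketch below uses stub functions in place of real agents so it runs without an API key; `run_chain` and the stub stages are hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def run_chain(stages: List[Stage], initial_input: str):
    """Feed each stage's text output into the next stage; keep a history for debugging."""
    payload = initial_input
    history = []
    for name, stage in stages:
        payload = stage(payload)
        history.append((name, payload))
    return payload, history


# Stub stages standing in for real agent calls.
stages = [
    ("research", lambda topic: f"notes on {topic}"),
    ("write", lambda notes: f"draft from {notes}"),
    ("edit", lambda draft: draft.upper()),
]

final, history = run_chain(stages, "MoE")
print(final)
```

Keeping the per-stage history is cheap and answers the debugging question raised earlier: which stage produced the text that went wrong.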

For more complex systems, there are two other patterns:

Message passing: Agents put messages into a queue; other agents pick them up. Good for asynchronous or parallel workflows.

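from queue import Queue
import threading

research_queue = Queue()
writing_queue = Queue()

def research_worker(topic: str):
    result = run_worker_agent("Research Agent", RESEARCH_SYSTEM, topic)
    research_queue.put(result)

def writer_worker():
    research = research_queue.get()  # Blocks until research is ready
    if not research.success:
        # Pass the failure along instead of writing from empty notes
        writing_queue.put(research)
        return
    result = run_worker_agent("Writer Agent", WRITER_SYSTEM,
                               f"Write post using: {research.output}")
    writing_queue.put(result)

topic = "The rise of mixture-of-experts models in LLMs"

# Both threads start immediately; the writer waits on the queue until the
# research result arrives. Independent stages (metadata gathering, for
# example) could be added as further threads and would run at the same time.
threads = [
    threading.Thread(target=research_worker, args=(topic,)),
    threading.Thread(target=writer_worker)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

draft_result = writing_queue.get()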

Shared database: All agents read from and write to a central store. The orchestrator coordinates access. Good for long-running pipelines where agents may run at different times.
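
A minimal sketch of the shared-database pattern, using an in-memory SQLite database as the central store. The table layout and the helper names (`open_store`, `save_artifact`, `load_artifact`) are assumptions made for illustration; a real system would pick a store and schema to match its own needs.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the artifacts table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        "job_id TEXT, stage TEXT, content TEXT, "
        "PRIMARY KEY (job_id, stage))"
    )
    return conn


def save_artifact(conn: sqlite3.Connection, job_id: str, stage: str, content: str) -> None:
    """Write (or overwrite) one stage's output for a job."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO artifacts VALUES (?, ?, ?)",
            (job_id, stage, content),
        )


def load_artifact(conn: sqlite3.Connection, job_id: str, stage: str) -> Optional[str]:
    """Return a stage's output, or None if that stage has not run yet."""
    row = conn.execute(
        "SELECT content FROM artifacts WHERE job_id = ? AND stage = ?",
        (job_id, stage),
    ).fetchone()
    return row[0] if row else None


store = open_store()
save_artifact(store, "job-1", "research", "notes on MoE routing")
notes = load_artifact(store, "job-1", "research")   # what a writer agent would read later
draft = load_artifact(store, "job-1", "draft")      # None: the writer has not run yet
```

Because each stage's output is keyed by job and stage, a worker that runs hours later, or restarts after a crash, can pick up exactly where the pipeline left off.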

When NOT to Use Multi-Agent Systems

Multi-agent is not the answer to every problem. It adds complexity — more API calls, more things that can fail, harder debugging. Use it only when the benefits are clear.

Use multi-agent when:

Tasks are naturally separable with clear input/output boundaries
Different tasks require very different “personas” or tool sets
Tasks can run in parallel and time matters
The context window of a single agent would overflow

Don’t use multi-agent when:

A single agent can handle the task cleanly in one pass
The tasks are tightly coupled (each step depends on real-time feedback from the previous)
You’re just starting out and want to understand the problem space first
The overhead of coordination exceeds the benefit of specialization

The content pipeline above is a good candidate because research, writing, editing, and publishing are genuinely different skills with clear handoffs. A “summarize this document” task is not — it doesn’t benefit from splitting across agents.
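
The "tasks can run in parallel" criterion is the easiest one to test in isolation. The sketch below fans out two independent stub tasks with a thread pool; `fake_agent` simulates an API call's latency with `time.sleep`, and both it and `run_parallel` are hypothetical names. Threads are a reasonable fit here because agent calls spend most of their time waiting on the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_agent(name: str, delay: float) -> str:
    """Stand-in for a real agent call; sleeps to simulate network latency."""
    time.sleep(delay)
    return f"{name} done"


def run_parallel(tasks):
    """Run independent (name, delay) tasks concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fake_agent, name, delay) for name, delay in tasks]
        return [future.result() for future in futures]


start = time.perf_counter()
results = run_parallel([("fact-check", 0.2), ("seo", 0.2)])
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Run sequentially, the two tasks would take about the sum of their delays; run concurrently, the total should be close to the longer of the two. If the stages in your own pipeline cannot be arranged this way, parallelism is not a reason to split them.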

Handling Failures in Multi-Agent Systems

Multi-agent systems have more failure points than single-agent systems. Build resilience in from the start:

def run_with_retry(agent_name: str, system: str, task: str, max_retries: int = 2):
    """Run an agent with retry logic."""
    for attempt in range(max_retries + 1):
        result = run_worker_agent(agent_name, system, task)
        if result.success:
            return result
        
        if attempt < max_retries:
            print(f"{agent_name} failed (attempt {attempt+1}). Retrying...")
    
    return result  # Return the failed result after exhausting retries
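
The retry loop above retries immediately, which can make things worse when the failure is a rate limit or a brief outage. A common variation waits longer after each failed attempt. The sketch below is a generic version under stated assumptions: `call` is any zero-argument function returning a `(success, output)` tuple, and `retry_with_backoff` and the injectable `sleep` parameter are hypothetical names used so the example runs without waiting.

```python
import time


def retry_with_backoff(call, max_retries: int = 2, base_delay: float = 1.0, sleep=time.sleep):
    """Call `call()` up to max_retries + 1 times, doubling the wait after each failure."""
    result = call()
    for attempt in range(max_retries):
        if result[0]:          # first tuple element is the success flag
            break
        sleep(base_delay * 2 ** attempt)   # base_delay, then 2x, then 4x, ...
        result = call()
    return result


# Demo with a stub that fails twice, then succeeds; delays are recorded, not slept.
delays = []
outcomes = iter([(False, "timeout"), (False, "timeout"), (True, "ok")])
final = retry_with_backoff(lambda: next(outcomes), sleep=delays.append)
print(final, delays)
```
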

Also think about graceful degradation: if the editor agent fails, use the unedited draft rather than failing the whole pipeline. If the publisher agent can’t produce valid JSON, return the markdown text directly. The goal is to always produce something useful, even when individual agents fail.
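
The publisher fallback described above might look like the following sketch. It assumes the publisher's raw reply and the final draft are both available as strings; `parse_or_fallback` and the `degraded` flag are hypothetical names. It also tolerates replies where the model wraps the JSON in extra text by parsing only the span between the first `{` and the last `}`.

```python
import json


def parse_or_fallback(raw: str, draft: str) -> dict:
    """Parse the publisher's JSON; if that fails, return the markdown draft instead."""
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            post = json.loads(raw[start:end + 1])
            if isinstance(post, dict):
                return {"degraded": False, **post}
        except json.JSONDecodeError:
            pass  # fall through to the degraded result
    # No usable JSON: hand back the draft so the pipeline still produces something.
    return {"degraded": True, "body": draft}
```

The `degraded` flag lets downstream code (or a human reviewer) tell a fully formatted post from a fallback without inspecting the content.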

Summary

Multi-agent systems split work across specialized agents, each with a focused role
The orchestrator-worker pattern gives one agent (the orchestrator) responsibility for coordination
The content pipeline (Research → Write → Edit → Publish) is a concrete example with four worker agents and one orchestrator
Agents communicate through shared state (simplest), message queues (async), or shared databases (persistent)
Don’t use multi-agent for tasks a single agent handles well — it adds complexity without benefit
Build in retry logic and graceful degradation since multi-agent systems have more failure points

Next: Safety and Guardrails — because agents that take real-world actions can do real-world harm.

