Capstone: Research and Report Agent
Build an agent that searches the web, synthesizes findings, and produces a structured report
What You’ll Build
This capstone brings together everything from the Agentic AI course (multi-agent orchestration, tool use, memory, safety, and evaluation) in one coherent project.
The Research and Report Agent takes a research question as input and produces a structured markdown report with source citations. It’s the kind of tool a researcher, analyst, or technical writer would actually use.
Input: “What are the trade-offs between RAG and fine-tuning for enterprise LLM applications?”
Output: A structured report with executive summary, detailed analysis sections, trade-off table, recommendation, and cited sources — produced in under 2 minutes.
System Architecture
The system uses four specialized agents in sequence:
User Question
|
v
[Planner Agent]
Breaks the question into 4-6 specific sub-questions
|
v
[Research Agent]
Searches the web for each sub-question (can run in parallel)
|
v
[Synthesis Agent]
Combines all research findings into coherent prose
|
v
[Formatter Agent]
Structures the content into a polished markdown report
|
v
Final Report (with sources, sections, and executive summary)
Each agent has a single responsibility. This makes the system easier to debug, improve, and extend than a single monolithic agent.
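The single-responsibility design can be sketched as four plain functions chained by an orchestrator. The stub bodies below are placeholders for illustration (all `stub_*` names are hypothetical and are not the agents built in the following parts); the point is the shape of the data handed from one stage to the next.

```python
# Hypothetical stand-ins for the four agents; each real agent would call an LLM.

def stub_planner(question: str) -> list[str]:
    """Planner: split the question into sub-questions."""
    return [f"{question} (aspect {i})" for i in range(1, 5)]


def stub_researcher(sub_question: str) -> dict:
    """Research: return findings and sources for one sub-question."""
    return {"question": sub_question, "findings": f"notes on {sub_question}", "sources": []}


def stub_synthesizer(findings: list[dict]) -> str:
    """Synthesis: merge all findings into one body of text."""
    return "\n".join(item["findings"] for item in findings)


def stub_formatter(synthesis: str) -> str:
    """Formatter: wrap the synthesis in a report skeleton."""
    return "# Report\n\n" + synthesis


def run_stub_pipeline(question: str) -> str:
    """Chain the four stages; each stage sees only the previous stage's output."""
    sub_questions = stub_planner(question)
    findings = [stub_researcher(q) for q in sub_questions]
    return stub_formatter(stub_synthesizer(findings))


stub_report = run_stub_pipeline("RAG vs fine-tuning")
print(stub_report.splitlines()[0])
```

Because each stage only depends on the previous stage's output, any one of them can be swapped for a real agent (or a test double) without touching the rest.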
Part 1: The Planner Agent
The Planner’s job is to decompose a broad research question into specific, searchable sub-questions. This is critical because “What are the trade-offs between RAG and fine-tuning” is too broad for a single search — you’ll get scattered results. Breaking it into sub-questions produces focused, high-quality research.
import anthropic
import json

client = anthropic.Anthropic()

PLANNER_SYSTEM = """You are a research planning specialist. Your job is to decompose
a broad research question into 4-6 specific sub-questions that together cover the topic completely.

Each sub-question should:
- Be specific enough to search for directly
- Cover a distinct aspect of the main question
- Together provide comprehensive coverage of the topic

Return a JSON object with:
{
    "main_question": "the original question",
    "sub_questions": ["question 1", "question 2", ...],
    "key_concepts": ["concept 1", "concept 2", ...],
    "report_sections": ["suggested section titles for the final report"]
}"""


def run_planner(research_question: str) -> dict:
    """Break a research question into a research plan."""
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system=PLANNER_SYSTEM,
        messages=[{
            "role": "user",
            "content": f"Create a research plan for: {research_question}"
        }]
    )
    try:
        # Extract JSON from response
        text = response.content[0].text
        # Find JSON block: everything from the first '{' to the last '}'
        start = text.find('{')
        end = text.rfind('}') + 1
        return json.loads(text[start:end])
    except (json.JSONDecodeError, ValueError):
        # Fallback if JSON parsing fails: research the original question as-is
        return {
            "main_question": research_question,
            "sub_questions": [research_question],
            "key_concepts": [],
            "report_sections": ["Overview", "Analysis", "Conclusion"]
        }


# Test the planner
plan = run_planner("What are the trade-offs between RAG and fine-tuning for enterprise LLM applications?")
print(json.dumps(plan, indent=2))
For our test question, the planner should produce sub-questions like:
- What is RAG (Retrieval Augmented Generation) and how does it work?
- What is LLM fine-tuning and when is it used?
- What are the cost and infrastructure requirements for RAG vs fine-tuning?
- How do RAG and fine-tuning compare on knowledge freshness and accuracy?
- What are the latency and scalability trade-offs?
- Which approach do enterprises typically choose and why?
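Model output does not always respect the 4-6 range or avoid near-duplicates, so it can help to normalize the plan before research starts. The helper below is a minimal sketch; `normalize_plan` and its `max_questions` parameter are hypothetical names introduced here, not part of the course code.

```python
def normalize_plan(plan: dict, fallback_question: str, max_questions: int = 6) -> dict:
    """Return a copy of the plan with clean, de-duplicated, capped sub-questions."""
    seen = set()
    cleaned = []
    for q in plan.get("sub_questions") or []:
        if not isinstance(q, str):
            continue  # skip anything that is not a string
        text = q.strip()
        key = text.lower()  # case-insensitive duplicate check
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    if not cleaned:
        # Same fallback as run_planner: research the original question directly
        cleaned = [fallback_question]
    return {
        "main_question": plan.get("main_question") or fallback_question,
        "sub_questions": cleaned[:max_questions],
        "key_concepts": plan.get("key_concepts") or [],
        "report_sections": plan.get("report_sections") or ["Overview", "Analysis", "Conclusion"],
    }


messy = {"sub_questions": ["What is RAG?", "what is rag? ", "", 42, "What is fine-tuning?"]}
print(normalize_plan(messy, "RAG vs fine-tuning")["sub_questions"])
# ['What is RAG?', 'What is fine-tuning?']
```

Calling this on the planner's output guarantees that later stages can index `plan['sub_questions']` and `plan['main_question']` without extra checks.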
Part 2: The Research Agent
The Research Agent takes a single sub-question and finds relevant, accurate information. For each sub-question, it performs 1-3 targeted searches and returns structured findings.
import requests
import time


# Simple web search using DuckDuckGo's Instant Answer API
def web_search(query: str, max_results: int = 3) -> list[dict]:
    """Search the web and return clean results."""
    try:
        # The Instant Answer API returns an abstract and related topics, not full
        # web results (for production, use Serper or Tavily API)
        headers = {"User-Agent": "Mozilla/5.0 (research bot)"}
        params = {"q": query, "format": "json"}
        response = requests.get(
            "https://api.duckduckgo.com/",
            params=params,
            headers=headers,
            timeout=10
        )
        data = response.json()
        results = []
        # Abstract (main result)
        if data.get("Abstract"):
            results.append({
                "source": data.get("AbstractURL", "DuckDuckGo"),
                "title": data.get("Heading", query),
                "content": data["Abstract"]
            })
        # Related topics
        for topic in data.get("RelatedTopics", [])[:max_results - 1]:
            if isinstance(topic, dict) and topic.get("Text"):
                results.append({
                    "source": topic.get("FirstURL", ""),
                    "title": topic.get("Text", "")[:50],
                    "content": topic.get("Text", "")
                })
        return results if results else [{"source": "", "title": query, "content": "No results found"}]
    except Exception as e:
        # Return the error as a result so the agent sees it instead of crashing
        return [{"source": "", "title": "Search failed", "content": str(e)}]

RESEARCH_TOOLS = [
    {
        "name": "search_web",
        "description": """Search the web for information about a specific topic.
Use for current information, technical comparisons, and expert opinions.
Returns: list of results with source URL, title, and content snippet.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Specific search query"}
            },
            "required": ["query"]
        }
    }
]

RESEARCH_SYSTEM = """You are a research analyst. Given a specific research question,
search for accurate and relevant information.

Your output should be structured research notes containing:
- Key facts and data points (with sources)
- Expert opinions or consensus views
- Any important nuances or contradictions found
- 2-4 credible sources

Be factual and specific. Avoid vague generalities."""

def run_research_agent(sub_question: str) -> dict:
    """Research a single sub-question. Returns findings dict."""
    messages = [{"role": "user", "content": f"Research this question: {sub_question}"}]
    sources = []
    # Cap the loop at 5 model turns so the agent cannot search forever
    for _ in range(5):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            system=RESEARCH_SYSTEM,
            tools=RESEARCH_TOOLS,
            messages=messages
        )
        if response.stop_reason == "end_turn":
            # Join all text blocks rather than assuming the first block is text
            findings = "".join(b.text for b in response.content if b.type == "text")
            return {
                "question": sub_question,
                "findings": findings,
                "sources": sources
            }
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    results = web_search(block.input["query"])
                    # Collect sources for citation
                    sources.extend([r["source"] for r in results if r.get("source")])
                    formatted = "\n\n".join([
                        f"Source: {r['source']}\n{r['content']}"
                        for r in results
                    ])
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": formatted
                    })
            messages.append({"role": "user", "content": tool_results})
        else:
            # Any other stop reason (e.g. max_tokens): stop instead of resending the same request
            break
    return {"question": sub_question, "findings": "Research incomplete", "sources": sources}

Part 3: Running Research in Parallel
For the six sub-questions our planner generates, we can run research in parallel to save time:
import concurrent.futures
from typing import List


def research_all_questions(sub_questions: List[str]) -> List[dict]:
    """Run research agents for all sub-questions in parallel."""
    print(f"Researching {len(sub_questions)} sub-questions in parallel...")
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Submit all research tasks
        future_to_question = {
            executor.submit(run_research_agent, q): q
            for q in sub_questions
        }
        results = []
        # Results arrive in completion order, not in the order of sub_questions
        for future in concurrent.futures.as_completed(future_to_question):
            question = future_to_question[future]
            try:
                # as_completed only yields finished futures, so result() returns at once;
                # the timeout here is a safeguard only
                result = future.result(timeout=60)
                results.append(result)
                print(f"  Completed: {question[:60]}...")
            except Exception as e:
                print(f"  Failed: {question[:60]}... ({e})")
                results.append({
                    "question": question,
                    "findings": f"Research failed: {str(e)}",
                    "sources": []
                })
    return results

Using 3 parallel workers balances speed against rate limits. With 6 sub-questions, this reduces total research time from ~90 seconds (sequential) to ~30 seconds (parallel).
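The 90-to-30-second estimate follows from simple batch arithmetic. The sketch below assumes each research call takes about 15 seconds (an assumed figure chosen to match the numbers above; real latency varies) and that all tasks take roughly the same time. `estimate_research_seconds` is a hypothetical helper, not part of the pipeline.

```python
import math


def estimate_research_seconds(num_questions: int, workers: int, seconds_per_question: float) -> float:
    """Rough wall-clock estimate: tasks finish in ceil(n / workers) waves."""
    waves = math.ceil(num_questions / workers)
    return waves * seconds_per_question


sequential = estimate_research_seconds(6, workers=1, seconds_per_question=15)
parallel = estimate_research_seconds(6, workers=3, seconds_per_question=15)
print(sequential, parallel)  # 90 30
```

The same arithmetic shows diminishing returns: going from 3 to 6 workers would save one more wave, at the cost of doubling the number of concurrent API calls.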
Part 4: The Synthesis Agent
The Synthesis Agent takes all research findings and weaves them into coherent prose. Its challenge is avoiding repetition and making the writing flow naturally.
SYNTHESIS_SYSTEM = """You are an expert technical writer and analyst. You receive research
findings from multiple sources and synthesize them into clear, coherent prose.

Guidelines:
- Integrate findings across sub-questions into flowing paragraphs, not bullet dumps
- Identify and highlight areas of consensus and disagreement
- Use specific facts and data points from the research
- Write for a senior technical audience: no hand-holding, no padding
- Maintain a neutral, analytical tone
- Flag anywhere the research found contradictions or uncertainty"""


def run_synthesis_agent(plan: dict, research_results: List[dict]) -> tuple[str, List[str]]:
    """Synthesize all research findings. Returns (synthesis text, unique sources)."""
    # Format research for the synthesis agent
    research_text = ""
    all_sources = []
    for result in research_results:
        research_text += f"\n\n### Sub-question: {result['question']}\n"
        research_text += result['findings']
        all_sources.extend(result.get('sources', []))
    # Deduplicate sources while preserving first-seen order
    unique_sources = list(dict.fromkeys(s for s in all_sources if s))
    prompt = f"""Main research question: {plan['main_question']}
Suggested report sections: {', '.join(plan.get('report_sections', []))}
Research findings:
{research_text}
Write a comprehensive synthesis that covers all aspects of the research.
This synthesis will be used as the basis for the final report."""
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=3000,
        system=SYNTHESIS_SYSTEM,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text, unique_sources

Part 5: The Formatter Agent
The Formatter Agent takes the synthesis and produces a polished markdown report with proper structure, an executive summary, and formatted citations.
FORMATTER_SYSTEM = """You are a report formatter. Convert research synthesis into a
polished, professional markdown report.

Required report structure:
# [Title]
## Executive Summary
(2-3 paragraphs, standalone summary of key findings and recommendation)
## [Section 1 Title]
(prose content)
## [Section 2 Title]
(prose content)
... (additional sections)
## Recommendation
(clear, actionable recommendation based on the research)
## Sources
(formatted citation list)
---
*Report generated by Research Agent | [Date]*

Keep all technical content from the synthesis. The formatting should make the
content more readable, not change its substance."""


def run_formatter_agent(synthesis: str, sources: List[str], research_question: str) -> str:
    """Format synthesis into a polished markdown report."""
    # Format sources as a numbered list
    source_list = "\n".join([
        f"{i+1}. {source}"
        for i, source in enumerate(sources[:10])  # Limit to 10 sources
    ]) if sources else "Web research (sources available on request)"
    prompt = f"""Research question: {research_question}
Synthesis:
{synthesis}
Available sources:
{source_list}
Format this into a polished markdown report following the required structure."""
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        system=FORMATTER_SYSTEM,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Part 6: The Orchestrator
Now wire everything together:
from datetime import datetime


def run_research_report_agent(research_question: str) -> dict:
    """
    Full pipeline: Plan → Research → Synthesize → Format → Report
    Returns the final report and metadata.
    """
    start_time = datetime.now()
    print(f"\n{'='*70}")
    print("Research Report Agent")
    print(f"Question: {research_question}")
    print(f"{'='*70}\n")

    # Step 1: Plan
    print("[1/4] Planning research structure...")
    plan = run_planner(research_question)
    print(f"Generated {len(plan['sub_questions'])} sub-questions")
    for i, q in enumerate(plan['sub_questions'], 1):
        print(f"  {i}. {q}")

    # Step 2: Research (parallel)
    print(f"\n[2/4] Researching {len(plan['sub_questions'])} sub-questions...")
    research_results = research_all_questions(plan['sub_questions'])
    # Count a result as unsuccessful only if it carries one of the pipeline's own
    # sentinel messages; a substring test for "failed" would misfire on real findings
    failure_markers = ("Research failed", "Research incomplete")
    successful = sum(1 for r in research_results if not r['findings'].startswith(failure_markers))
    print(f"Research complete: {successful}/{len(research_results)} successful")

    # Step 3: Synthesize
    print("\n[3/4] Synthesizing findings...")
    synthesis, sources = run_synthesis_agent(plan, research_results)
    print(f"Synthesis complete ({len(synthesis)} chars)")

    # Step 4: Format
    print("\n[4/4] Formatting final report...")
    report = run_formatter_agent(synthesis, sources, research_question)
    elapsed = (datetime.now() - start_time).total_seconds()
    print(f"\nReport complete in {elapsed:.1f} seconds")
    print(f"Report length: {len(report)} characters")
    return {
        "question": research_question,
        "report": report,
        "plan": plan,
        "sources": sources,
        "elapsed_seconds": elapsed,
        "timestamp": start_time.isoformat()
    }

Part 7: Running the Capstone
if __name__ == "__main__":
    result = run_research_report_agent(
        "What are the trade-offs between RAG and fine-tuning for enterprise LLM applications?"
    )
    # Save the report (explicit UTF-8 so non-ASCII characters survive on any platform)
    with open("rag_vs_finetune_report.md", "w", encoding="utf-8") as f:
        f.write(result["report"])
    print("\n" + "="*70)
    print("FINAL REPORT")
    print("="*70)
    print(result["report"])

Expected Output Structure
Running this on our test question produces a report with this structure:
# RAG vs Fine-Tuning for Enterprise LLM Applications: A Comparative Analysis
## Executive Summary
Enterprise organizations adopting LLMs face a fundamental architectural choice...
[2-3 paragraphs covering the key trade-offs and bottom-line recommendation]
## Understanding RAG and Fine-Tuning
RAG (Retrieval-Augmented Generation) augments LLM responses by retrieving...
Fine-tuning modifies the model's weights through additional training on...
## Cost and Infrastructure Requirements
RAG typically requires a vector database (Pinecone, Weaviate, ChromaDB)...
Fine-tuning costs are front-loaded: a full fine-tune of a 7B parameter model...
## Knowledge Freshness and Accuracy
RAG's key advantage is knowledge currency — the retrieval index can be updated...
Fine-tuned models bake knowledge into weights, creating a staleness problem...
## Latency and Scalability
RAG adds retrieval latency (50-200ms typical) to each inference call...
Fine-tuned models have no retrieval overhead but require serving a custom model...
## Enterprise Adoption Patterns
Based on available data, enterprises with dynamic knowledge bases prefer RAG...
## Recommendation
For most enterprise use cases, start with RAG. It has lower upfront cost...
Consider fine-tuning when: you need consistent output format/style, the task...
## Sources
1. https://arxiv.org/abs/2312.10997
2. https://huggingface.co/blog/rag-vs-fine-tuning
...
---
*Report generated by Research Agent | 2026-06-06*
Extending the System
Once you have the basic pipeline working, these extensions add significant value:
Add citation verification: Before including a source, fetch and verify the URL is live:
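def verify_source(url: str) -> bool:
    """Return True if the URL responds with HTTP 200 after following redirects."""
    try:
        # HEAD avoids downloading the body; requests.head does not follow
        # redirects unless asked to, so a redirected page would otherwise look dead
        response = requests.head(url, timeout=5, allow_redirects=True)
        return response.status_code == 200
    except requests.RequestException:
        return False

Add a critique step: Before formatting, run a “critic agent” that identifies gaps or weak claims:
CRITIC_SYSTEM = """Review this research synthesis. Identify: (1) claims that lack evidence,
(2) important aspects of the topic that weren't covered, (3) any logical inconsistencies."""

Add export formats: The formatter can be extended to produce PDF, HTML, or Notion output in addition to markdown.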
Add evaluation: Use the evaluation framework from Lesson 6 to score each report automatically.
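The Lesson 6 framework is not reproduced here. As a complement to it, a cheap structural check can catch malformed reports before any model-based scoring runs. The function below is a sketch under that assumption; `check_report_structure` and `REQUIRED_HEADINGS` are hypothetical names, and the required headings are taken from the formatter's required report structure.

```python
import re

# Fixed sections the formatter prompt requires in every report
REQUIRED_HEADINGS = ["Executive Summary", "Recommendation", "Sources"]


def check_report_structure(report: str) -> dict:
    """Check for a title, the required H2 sections, and at least one numbered source."""
    h2_titles = re.findall(r"^## +(.+?)\s*$", report, flags=re.MULTILINE)
    missing = [h for h in REQUIRED_HEADINGS if h not in h2_titles]
    has_title = re.search(r"^# +\S", report, flags=re.MULTILINE) is not None
    # Only count numbered lines that appear after the Sources heading
    _, _, sources_block = report.partition("## Sources")
    source_count = len(re.findall(r"^\d+\. +\S", sources_block, flags=re.MULTILINE))
    return {
        "has_title": has_title,
        "missing_sections": missing,
        "source_count": source_count,
        "passed": has_title and not missing and source_count > 0,
    }


sample_report = """# RAG vs Fine-Tuning
## Executive Summary
Short summary.
## Recommendation
Start with RAG.
## Sources
1. https://arxiv.org/abs/2312.10997
"""
print(check_report_structure(sample_report))
```

A check like this says nothing about whether the content is correct; it only confirms that the formatter followed its template, which makes it a reasonable first gate before more expensive evaluation.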
Summary of What You Built
This capstone project assembled every concept from the course:
- Multi-agent architecture (Planner, Research, Synthesis, Formatter)
- Tool use (web search wrapped as an agent tool)
- Parallel execution (researching sub-questions concurrently)
- Real API integration (DuckDuckGo search with error handling)
- Safety (source limits, output size limits, timeout handling)
- Structured output (the formatter produces consistent markdown)
The result is a system that turns a research question into a polished, cited report in about 60-90 seconds, work that might take a human researcher 2-4 hours.
The architecture scales naturally: add more research tools (arXiv, news APIs, internal databases), improve any individual agent’s prompt without touching the others, add parallelism at the synthesis step for very long reports, or replace any component with a specialized model fine-tuned for that task.
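Adding research tools is mostly a matter of routing each tool name to a handler. A minimal sketch of such a registry follows; `TOOL_HANDLERS`, `dispatch_tool`, and the `search_arxiv` tool are hypothetical names, and both handlers are stubs rather than real API calls.

```python
from typing import Callable


def search_web_stub(query: str) -> list[dict]:
    """Stand-in for web_search; returns one canned result."""
    return [{"source": "https://example.com", "title": query, "content": f"web result for {query}"}]


def search_arxiv_stub(query: str) -> list[dict]:
    """Stand-in for a hypothetical arXiv search tool."""
    return [{"source": "https://arxiv.org", "title": query, "content": f"paper about {query}"}]


# Map each tool name (as declared in the tools list) to its handler
TOOL_HANDLERS: dict[str, Callable[[str], list[dict]]] = {
    "search_web": search_web_stub,
    "search_arxiv": search_arxiv_stub,
}


def dispatch_tool(name: str, tool_input: dict) -> list[dict]:
    """Route a tool call to its handler; unknown tools yield an error result."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return [{"source": "", "title": "Unknown tool", "content": f"No handler for {name}"}]
    return handler(tool_input["query"])


print(dispatch_tool("search_arxiv", {"query": "retrieval augmented generation"})[0]["content"])
# paper about retrieval augmented generation
```

With a registry like this, the research loop would call `dispatch_tool(block.name, block.input)` in place of the direct `web_search(...)` call, and each new tool needs only a schema entry in `RESEARCH_TOOLS` plus one line in the registry.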
This is how production agentic systems are built: start with a clear pipeline, implement each stage as a focused agent, test the pipeline end-to-end, then improve components incrementally based on where quality is weakest.
