What Are AI Agents?
Beyond chatbots: how agents plan, act, observe, and iterate to complete tasks
From Chatbots to Agents: What Changed?
Ask a chatbot “What’s the weather in Paris?” and it answers. Ask it to “Book me a flight to Paris for next Friday, find a hotel near the Eiffel Tower, and email me the itinerary” — and a chatbot will politely tell you it cannot do that.
An AI agent will just do it.
The difference is not intelligence — it’s agency. A chatbot responds to a single message and stops. An agent receives a goal, breaks it into steps, executes actions in the real world, observes what happened, and keeps going until the goal is complete. It acts rather than answers.
This distinction sounds simple but it changes everything about how you design, build, and deploy AI systems.
The Observe-Think-Act Loop
Every AI agent — whether it’s booking flights or triaging software bugs — operates on the same core loop:
1. OBSERVE: receive input (user message, tool result, environment state)
2. THINK: reason about what to do next
3. ACT: call a tool, send a message, write a file, or produce output
4. REPEAT: use the result of the action as new input, loop again
This loop runs until the task is complete, the agent hits a stopping condition, or it asks for human input.
Think of it like a chef following a recipe they’ve never seen before. They read the next step (observe), decide what to do (think), perform the action (act), look at what they just produced (observe again), and move to the next step. They don’t need someone watching over their shoulder for every move.
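The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `Decision`, `run_agent`, `scripted_think`, and `get_weather` are hypothetical names, and the "think" step is a hard-coded function standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Decision:
    """What the 'think' step returns: a tool call or a final answer."""
    tool: Optional[str] = None           # name of the tool to call, if any
    args: dict = field(default_factory=dict)
    final_answer: Optional[str] = None   # set when the task is complete


def run_agent(goal: str,
              think: Callable[[list[str]], Decision],
              tools: dict[str, Callable[..., str]],
              max_steps: int = 10) -> str:
    """Run observe -> think -> act until done or the step budget runs out."""
    observations = [f"goal: {goal}"]                 # OBSERVE: initial input
    for _ in range(max_steps):
        decision = think(observations)               # THINK: choose next step
        if decision.final_answer is not None:        # stopping condition: done
            return decision.final_answer
        result = tools[decision.tool](**decision.args)        # ACT
        observations.append(f"{decision.tool} -> {result}")   # REPEAT
    return "stopped: step limit reached"             # stopping condition: budget


# Hypothetical stand-in for the LLM: a fixed rule instead of a model call.
def scripted_think(observations: list[str]) -> Decision:
    if len(observations) == 1:
        return Decision(tool="get_weather", args={"city": "Paris"})
    return Decision(final_answer=f"Done. Last observation: {observations[-1]}")


def get_weather(city: str) -> str:
    return f"sunny in {city}"   # canned result, no real API call


answer = run_agent("What's the weather in Paris?", scripted_think,
                   {"get_weather": get_weather})
print(answer)   # Done. Last observation: get_weather -> sunny in Paris
```

The `max_steps` budget is one example of a stopping condition: without it, a think step that never produces a final answer would loop forever.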
A Real Scenario: The Travel Booking Agent
Let’s make this concrete. A user sends the message:
“Book me a round-trip flight from New York to Tokyo in October, find a 4-star hotel near Shinjuku for 7 nights, and email me the details.”
Here is what an agent actually does, turn by turn:
Turn 1 — Search for flights
- Observe: User wants flights NYC to Tokyo in October
- Think: I need flight options first. I’ll search for round-trips.
- Act: search_flights(origin="JFK", destination="NRT", month="October")
- Result: Returns 5 flight options with prices
Turn 2 — Filter and select
- Observe: Got flight options. Prices range from $850 to $1,400.
- Think: I should pick the best balance of price and duration. The $950 option on Oct 12 is 14 hours with one stop — reasonable.
- Act: select_flight(flight_id="NH007", date="Oct-12")
- Result: Flight held, booking reference #NH-29481
Turn 3 — Search for hotels
- Observe: Flight booked. Now need hotels near Shinjuku, Oct 12–19.
- Think: Check hotel availability for those dates.
- Act: search_hotels(location="Shinjuku, Tokyo", checkin="Oct-12", checkout="Oct-19", stars=4)
- Result: Returns 8 hotels with availability and pricing
Turn 4 — Book the hotel
- Observe: Park Hyatt Shinjuku is $180/night, 4.5 stars, excellent reviews.
- Think: This matches the user’s criteria well.
- Act: book_hotel(hotel_id="park-hyatt-shinjuku", checkin="Oct-12", checkout="Oct-19")
- Result: Booking confirmed, reference #PH-8821
Turn 5 — Send confirmation email
- Observe: Both flight and hotel are booked. Task nearly complete.
- Think: Compile details and send email as requested.
- Act: send_email(to="user@example.com", subject="Your Tokyo Trip Itinerary", body="...")
- Result: Email sent successfully
Turn 6 — Report back to user
- Think: All tasks complete. Summarize for the user.
- Act: Return final message: “Done! I’ve booked your flight (NH007, Oct 12, $950) and Park Hyatt Shinjuku (Oct 12–19, $1,260 total). Confirmation details sent to your email.”
Notice what happened: the agent took six actions (five tool calls and a final reply), used five different tools, and made decisions at each step, all from a single user message.
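To see the shape of that run in code, the sketch below replays the tool calls from turns 1 through 5 against stub tools. The tool names and arguments come from the scenario; the return values, the `TOOLS` registry, and the `SCRIPT` list are assumptions made for illustration. In a real agent the model would choose each call after seeing the previous result rather than follow a fixed script.

```python
# Stub tools: each returns canned data instead of calling a real service.
def search_flights(origin, destination, month):
    return [{"flight_id": "NH007", "date": "Oct-12", "price": 950}]

def select_flight(flight_id, date):
    return {"reference": "NH-29481"}

def search_hotels(location, checkin, checkout, stars):
    return [{"hotel_id": "park-hyatt-shinjuku", "per_night": 180}]

def book_hotel(hotel_id, checkin, checkout):
    return {"reference": "PH-8821"}

def send_email(to, subject, body):
    return {"sent": True}

# Registry mapping tool names to functions, so calls can be dispatched by name.
TOOLS = {f.__name__: f for f in
         (search_flights, select_flight, search_hotels, book_hotel, send_email)}

# The tool calls from turns 1-5, in order.
SCRIPT = [
    ("search_flights", {"origin": "JFK", "destination": "NRT", "month": "October"}),
    ("select_flight", {"flight_id": "NH007", "date": "Oct-12"}),
    ("search_hotels", {"location": "Shinjuku, Tokyo", "checkin": "Oct-12",
                       "checkout": "Oct-19", "stars": 4}),
    ("book_hotel", {"hotel_id": "park-hyatt-shinjuku", "checkin": "Oct-12",
                    "checkout": "Oct-19"}),
    ("send_email", {"to": "user@example.com",
                    "subject": "Your Tokyo Trip Itinerary", "body": "..."}),
]

trace = []
for turn, (name, args) in enumerate(SCRIPT, start=1):
    result = TOOLS[name](**args)         # act: dispatch the call by name
    trace.append((turn, name, result))   # observe: record the result
    print(f"Turn {turn}: {name} -> {result}")

hotel_total = 180 * 7   # 7 nights at $180 per night
print(f"Turn {len(trace) + 1}: final message, hotel total ${hotel_total:,}")
```

Keeping a `trace` of every call and result is a common design choice: it is what the agent reads on the next turn, and it is also what you read when debugging.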
The Three Pillars of an Agent
Every agent is built on three foundational capabilities:
1. Tools
Tools are functions the agent can call to interact with the world. Without tools, an LLM can only produce text. With tools, it can take actions.
Common tools include:
- Search tools — query the web, a database, or a knowledge base
- Action tools — send emails, create calendar events, make API calls
- Compute tools — run code, perform calculations, process files
- Storage tools — read and write files, update records
A tool is just a function with a clear name, description, and parameter schema. The LLM reads the description and decides when to call it.
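One way to picture "name, description, and parameter schema" is as plain data derived from the function itself. The sketch below uses Python's standard `inspect` module to build such a description. The dictionary layout and the `describe_tool` helper are assumptions for illustration; real model APIs define their own tool formats.

```python
import inspect


def describe_tool(func) -> dict:
    """Build a simple tool description from a function's signature and docstring."""
    signature = inspect.signature(func)
    parameters = {
        # Use the annotation's type name (e.g. 'str') when one is available.
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": parameters,
    }


def send_email(to: str, subject: str, body: str) -> bool:
    """Send an email to a single recipient."""
    return True   # placeholder: no email is actually sent


spec = describe_tool(send_email)
print(spec)
```

The description is what the model reads when deciding whether to call the tool, so a vague docstring tends to produce vague tool use.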
```python
def search_flights(origin: str, destination: str, month: str) -> list[dict]:
    """Search for available round-trip flights.

    Args:
        origin: Airport code (e.g. 'JFK')
        destination: Airport code (e.g. 'NRT')
        month: Travel month (e.g. 'October')

    Returns:
        List of flight options with price and duration
    """
    # Call the actual flight API here; flight_api is a placeholder client.
    return flight_api.search(origin, destination, month)
```
2. Memory
An agent’s memory determines what it knows and can recall. There are several types (covered in depth in the Memory lesson), but the essentials are:
- Working memory (context window): What the agent currently holds in its “mind” — the conversation so far, tool results, intermediate reasoning.
- External memory: A database or vector store the agent can search for information beyond what fits in context.
Think of working memory as a whiteboard: fast and immediately visible, but limited in space. External memory is a filing cabinet: far more capacity, but it takes effort to search.
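The two kinds of memory can be sketched as two small classes. This is a simplified illustration with hypothetical names (`WorkingMemory`, `ExternalMemory`): real context windows are measured in tokens rather than items, and a vector store would rank results by similarity instead of matching substrings.

```python
from collections import deque


class WorkingMemory:
    """Bounded buffer: keeps only the most recent items, like a context window."""

    def __init__(self, max_items: int):
        self.items = deque(maxlen=max_items)   # oldest items fall off the front

    def add(self, item: str) -> None:
        self.items.append(item)


class ExternalMemory:
    """Growing store that must be searched explicitly."""

    def __init__(self):
        self.records: list[str] = []

    def save(self, record: str) -> None:
        self.records.append(record)

    def search(self, keyword: str) -> list[str]:
        # Naive case-insensitive substring match.
        return [r for r in self.records if keyword.lower() in r.lower()]


working = WorkingMemory(max_items=2)
external = ExternalMemory()
for note in ["flight NH007 booked", "hotel PH-8821 booked", "email sent"]:
    working.add(note)
    external.save(note)

print(list(working.items))        # ['hotel PH-8821 booked', 'email sent']
print(external.search("flight"))  # ['flight NH007 booked']
```

The first note has dropped out of working memory but is still retrievable from the external store, which is the trade-off the whiteboard and filing cabinet analogy describes.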
3. Planning
Planning is the agent’s ability to decompose a complex goal into a sequence of steps. When a user says “write me a market analysis report on Tesla,” a planning agent doesn’t immediately start writing. It thinks:
- What sections does a market analysis need?
- What data do I need for each section?
- In what order should I gather and write things?
- How do I verify I’ve covered everything?
Agents with explicit planning tend to perform better on multi-step tasks because they don’t get lost in the middle of a complex sequence.
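An explicit plan can be as simple as an ordered list of steps with completion flags. The sketch below is one possible representation, with hypothetical `Step` and `Plan` classes and a hand-written plan based on the questions above; in an agent, the model would generate the steps itself.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class Plan:
    goal: str
    steps: list[Step] = field(default_factory=list)

    def next_step(self):
        """Return the first unfinished step, or None when the plan is complete."""
        return next((s for s in self.steps if not s.done), None)

    def is_complete(self) -> bool:
        return all(s.done for s in self.steps)


plan = Plan(
    goal="Write a market analysis report on Tesla",
    steps=[
        Step("Outline the sections the report needs"),
        Step("Gather data for each section"),
        Step("Write the sections in order"),
        Step("Check that every section is covered"),
    ],
)

while (step := plan.next_step()) is not None:
    print(f"Working on: {step.description}")
    step.done = True   # placeholder for actually doing the work

print(plan.is_complete())   # True
```

Because the plan is stored outside the model's reasoning, the agent can always ask "what is the next unfinished step?" instead of relying on remembering where it was.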
Chatbot vs. Agent: A Direct Comparison
| Property | Chatbot | Agent |
|---|---|---|
| Input | Single message | Goal or task |
| Output | Single response | Series of actions + final result |
| Tools | None (or read-only) | Can take real-world actions |
| Memory | Usually stateless | Can maintain state across turns |
| Loops | One response and done | Runs until task complete |
| Decision-making | Reactive | Proactive and goal-driven |
A chatbot is like a reference librarian: ask a question, get an answer. An agent is like a personal assistant: give them a task, they figure it out and get it done.
What Makes Something “Agentic”?
Not every LLM application is an agent. Here’s the key test: Does it take actions in the world, or does it only return text?
- An LLM that writes a cover letter — not an agent
- An LLM that writes, formats, and emails the cover letter — agent
- An LLM that answers “what’s 2+2” — not an agent
- An LLM that runs Python code to compute a result and returns it — agent
The word “agentic” describes any system where the model’s output triggers real-world consequences — an API call is made, a file is written, an email is sent, a database is updated. The model isn’t just generating tokens into a void; it’s doing things.
There’s also a spectrum. Some applications are slightly agentic (one tool call) while others are fully agentic (dozens of tool calls, multiple decision points, parallel sub-tasks). As you build more complex systems, you’ll move along this spectrum.
Why This Matters Now
The shift to agentic AI is happening fast for a concrete reason: modern LLMs are good enough at reasoning that the “think” step in the observe-think-act loop actually works. Earlier models would hallucinate tool parameters, lose track of goals, or make nonsensical decisions. Current frontier models can follow complex multi-step plans with reasonable reliability.
This doesn’t mean agents are perfect — they fail, get stuck, make wrong decisions. But the failure rate is low enough to be useful, and the capability gap between “answers questions” and “completes tasks” is enormous.
Understanding how agents work, how they fail, and how to design them well is one of the most valuable skills in AI engineering right now.
Summary
- Chatbots respond to messages; agents complete goals by taking actions
- Every agent operates on the observe-think-act loop
- Agents need three things: tools (to act), memory (to remember), and planning (to sequence actions)
- An application is “agentic” when the model’s outputs have real-world consequences
- The travel booking scenario showed a 6-turn agent loop completing a complex task from a single user instruction
In the next lesson, you’ll learn the ReAct pattern — the specific reasoning structure that makes agents reliable and debuggable.
