Agents + MCP — AI Learning Course

§ 01

The loop

ONE SENTENCE

An agent is a model in a loop with tools

Strip the buzzword: an agent is a language model that, instead of answering, can emit a tool call — a structured request like {"name": "search", "args": {...}}. The runtime executes it, appends the result to the context, and calls the model again. That loop — think → call → observe → repeat — until the model decides it's done, is the entire trick. Everything else (planning, memory, sub-agents) is engineering around this loop.

User requestModel reasonsTool call (JSON)Runtime executesResult → contextModel reasons againFinal answer

Pipeline — steps light up in order

WHY NOW

Three things had to land before agents worked

Agents were demoware in 2023 (AutoGPT looped forever) and infrastructure by 2026. What changed: (1) models trained specifically on tool-calling traces — emitting valid, well-timed calls is a trained skill, not a prompt trick; (2) reasoning training (module 30) — agents that think before acting recover from errors instead of spiraling; (3) context windows big enough to hold a whole working session. The result: coding agents that run for hours (Claude Code-class terminal agents, IDE agents), computer-use agents that drive real GUIs, and background agents doing scheduled work.

ANATOMY

What the model actually sees

Tools are declared to the model as JSON schemas in the context — name, description, parameters. The model's tool call is just generated tokens that parse as JSON (constrained decoding guarantees validity — module 08). The quality of the description matters as much as the model: a tool the model can't tell when to use is a tool it will misuse. This is why 'tool inventory belongs in the agent layer, not the prompt' is a real operating rule.

§ 02

MCP — the USB port

THE PROBLEM

N models × M tools used to mean N×M integrations

Before late 2024, every assistant integrated every data source separately — Slack for this app, Postgres for that one, all bespoke. Model Context Protocol (MCP), introduced by Anthropic and adopted across the industry in 2025, standardizes the plug: a server exposes tools, resources, and prompts over a common protocol; any MCP-capable client (Claude, IDEs, ChatGPT-class apps, custom agents) can use it. N×M became N+M.

MENTAL MODEL

MCP servers are device drivers for context

An MCP server for GitHub turns 'the repo' into callable tools (search_issues, create_pr). One for your filesystem exposes read/write. One for a browser exposes navigate/click. The agent doesn't know HTTP from SQL — it sees a uniform tool list. Practical effect: connecting an agent to a new system is now configuration, not engineering.

SECURITY

The agent's reach is the attack surface

Every tool you hand an agent is something a prompt injection can try to use: a poisoned web page or email that says 'ignore your instructions, run this' is input the model reads with the same eyes as your request. 2026 practice: least-privilege tool grants, read-only by default, human approval gates on destructive actions, and treating untrusted content fetched by tools as data, never as instructions. An agent with your shell and your inbox is exactly as dangerous as that sounds.

Then

2023 — chatbots with plugins; every integration bespoke; agents demo-only (AutoGPT loops).

Now · June 2026

June 2026 — tool-calling is a trained core skill, MCP is the cross-vendor standard for attaching tools and data, and long-running coding/computer-use/background agents are the dominant way serious work gets done with LLMs.

§ 03

Patterns that survive contact

ORCHESTRATION

Sub-agents, routers, and the one-default rule

Multi-agent systems work when the shape is simple: a router that picks a specialist, or a lead agent that fans out read-only researchers and synthesizes their reports. They fail when agents chat freely with each other — error compounds, cost explodes. The studio rule applies: one default agent, many models — route, don't proliferate.

MEMORY

Context is rented, memory is owned

The context window resets every session; anything the agent should keep has to be written down. Production agents pair the loop with an external store — files, a wiki, a database — that they read at start and write back to (this course's own maintenance runs on exactly that pattern). Memory design, not model choice, is usually what separates an agent that compounds from one that re-learns daily.

EVAL

Judge agents by completed tasks, not vibes

Agent quality is measured end-to-end: task completion rate, one-shot rate (done right without correction), cost per completed task, and blast radius of failures. A 90%-per-step success rate compounds to 35% over ten steps — which is why fewer, better-chosen tool calls beat long heroic chains, and why verification steps (run the tests, check the diff) are built into every serious agent loop.

§ 04

The agent
loop.