Autonomous agents, tool use, memory systems, planning strategies, multi-agent orchestration, and production deployment patterns.
AI agents are autonomous systems that use LLMs to reason, plan, and act through external tools. Unlike simple chat, they maintain state and iterate until a goal is achieved.
```python
import json
import openai

client = openai.OpenAI()

# ── Simple ReAct Agent Loop ──
def run_agent(task: str, max_iterations: int = 10):
    messages = [
        {"role": "system", "content": """You are a helpful agent.
Use the available tools to complete tasks.
Think step by step. After each tool use, analyze the result
and decide the next action. Respond with FINAL ANSWER when done."""},
        {"role": "user", "content": task},
    ]
    tools = [
        {"type": "function", "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {"type": "object", "properties": {
                "query": {"type": "string"}},
                "required": ["query"]}}},
        {"type": "function", "function": {
            "name": "calculate",
            "description": "Evaluate a math expression",
            "parameters": {"type": "object", "properties": {
                "expression": {"type": "string"}},
                "required": ["expression"]}}},
    ]
    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools, tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # Final answer
        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            if tool_call.function.name == "search_web":
                result = search_web(args["query"])  # assumes a search_web() helper is defined
            elif tool_call.function.name == "calculate":
                # Demo only: eval() is unsafe on untrusted model output
                result = str(eval(args["expression"]))
            messages.append({"role": "tool", "tool_call_id": tool_call.id,
                             "content": result})
    return "Stopped: max iterations reached without a final answer."

result = run_agent("What is 15% of the population of Tokyo?")
```

| Component | Purpose | Implementation |
|---|---|---|
| LLM (Brain) | Reason, plan, decide next action | GPT-4o, Claude, LLaMA |
| Tools (Hands) | Interact with external world | Search, Calculator, API calls, Code execution |
| Memory (Recall) | Store conversation and context | Short-term (buffer), Long-term (vector store) |
| Planning (Strategy) | Break tasks into steps | ReAct, Plan-and-Execute, Tree-of-Thought |
| State | Track progress and intermediate results | Messages list, scratchpad, JSON state |
| Guardrails | Safety constraints and validation | Input/output filters, permission checks |
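The guardrails row above can be made concrete with a minimal sketch. The names here (`ALLOWED_TOOLS`, `BLOCKED_PATTERNS`, `check_tool_call`) are illustrative, not part of any library:

```python
# Guardrail sketch: an allowlist permission check plus a crude content filter.
ALLOWED_TOOLS = {"search_web", "calculate"}
BLOCKED_PATTERNS = ["rm -rf", "drop table"]

def check_tool_call(name: str, arguments: dict) -> bool:
    """Return True only if the tool call passes both guardrails."""
    if name not in ALLOWED_TOOLS:  # permission check: unknown tools are rejected
        return False
    arg_text = " ".join(str(v) for v in arguments.values()).lower()
    # Content filter: reject arguments containing blocked substrings
    return not any(p in arg_text for p in BLOCKED_PATTERNS)
```

Run this check before dispatching each tool call; on failure, return an error message to the model instead of executing the tool.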
Planning enables agents to break complex tasks into manageable steps, reason about dependencies, and handle failures gracefully.
```python
# ── Plan-and-Execute Pattern ──
plan_prompt = """Given a task, create a step-by-step plan.
For each step, specify:
- id: a unique step identifier
- description: what to do
- tool: which tool to use (or "think")
- dependencies: which step ids must complete first
- expected_output: what the step produces
Task: {task}
Respond as a JSON object: {{"steps": [...]}}"""

def plan_and_execute(task: str):
    # Phase 1: generate the full plan up front
    plan_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": plan_prompt.format(task=task)}],
        response_format={"type": "json_object"},
    )
    plan = json.loads(plan_response.choices[0].message.content)
    # Phase 2: execute each step in order
    results = {}
    for step in plan["steps"]:
        # Verify every dependency has already produced a result
        for dep in step.get("dependencies", []):
            if dep not in results:
                raise ValueError(f"Missing dependency: {dep}")
        # Execute the step with its designated tool
        if step["tool"] == "think":
            results[step["id"]] = "Analyzed"
        elif step["tool"] == "search":
            results[step["id"]] = search_web(step["description"])
        else:
            results[step["id"]] = execute_tool(step)  # assumes a generic dispatcher
    return results
```
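The executor above assumes the model emits steps in dependency order; when it does not, the plan can be reordered first. A sketch using the standard library's `graphlib` (`order_steps` is a hypothetical helper, and steps are the dicts described in the plan prompt):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def order_steps(steps: list[dict]) -> list[dict]:
    """Reorder plan steps so every step runs after its dependencies."""
    graph = {s["id"]: set(s.get("dependencies", [])) for s in steps}
    order = TopologicalSorter(graph).static_order()  # raises CycleError on cycles
    by_id = {s["id"]: s for s in steps}
    # Skip ids the model referenced but never defined as steps
    return [by_id[i] for i in order if i in by_id]
```

Calling `order_steps(plan["steps"])` before the execution loop removes the "missing dependency" failure mode for out-of-order (but acyclic) plans.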
```python
# ── Tree-of-Thought (ToT) ──
tot_prompt = """Solve this problem by exploring multiple reasoning paths.
For each path:
1. Generate a thought/step
2. Evaluate its promise (high/medium/low)
3. Continue the most promising path
4. Backtrack if stuck
Problem: {problem}"""
```

| Pattern | Approach | Strengths | Weaknesses |
|---|---|---|---|
| ReAct | Interleave think/act in one loop | Simple, flexible, widely used | No global planning, may loop |
| Plan-then-Execute | Generate full plan, then execute | Efficient, clear steps | Rigid, cannot adapt mid-plan |
| Tree-of-Thought | Explore multiple reasoning paths | Handles complex reasoning | Expensive (many LLM calls) |
| Reflexion | Self-evaluate, retry on failure | Learns from mistakes | Multiple attempts = slow |
| LATS | Language Agent Tree Search | Systematic exploration | Most complex, highest cost |
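Reflexion's retry-with-self-critique loop can be sketched independently of any model backend. Here `attempt_fn` and `evaluate_fn` are assumed wrappers around your LLM calls; only the control flow is the point:

```python
def reflexion_loop(task, attempt_fn, evaluate_fn, max_attempts: int = 3):
    """Retry a task, feeding critiques of failed attempts into the next try."""
    reflections: list[str] = []
    result = None
    for _ in range(max_attempts):
        # attempt_fn sees earlier self-critiques, so it can avoid repeating mistakes
        result = attempt_fn(task, reflections)
        ok, critique = evaluate_fn(task, result)
        if ok:
            return result
        reflections.append(critique)
    return result  # best effort after exhausting attempts
```

In practice `evaluate_fn` is often a unit-test runner (for code tasks) or a second LLM call acting as a judge.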
Tools extend agent capabilities beyond text generation. They enable agents to search the web, execute code, query databases, call APIs, and interact with files.
```python
import os
import sqlite3
import subprocess

import requests
from langchain_core.tools import tool  # assumes LangChain's @tool decorator

# ── Code Execution Tool ──
@tool
def execute_python(code: str) -> str:
    """Execute Python code in a subprocess with a timeout. Returns stdout or error."""
    result = subprocess.run(
        ["python3", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout if result.returncode == 0 else result.stderr

# ── API Call Tool ──
@tool
def call_api(url: str, method: str = "GET", body: dict | None = None) -> str:
    """Make an HTTP API call. Returns the response body (truncated)."""
    if method == "GET":
        resp = requests.get(url, params=body, timeout=10)
    else:
        resp = requests.post(url, json=body, timeout=10)
    return resp.text[:2000]

# ── Database Query Tool ──
@tool
def query_database(sql: str) -> str:
    """Execute a read-only SQL query on the database."""
    # Safety: only allow SELECT
    if not sql.strip().upper().startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."
    conn = sqlite3.connect("app.db")
    try:
        rows = conn.execute(sql).fetchmany(50)
    finally:
        conn.close()
    return str(rows)

# ── File System Tool (with safety) ──
@tool
def read_file(filepath: str) -> str:
    """Read contents of a file. Restricted to the /workspace/ directory."""
    safe_dir = "/workspace/"
    real_path = os.path.realpath(filepath)  # resolves symlinks and ../ escapes
    if not real_path.startswith(safe_dir):
        return "Error: Access denied. Only /workspace/ is allowed."
    with open(real_path, "r") as f:
        return f.read()[:5000]
```

Multi-agent systems use multiple specialized agents that collaborate to solve complex tasks. Each agent has a specific role and expertise.
```python
# ── Multi-Agent with Supervisor Pattern ──
class Agent:
    def __init__(self, name: str, role: str, tools: list):
        self.name = name
        self.role = role
        self.tools = tools

    def execute(self, task: str, context: str = "") -> str:
        messages = [
            {"role": "system", "content": f"You are {self.name}. {self.role}"},
            {"role": "user", "content": f"Context: {context}\n\nTask: {task}"},
        ]
        kwargs = {"model": "gpt-4o", "messages": messages}
        if self.tools:  # the API rejects an empty tools list
            kwargs["tools"] = self.tools
        response = client.chat.completions.create(**kwargs)
        return response.choices[0].message.content

# Define specialized agents
researcher = Agent(
    "Researcher",
    "Search the web and gather factual information. Be thorough.",
    [search_tool],
)
writer = Agent(
    "Writer",
    "Write clear, engaging content based on research. Use markdown.",
    [],
)
reviewer = Agent(
    "Reviewer",
    "Review content for accuracy, clarity, and grammar. Provide specific feedback.",
    [],
)

# Supervisor orchestrates the pipeline
def supervisor(task: str):
    # Step 1: Research
    research = researcher.execute(f"Research: {task}")
    # Step 2: Write
    draft = writer.execute(f"Write an article about: {task}", context=research)
    # Step 3: Review
    review = reviewer.execute(f"Review this article:\n{draft}")
    # Step 4: Revise
    final = writer.execute(f"Revise based on review:\n{review}", context=draft)
    return final
```

| Pattern | Description | Best For |
|---|---|---|
| Supervisor | Central agent delegates to workers | Research + writing + review pipelines |
| Hierarchical | Multi-level delegation tree | Complex organizations, nested tasks |
| Peer-to-Peer | Agents collaborate as equals | Debate, brainstorming, peer review |
| Blackboard | Shared state, agents read/write | Multi-domain problem solving |
| Auction | Agents bid for task execution | Resource allocation, task assignment |
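The blackboard row above can be sketched with plain Python: agents are modeled as callables that read the shared board and post contributions. The `Blackboard` and `run_blackboard` names are illustrative, not a library API:

```python
class Blackboard:
    """Shared workspace: agents post partial results under named keys."""
    def __init__(self):
        self.entries: dict[str, str] = {}

    def post(self, key: str, value: str):
        self.entries[key] = value

    def read_all(self) -> str:
        return "\n".join(f"{k}: {v}" for k, v in self.entries.items())

def run_blackboard(board: Blackboard, agents: dict, rounds: int = 2) -> str:
    """Each round, every agent reads the whole board and posts its contribution."""
    for _ in range(rounds):
        for name, agent_fn in agents.items():
            board.post(name, agent_fn(board.read_all()))
    return board.read_all()
```

In a real system each `agent_fn` would wrap an LLM call (like `Agent.execute` above), with the board contents passed in as context.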
Memory systems enable agents to maintain context across interactions, remember past experiences, and access relevant knowledge.
| Type | Storage | Retention | Use Case |
|---|---|---|---|
| Short-term (Buffer) | In-memory message list | Current conversation | Multi-turn dialogue, reasoning chain |
| Long-term (Vector) | Embeddings in vector DB | Permanent | Past conversations, knowledge base |
| Episodic | Key events/experiences | Permanent | Learning from past mistakes/successes |
| Procedural | Learned skills/workflows | Permanent | Improved task execution over time |
| Reflection | Summarized insights | Periodic | Self-improvement, strategy refinement |
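Short-term buffer memory usually needs trimming so long conversations do not overflow the context window. A minimal sliding-window sketch that keeps the system prompt pinned (`trim_messages` is an illustrative helper, not a library function):

```python
def trim_messages(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]  # drop the oldest non-system messages
```

Token-budget trimming (counting tokens instead of messages) and summarizing dropped turns into long-term memory are common refinements.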
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# ── Long-term Memory with Vector Store ──
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
memory_store = Chroma(embedding_function=embeddings,
                      persist_directory="./agent_memory")

def store_memory(text: str, metadata: dict | None = None):
    memory_store.add_texts([text], metadatas=[metadata or {}])

def recall_memory(query: str, k: int = 5) -> list:
    results = memory_store.similarity_search(query, k=k)
    return [doc.page_content for doc in results]

# ── Reflection Memory ──
def reflect_and_store(task: str, result: str, success: bool):
    reflection = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            f"Task: {task}\nResult: {result}\nSuccess: {success}\n"
            f"Write a brief lesson learned (1-2 sentences)."}],
    ).choices[0].message.content
    store_memory(reflection, metadata={"type": "reflection", "task": task})
```

| Metric | What It Measures | How to Evaluate |
|---|---|---|
| Task Completion Rate | Percentage of tasks completed successfully | Binary: did agent achieve the goal? |
| Tool Accuracy | Correct tool selection and arguments | Manual review of tool call logs |
| Reasoning Quality | Correctness of intermediate reasoning | Expert review of think steps |
| Efficiency | Number of steps / LLM calls used | Compare to optimal path length |
| Safety | No harmful or unauthorized actions | Red team testing, boundary tests |
| Latency | End-to-end completion time | Measure total time for standard tasks |
| Cost | Total API tokens/cost per task | Track token usage across all LLM calls |
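Several of these metrics can be aggregated directly from logged runs. A sketch assuming each run is logged as a dict with `success`, `steps`, and `tokens` fields (the schema here is an assumption, not a standard):

```python
def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-run logs into completion rate, efficiency, and cost metrics."""
    n = len(runs)
    return {
        "completion_rate": sum(1 for r in runs if r["success"]) / n,
        "avg_steps": sum(r["steps"] for r in runs) / n,
        "total_tokens": sum(r["tokens"] for r in runs),
    }
```

Running this over a fixed benchmark suite before and after agent changes gives a simple regression signal for task completion, efficiency, and cost.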