Autonomous agents, tool use, memory systems, planning strategies, multi-agent orchestration, and production deployment patterns.
AI agents are autonomous systems that use LLMs to reason, plan, and act through external tools. Unlike simple chat, they maintain state and iterate until a goal is achieved.
```python
import json
import openai

client = openai.OpenAI()

# ── Simple ReAct Agent Loop ──
def run_agent(task: str, max_iterations: int = 10):
    messages = [
        {"role": "system", "content": """You are a helpful agent.
Use the available tools to complete tasks.
Think step by step. After each tool use, analyze the result
and decide the next action. Respond with FINAL ANSWER when done."""},
        {"role": "user", "content": task},
    ]
    tools = [
        {"type": "function", "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {"type": "object", "properties": {
                "query": {"type": "string"}},
                "required": ["query"]}}},
        {"type": "function", "function": {
            "name": "calculate",
            "description": "Evaluate a math expression",
            "parameters": {"type": "object", "properties": {
                "expression": {"type": "string"}},
                "required": ["expression"]}}},
    ]
    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools, tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # Final answer
        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            if tool_call.function.name == "search_web":
                result = search_web(args["query"])  # assumes a search_web() helper is defined
            elif tool_call.function.name == "calculate":
                # Demo only: eval() is unsafe on untrusted model output
                result = str(eval(args["expression"]))
            messages.append({"role": "tool", "tool_call_id": tool_call.id,
                             "content": result})
    return "Stopped: max iterations reached without a final answer."

result = run_agent("What is 15% of the population of Tokyo?")
```

| Component | Purpose | Implementation |
|---|---|---|
| LLM (Brain) | Reason, plan, decide next action | GPT-4o, Claude, LLaMA |
| Tools (Hands) | Interact with external world | Search, Calculator, API calls, Code execution |
| Memory (Recall) | Store conversation and context | Short-term (buffer), Long-term (vector store) |
| Planning (Strategy) | Break tasks into steps | ReAct, Plan-and-Execute, Tree-of-Thought |
| State | Track progress and intermediate results | Messages list, scratchpad, JSON state |
| Guardrails | Safety constraints and validation | Input/output filters, permission checks |
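The guardrails row above can be made concrete with a minimal sketch. The names here (`ALLOWED_TOOLS`, `BLOCKED_PATTERNS`, `check_tool_call`) are illustrative, not part of any library:

```python
# Guardrail sketch: an allowlist permission check plus a crude content filter.
ALLOWED_TOOLS = {"search_web", "calculate"}
BLOCKED_PATTERNS = ["rm -rf", "drop table"]

def check_tool_call(name: str, arguments: dict) -> bool:
    """Return True only if the tool call passes both guardrails."""
    if name not in ALLOWED_TOOLS:  # permission check: unknown tools are rejected
        return False
    arg_text = " ".join(str(v) for v in arguments.values()).lower()
    # Content filter: reject arguments containing blocked substrings
    return not any(p in arg_text for p in BLOCKED_PATTERNS)
```

Run this check before dispatching each tool call; on failure, return an error message to the model instead of executing the tool.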
Planning enables agents to break complex tasks into manageable steps, reason about dependencies, and handle failures gracefully.
```python
# ── Plan-and-Execute Pattern ──
plan_prompt = """Given a task, create a step-by-step plan.
For each step, specify:
- id: a unique step identifier
- description: what to do
- tool: which tool to use (or "think")
- dependencies: which step ids must complete first
- expected_output: what the step produces
Task: {task}
Respond as a JSON object: {{"steps": [...]}}"""

def plan_and_execute(task: str):
    # Phase 1: generate the full plan up front
    plan_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": plan_prompt.format(task=task)}],
        response_format={"type": "json_object"},
    )
    plan = json.loads(plan_response.choices[0].message.content)
    # Phase 2: execute each step in order
    results = {}
    for step in plan["steps"]:
        # Verify every dependency has already produced a result
        for dep in step.get("dependencies", []):
            if dep not in results:
                raise ValueError(f"Missing dependency: {dep}")
        # Execute the step with its designated tool
        if step["tool"] == "think":
            results[step["id"]] = "Analyzed"
        elif step["tool"] == "search":
            results[step["id"]] = search_web(step["description"])
        else:
            results[step["id"]] = execute_tool(step)  # assumes a generic dispatcher
    return results
```
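The executor above assumes the model emits steps in dependency order; when it does not, the plan can be reordered first. A sketch using the standard library's `graphlib` (`order_steps` is a hypothetical helper, and steps are the dicts described in the plan prompt):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def order_steps(steps: list[dict]) -> list[dict]:
    """Reorder plan steps so every step runs after its dependencies."""
    graph = {s["id"]: set(s.get("dependencies", [])) for s in steps}
    order = TopologicalSorter(graph).static_order()  # raises CycleError on cycles
    by_id = {s["id"]: s for s in steps}
    # Skip ids the model referenced but never defined as steps
    return [by_id[i] for i in order if i in by_id]
```

Calling `order_steps(plan["steps"])` before the execution loop removes the "missing dependency" failure mode for out-of-order (but acyclic) plans.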
```python
# ── Tree-of-Thought (ToT) ──
tot_prompt = """Solve this problem by exploring multiple reasoning paths.
For each path:
1. Generate a thought/step
2. Evaluate its promise (high/medium/low)
3. Continue the most promising path
4. Backtrack if stuck
Problem: {problem}"""
```

| Pattern | Approach | Strengths | Weaknesses |
|---|---|---|---|
| ReAct | Interleave think/act in one loop | Simple, flexible, widely used | No global planning, may loop |
| Plan-then-Execute | Generate full plan, then execute | Efficient, clear steps | Rigid, cannot adapt mid-plan |
| Tree-of-Thought | Explore multiple reasoning paths | Handles complex reasoning | Expensive (many LLM calls) |
| Reflexion | Self-evaluate, retry on failure | Learns from mistakes | Multiple attempts = slow |
| LATS | Language Agent Tree Search | Systematic exploration | Most complex, highest cost |
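Reflexion's retry-with-self-critique loop can be sketched independently of any model backend. Here `attempt_fn` and `evaluate_fn` are assumed wrappers around your LLM calls; only the control flow is the point:

```python
def reflexion_loop(task, attempt_fn, evaluate_fn, max_attempts: int = 3):
    """Retry a task, feeding critiques of failed attempts into the next try."""
    reflections: list[str] = []
    result = None
    for _ in range(max_attempts):
        # attempt_fn sees earlier self-critiques, so it can avoid repeating mistakes
        result = attempt_fn(task, reflections)
        ok, critique = evaluate_fn(task, result)
        if ok:
            return result
        reflections.append(critique)
    return result  # best effort after exhausting attempts
```

In practice `evaluate_fn` is often a unit-test runner (for code tasks) or a second LLM call acting as a judge.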
Tools extend agent capabilities beyond text generation. They enable agents to search the web, execute code, query databases, call APIs, and interact with files.
```python
import os
import sqlite3
import subprocess

import requests
from langchain_core.tools import tool  # assumes LangChain's @tool decorator

# ── Code Execution Tool ──
@tool
def execute_python(code: str) -> str:
    """Execute Python code in a subprocess with a timeout. Returns stdout or error."""
    result = subprocess.run(
        ["python3", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout if result.returncode == 0 else result.stderr

# ── API Call Tool ──
@tool
def call_api(url: str, method: str = "GET", body: dict | None = None) -> str:
    """Make an HTTP API call. Returns the response body (truncated)."""
    if method == "GET":
        resp = requests.get(url, params=body, timeout=10)
    else:
        resp = requests.post(url, json=body, timeout=10)
    return resp.text[:2000]

# ── Database Query Tool ──
@tool
def query_database(sql: str) -> str:
    """Execute a read-only SQL query on the database."""
    # Safety: only allow SELECT
    if not sql.strip().upper().startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."
    conn = sqlite3.connect("app.db")
    try:
        rows = conn.execute(sql).fetchmany(50)
    finally:
        conn.close()
    return str(rows)

# ── File System Tool (with safety) ──
@tool
def read_file(filepath: str) -> str:
    """Read contents of a file. Restricted to the /workspace/ directory."""
    safe_dir = "/workspace/"
    real_path = os.path.realpath(filepath)  # resolves symlinks and ../ escapes
    if not real_path.startswith(safe_dir):
        return "Error: Access denied. Only /workspace/ is allowed."
    with open(real_path, "r") as f:
        return f.read()[:5000]
```

Multi-agent systems use multiple specialized agents that collaborate to solve complex tasks. Each agent has a specific role and expertise.
```python
# ── Multi-Agent with Supervisor Pattern ──
class Agent:
    def __init__(self, name: str, role: str, tools: list):
        self.name = name
        self.role = role
        self.tools = tools

    def execute(self, task: str, context: str = "") -> str:
        messages = [
            {"role": "system", "content": f"You are {self.name}. {self.role}"},
            {"role": "user", "content": f"Context: {context}\n\nTask: {task}"},
        ]
        kwargs = {"model": "gpt-4o", "messages": messages}
        if self.tools:  # the API rejects an empty tools list
            kwargs["tools"] = self.tools
        response = client.chat.completions.create(**kwargs)
        return response.choices[0].message.content

# Define specialized agents
researcher = Agent(
    "Researcher",
    "Search the web and gather factual information. Be thorough.",
    [search_tool],
)
writer = Agent(
    "Writer",
    "Write clear, engaging content based on research. Use markdown.",
    [],
)
reviewer = Agent(
    "Reviewer",
    "Review content for accuracy, clarity, and grammar. Provide specific feedback.",
    [],
)

# Supervisor orchestrates the pipeline
def supervisor(task: str):
    # Step 1: Research
    research = researcher.execute(f"Research: {task}")
    # Step 2: Write
    draft = writer.execute(f"Write an article about: {task}", context=research)
    # Step 3: Review
    review = reviewer.execute(f"Review this article:\n{draft}")
    # Step 4: Revise
    final = writer.execute(f"Revise based on review:\n{review}", context=draft)
    return final
```

| Pattern | Description | Best For |
|---|---|---|
| Supervisor | Central agent delegates to workers | Research + writing + review pipelines |
| Hierarchical | Multi-level delegation tree | Complex organizations, nested tasks |
| Peer-to-Peer | Agents collaborate as equals | Debate, brainstorming, peer review |
| Blackboard | Shared state, agents read/write | Multi-domain problem solving |
| Auction | Agents bid for task execution | Resource allocation, task assignment |
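The blackboard row above can be sketched with plain Python: agents are modeled as callables that read the shared board and post contributions. The `Blackboard` and `run_blackboard` names are illustrative, not a library API:

```python
class Blackboard:
    """Shared workspace: agents post partial results under named keys."""
    def __init__(self):
        self.entries: dict[str, str] = {}

    def post(self, key: str, value: str):
        self.entries[key] = value

    def read_all(self) -> str:
        return "\n".join(f"{k}: {v}" for k, v in self.entries.items())

def run_blackboard(board: Blackboard, agents: dict, rounds: int = 2) -> str:
    """Each round, every agent reads the whole board and posts its contribution."""
    for _ in range(rounds):
        for name, agent_fn in agents.items():
            board.post(name, agent_fn(board.read_all()))
    return board.read_all()
```

In a real system each `agent_fn` would wrap an LLM call (like `Agent.execute` above), with the board contents passed in as context.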
Memory systems enable agents to maintain context across interactions, remember past experiences, and access relevant knowledge.
| Type | Storage | Retention | Use Case |
|---|---|---|---|
| Short-term (Buffer) | In-memory message list | Current conversation | Multi-turn dialogue, reasoning chain |
| Long-term (Vector) | Embeddings in vector DB | Permanent | Past conversations, knowledge base |
| Episodic | Key events/experiences | Permanent | Learning from past mistakes/successes |
| Procedural | Learned skills/workflows | Permanent | Improved task execution over time |
| Reflection | Summarized insights | Periodic | Self-improvement, strategy refinement |
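Short-term buffer memory usually needs trimming so long conversations do not overflow the context window. A minimal sliding-window sketch that keeps the system prompt pinned (`trim_messages` is an illustrative helper, not a library function):

```python
def trim_messages(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]  # drop the oldest non-system messages
```

Token-budget trimming (counting tokens instead of messages) and summarizing dropped turns into long-term memory are common refinements.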
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# ── Long-term Memory with Vector Store ──
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
memory_store = Chroma(embedding_function=embeddings,
                      persist_directory="./agent_memory")

def store_memory(text: str, metadata: dict | None = None):
    memory_store.add_texts([text], metadatas=[metadata or {}])

def recall_memory(query: str, k: int = 5) -> list:
    results = memory_store.similarity_search(query, k=k)
    return [doc.page_content for doc in results]

# ── Reflection Memory ──
def reflect_and_store(task: str, result: str, success: bool):
    reflection = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            f"Task: {task}\nResult: {result}\nSuccess: {success}\n"
            f"Write a brief lesson learned (1-2 sentences)."}],
    ).choices[0].message.content
    store_memory(reflection, metadata={"type": "reflection", "task": task})
```

| Metric | What It Measures | How to Evaluate |
|---|---|---|
| Task Completion Rate | Percentage of tasks completed successfully | Binary: did agent achieve the goal? |
| Tool Accuracy | Correct tool selection and arguments | Manual review of tool call logs |
| Reasoning Quality | Correctness of intermediate reasoning | Expert review of think steps |
| Efficiency | Number of steps / LLM calls used | Compare to optimal path length |
| Safety | No harmful or unauthorized actions | Red team testing, boundary tests |
| Latency | End-to-end completion time | Measure total time for standard tasks |
| Cost | Total API tokens/cost per task | Track token usage across all LLM calls |
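Several of these metrics can be aggregated directly from logged runs. A sketch assuming each run is logged as a dict with `success`, `steps`, and `tokens` fields (the schema here is an assumption, not a standard):

```python
def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-run logs into completion rate, efficiency, and cost metrics."""
    n = len(runs)
    return {
        "completion_rate": sum(1 for r in runs if r["success"]) / n,
        "avg_steps": sum(r["steps"] for r in runs) / n,
        "total_tokens": sum(r["tokens"] for r in runs),
    }
```

Running this over a fixed benchmark suite before and after agent changes gives a simple regression signal for task completion, efficiency, and cost.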