πŸ“œ Free Certificate Upon Completion - Earn a verified certificate when you complete all 7 modules in the AI Agents course.

Agent Planning & Reasoning

πŸ“š Tutorial 2 🟒 Beginner

Master reasoning patterns that let agents think step-by-step and solve complex problems

πŸŽ“ Complete all tutorials to earn your Free AI Agents Certificate
Shareable on LinkedIn β€’ Verified by AITutorials.site β€’ No signup fee

How Do Agents Think?

The breakthrough insight of modern agents is simple: Let AI models think step-by-step. Instead of demanding immediate answers, agents break problems into reasoning steps, explore options, and reconsider decisions.

This module teaches you the reasoning patterns that make agents intelligent.

Key Insight: Agent performance improves dramatically when they can reason explicitly. Giving an agent permission to "think aloud" leads to better decisions and error recovery.

Chain-of-Thought Reasoning

Chain-of-Thought (CoT) is the foundational reasoning pattern. Instead of jumping to an answer, the model articulates its thinking:

❌ Without Chain-of-Thought:

User: "If a store sells apples at $2 each, and I buy 5, paying with $20, how much change?"

Agent: $10

βœ… With Chain-of-Thought:

User: "If a store sells apples at $2 each, and I buy 5, paying with $20, how much change?"

Agent Thinks:

1. Price per apple: $2

2. Number of apples: 5

3. Total cost: $2 Γ— 5 = $10

4. Amount paid: $20

5. Change: $20 - $10 = $10

Answer: $10

The answer is the same, but the step-by-step reasoning makes it:

  • More accurate (catches mistakes)
  • More transparent (you see the logic)
  • More adaptable (you can correct reasoning steps)

Implementing Chain-of-Thought

from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# CoT prompt template
cot_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are a reasoning assistant.
    
Question: {question}

Let's think through this step by step:
1. First, identify what we know
2. Break down the problem
3. Work through each step
4. Verify the answer

Your reasoning:"""
)

llm = OpenAI(temperature=0.3)
chain = LLMChain(llm=llm, prompt=cot_prompt)

result = chain.run(question="What is 15% of 200?")
print(result)
Result: The agent thinks aloud, showing steps, making errors visible and fixable.

ReAct: Reasoning + Acting

ReAct (Reasoning + Acting) combines thinking with external tool use. The agent alternates between:

πŸ’­
Thought: Agent reasons about what to do next
βš™οΈ
Action: Agent uses a tool (search, calculator, API)
πŸ‘οΈ
Observation: Agent sees tool output, reflects, and adapts

Example ReAct Loop

Task: "What is the current temperature in San Francisco and the weather forecast?"

Agent Thought: I need to find current temperature and forecast. I should use a weather tool.

Action: get_weather(city="San Francisco")

Observation: {"temp": "72°F", "forecast": "Sunny tomorrow, rain in 3 days"}

Agent Thought: Got it! I have the information needed to answer.

Final Answer: "It's currently 72Β°F in San Francisco with sunny weather tomorrow."

Implementing ReAct

from langchain.agents import Tool, initialize_agent, AgentType
from langchain.llms import OpenAI
import json

def weather_tool(city: str) -> str:
    """Get weather for a city"""
    # Simulated weather data
    weather_db = {
        "San Francisco": {"temp": 72, "condition": "Sunny"},
        "New York": {"temp": 45, "condition": "Rainy"}
    }
    return json.dumps(weather_db.get(city, {"error": "City not found"}))

# Create tool
tools = [
    Tool(
        name="GetWeather",
        func=weather_tool,
        description="Get current weather for a city"
    )
]

# Initialize ReAct agent
agent = initialize_agent(
    tools,
    OpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True  # Shows Thought β†’ Action β†’ Observation loop
)

# Agent will think, act, observe, and adapt
result = agent.run("What's the weather in San Francisco?")
ReAct Advantage: By combining thinking with tool use, agents can ground their reasoning in real information and correct themselves when needed.

Tree-of-Thought: Exploring Multiple Paths

Sometimes one reasoning path isn't enough. Tree-of-Thought lets agents explore multiple approaches:

Standard Thinking (Linear):

Start β†’ Step 1 β†’ Step 2 β†’ Step 3 β†’ Answer

Tree-of-Thought (Branching):

Start
β”œβ”€ Path A: Try approach 1
β”‚  β”œβ”€ Step 1A
β”‚  β”œβ”€ Step 2A β†’ Dead end
β”‚  └─ Backtrack
β”œβ”€ Path B: Try approach 2
β”‚  β”œβ”€ Step 1B
β”‚  β”œβ”€ Step 2B
β”‚  └─ Step 3B β†’ Good result!
└─ Path C: Try approach 3
   └─ Quick rejection
   
Best: Path B β†’ Answer

Tree-of-Thought is powerful for complex problems where exploration matters. The agent:

  1. Generates multiple reasoning paths
  2. Evaluates each path for promise
  3. Explores the most promising paths deeper
  4. Backtracks when paths fail
  5. Returns the best solution found
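The five steps above can be condensed into a minimal sketch. Here `llm` is a placeholder for any completion function, and the 0-10 self-rating and the "dead end" marker are illustrative assumptions, not a fixed API:

```python
def tree_of_thought(problem, llm, n_paths=3, depth=2):
    """Explore multiple reasoning paths, score them, return the best."""
    # 1. Generate candidate approaches (one per path)
    paths = [llm(f"Approach {i + 1} to solve: {problem}") for i in range(n_paths)]

    best_path, best_score = None, float("-inf")
    for path in paths:
        # 2-3. Deepen the most recent path up to `depth` steps
        for _ in range(depth):
            path = path + "\n" + llm(f"Continue this reasoning:\n{path}")
            # 4. Backtrack: abandon paths the model flags as dead ends
            if "dead end" in path.lower():
                break
        # Evaluate the finished path (assumed 0-10 self-rating)
        score = float(llm(f"Rate this reasoning 0-10:\n{path}"))
        if score > best_score:
            best_path, best_score = path, score

    # 5. Return the best solution found
    return best_path
```

In practice you would bound the search more carefully (beam width, depth limits) and use a stronger evaluator than a single self-rating, but the generate-evaluate-backtrack loop is the core of the pattern.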

Example Use Cases

🎯 Problem Solving

Multiple solution approaches for math/logic problems

πŸ–ŠοΈ Writing

Draft multiple versions, evaluate, refine best one

πŸ” Decision Making

Explore options, compare trade-offs, choose best

Self-Refinement Patterns

Agents improve by reflecting on their own work:

Self-Critique Loop

1. Generate: Agent produces initial answer

2. Critique: Agent reviews its own work, identifies flaws

3. Refine: Agent improves based on critique

4. Verify: Check if refined answer is better

5. Iterate: Repeat until satisfactory

Implementing Self-Refinement

def refine_answer(question: str, initial_answer: str) -> str:
    """Agent refines its answer through self-critique"""
    
    # Step 1: Start from the initial answer passed in
    answer = initial_answer
    
    # Step 2: Critique
    critique_prompt = f"""
    Question: {question}
    Initial Answer: {answer}
    
    Critique this answer. What's wrong? How could it be better?
    """
    critique = llm(critique_prompt)
    
    # Step 3: Refine
    refine_prompt = f"""
    Question: {question}
    Initial Answer: {answer}
    Critique: {critique}
    
    Based on the critique, provide an improved answer.
    """
    refined_answer = llm(refine_prompt)
    
    return refined_answer

# Example
question = "Explain photosynthesis simply"
refined = refine_answer(question, "Plants make food from sun")
# Agent will critique, then improve its explanation

Hierarchical Planning

Complex tasks need hierarchical decomposition. Agents break big goals into subgoals:

Goal: "Write a blog post about AI agents"

├─ Subgoal 1: Research AI agent applications
│  ├─ Search for use cases
│  └─ Compile findings
├─ Subgoal 2: Outline the blog post
│  ├─ Define sections
│  └─ Create heading structure
├─ Subgoal 3: Write draft sections
│  ├─ Introduction
│  ├─ Main content
│  └─ Conclusion
└─ Subgoal 4: Edit and polish
   ├─ Grammar check
   └─ Final review

Hierarchical planning helps agents:

  • Break overwhelming tasks into manageable pieces
  • Track progress toward main goal
  • Recover gracefully if a subtask fails
  • Reuse subgoal solutions for similar problems

Implementing Hierarchical Planner

hierarchical_planner.py
from dataclasses import dataclass
from typing import List, Optional
from enum import Enum

class TaskStatus(Enum):
    NOT_STARTED = "not_started"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class Subgoal:
    id: str
    description: str
    status: TaskStatus
    dependencies: List[str]  # IDs of subtasks that must complete first
    subtasks: List['Subgoal']
    result: Optional[str] = None

class HierarchicalPlanner:
    def __init__(self, llm):
        self.llm = llm
        self.plan = None
    
    def create_plan(self, goal: str) -> Subgoal:
        """Decompose goal into hierarchical subgoals"""
        prompt = f"""
Break down this goal into a hierarchical plan:
Goal: {goal}

Create a tree of subgoals. For each subgoal:
1. Give it a clear description
2. List any dependencies (what must finish first)
3. Break it into smaller subtasks if needed

Format as:
Subgoal 1: [description]
  Dependencies: []
  Subtasks:
    - 1.1: [description]
    - 1.2: [description]

Your hierarchical plan:
"""
        
        plan_text = self.llm(prompt)
        plan = self._parse_plan(plan_text)
        self.plan = plan
        return plan
    
    def execute_plan(self, plan: Subgoal) -> str:
        """Execute plan by traversing tree"""
        print(f"🎯 Executing: {plan.description}\n")
        
        # Check dependencies
        for dep_id in plan.dependencies:
            dep = self._find_subgoal(dep_id)
            if dep is None or dep.status != TaskStatus.COMPLETED:
                print(f"⏸️  Waiting for dependency: {dep_id}")
                return "Dependencies not met"
        
        # Execute subtasks first
        if plan.subtasks:
            for subtask in plan.subtasks:
                result = self.execute_plan(subtask)
                if subtask.status == TaskStatus.FAILED:
                    plan.status = TaskStatus.FAILED
                    return "Subtask failed"
        
        # Execute this task
        plan.status = TaskStatus.IN_PROGRESS
        result = self._execute_task(plan.description)
        
        if "error" in result.lower():
            plan.status = TaskStatus.FAILED
        else:
            plan.status = TaskStatus.COMPLETED
            plan.result = result
        
        print(f"βœ… Completed: {plan.description}\n")
        return result
    
    def _execute_task(self, task: str) -> str:
        """Execute a single task"""
        # In production, this would call appropriate tools
        return f"Executed: {task}"
    
    def _parse_plan(self, plan_text: str) -> Subgoal:
        """Parse LLM output into Subgoal tree"""
        # Simplified parsing (production would be more robust)
        return Subgoal(
            id="root",
            description="Main goal",
            status=TaskStatus.NOT_STARTED,
            dependencies=[],
            subtasks=[]
        )
    
    def _find_subgoal(self, goal_id: str) -> Optional[Subgoal]:
        """Find subgoal by ID in tree"""
        # Traverse plan tree to find subgoal
        pass

# Usage
planner = HierarchicalPlanner(llm=openai_llm)
plan = planner.create_plan("Research and write a technical blog post on AI agents")
result = planner.execute_plan(plan)

Advanced Reasoning Techniques

Beyond the core patterns, several advanced techniques enhance agent reasoning:

1. Few-Shot Reasoning Examples

Show agents examples of good reasoning to guide their thinking:

Few-Shot Reasoning
few_shot_prompt = """
Here are examples of good reasoning:

Example 1:
Q: How many days until Christmas from Oct 15?
Reasoning:
- October has 31 days
- Days left in October: 31 - 15 = 16 days
- November: 30 days
- December: 25 days until Christmas
- Total: 16 + 30 + 25 = 71 days
Answer: 71 days

Example 2:
Q: If I save $50/month, how much in 1 year?
Reasoning:
- Savings per month: $50
- Months in year: 12
- Total: $50 Γ— 12 = $600
Answer: $600

Now solve this:
Q: {user_question}
Reasoning:
"""

2. Socratic Questioning

Agents ask themselves probing questions to deepen reasoning:

Socratic Method
def socratic_reasoning(question: str) -> str:
    """Agent reasons through Socratic self-questioning"""
    
    prompts = [
        f"What is the core problem in: {question}",
        "What do I already know about this?",
        "What assumptions am I making?",
        "What would disprove my current thinking?",
        "What's a simpler version of this problem?",
        "How can I verify my answer?"
    ]
    
    reasoning_chain = []
    for prompt in prompts:
        response = llm(prompt)
        reasoning_chain.append(f"Q: {prompt}\nA: {response}\n")
    
    # Final answer based on complete reasoning
    final_prompt = f"""
Based on this reasoning chain:
{''.join(reasoning_chain)}

Original question: {question}
Your final answer:
"""
    return llm(final_prompt)

3. Counterfactual Reasoning

Agents consider "what if" scenarios to test their reasoning:

Example: "What if my initial assumption was wrong? Would my conclusion still hold? What alternative explanations exist?"
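A minimal prompt-level sketch of this pattern, assuming an `llm` completion function (a hypothetical placeholder, not a specific library call):

```python
def counterfactual_check(question: str, answer: str, llm) -> str:
    """Stress-test an answer by inverting its key assumption."""
    prompt = f"""
Question: {question}
Proposed answer: {answer}

1. What key assumption does this answer rest on?
2. Suppose that assumption is wrong. Does the conclusion still hold?
3. What alternative explanations fit the same facts?

Revised answer (or confirmation of the original):
"""
    return llm(prompt)
```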

4. Analogical Reasoning

Agents draw parallels to similar problems they've solved:

Analogical Reasoning
analogical_prompt = f"""
Problem: {current_problem}

This problem is similar to: [identify analogous problem]

In that problem, we solved it by: [describe solution approach]

Applying same logic here:
Step 1: [adapt approach]
Step 2: [continue adaptation]
...

Solution: [final answer]
"""

Building a Production ReAct Agent

Let's build a complete, production-quality ReAct agent with error handling, logging, and retry logic:

production_react_agent.py
import openai
from typing import List, Dict, Callable
import json
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ProductionReActAgent:
    def __init__(self, api_key: str, tools: Dict[str, Callable]):
        self.client = openai.OpenAI(api_key=api_key)
        self.tools = tools
        self.max_iterations = 15
        self.max_retries = 3
        self.conversation_history = []
    
    def run(self, task: str) -> Dict:
        """Execute task using ReAct pattern"""
        logger.info(f"Starting task: {task}")
        
        observation = f"Task: {task}"
        iteration = 0
        
        while iteration < self.max_iterations:
            iteration += 1
            logger.info(f"\n--- Iteration {iteration} ---")
            
            # THOUGHT: Reason about what to do
            thought = self._think(observation, task)
            logger.info(f"Thought: {thought}")
            
            # Check if task complete
            if "FINAL ANSWER:" in thought:
                answer = thought.split("FINAL ANSWER:")[1].strip()
                return {
                    "status": "success",
                    "answer": answer,
                    "iterations": iteration,
                    "history": self.conversation_history
                }
            
            # ACTION: Parse and execute action
            action, params = self._parse_action(thought)
            logger.info(f"Action: {action}({params})")
            
            # Execute with retries
            observation = self._execute_with_retry(action, params)
            logger.info(f"Observation: {observation[:200]}...")
            
            # Store in history
            self.conversation_history.append({
                "iteration": iteration,
                "thought": thought,
                "action": action,
                "observation": observation,
                "timestamp": datetime.now().isoformat()
            })
        
        return {
            "status": "incomplete",
            "reason": "max_iterations_reached",
            "iterations": iteration,
            "history": self.conversation_history
        }
    
    def _think(self, observation: str, task: str) -> str:
        """ReAct thinking step"""
        context = self._build_context(observation, task)
        
        system_prompt = """You are a ReAct agent. For each turn:

1. THOUGHT: Reason about the observation and what to do next
2. ACTION: Choose a tool and parameters
3. Wait for OBSERVATION

When you have the final answer, respond with:
FINAL ANSWER: [your answer]

Available tools:
""" + self._format_tool_descriptions()
        
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": context}
            ],
            temperature=0.1
        )
        
        return response.choices[0].message.content
    
    def _build_context(self, observation: str, task: str) -> str:
        """Build context from history and current observation"""
        context = f"Task: {task}\n\n"
        
        # Add recent history
        if self.conversation_history:
            context += "Recent history:\n"
            for entry in self.conversation_history[-3:]:
                context += f"Thought: {entry['thought'][:100]}...\n"
                context += f"Action: {entry['action']}\n"
                context += f"Observation: {entry['observation'][:100]}...\n\n"
        
        context += f"Current Observation: {observation}\n\n"
        context += "Your thought and next action:"
        
        return context
    
    def _parse_action(self, thought: str) -> tuple:
        """Parse action from thought"""
        # Simple parsing (production would use regex or structured output)
        if "ACTION:" in thought:
            action_line = thought.split("ACTION:")[1].split("\n")[0].strip()
            
            # Extract tool name and parameters
            if "(" in action_line:
                tool_name = action_line.split("(")[0].strip()
                params_str = action_line.split("(")[1].split(")")[0]
                
                # Parse parameters
                try:
                    params = json.loads("{" + params_str + "}")
                except json.JSONDecodeError:
                    params = {"query": params_str}
                
                return (tool_name, params)
        
        return ("continue", {})
    
    def _execute_with_retry(self, action: str, params: Dict, 
                           retries: int = 0) -> str:
        """Execute action with retry logic"""
        try:
            if action not in self.tools:
                return f"Error: Unknown tool '{action}'. Available: {list(self.tools.keys())}"
            
            result = self.tools[action](**params)
            return str(result)
        
        except Exception as e:
            logger.error(f"Tool execution error: {str(e)}")
            
            if retries < self.max_retries:
                logger.info(f"Retrying ({retries + 1}/{self.max_retries})...")
                return self._execute_with_retry(action, params, retries + 1)
            
            return f"Error after {self.max_retries} retries: {str(e)}"
    
    def _format_tool_descriptions(self) -> str:
        """Format tool descriptions for prompt"""
        descriptions = []
        for name, func in self.tools.items():
            doc = func.__doc__ or "No description"
            descriptions.append(f"- {name}: {doc}")
        return "\n".join(descriptions)

# Example tools
def search_web(query: str) -> str:
    """Search the web for information"""
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    """Evaluate a mathematical expression"""
    return eval(expression)  # Use safely in production!

def send_email(to: str, subject: str, body: str) -> str:
    """Send an email"""
    return f"Email sent to {to}"

# Usage
agent = ProductionReActAgent(
    api_key="your-key",
    tools={
        "search": search_web,
        "calculate": calculate,
        "email": send_email
    }
)

result = agent.run("What's 15% of the population of Tokyo? Email the answer to research@example.com")
print(json.dumps(result, indent=2))
Production Features:
  • Comprehensive logging of all decisions
  • Retry logic for failed tool calls
  • Conversation history for context
  • Graceful error handling
  • Max iteration safety limit
  • Structured output with full history

Memory-Augmented Reasoning

Advanced agents use memory to improve reasoning over time:

Episodic Memory

Store past reasoning episodes and retrieve similar ones:

Episodic Memory System
import json
from datetime import datetime
from typing import Dict, List

import chromadb
from sentence_transformers import SentenceTransformer

class EpisodicMemory:
    def __init__(self):
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.client = chromadb.Client()
        self.collection = self.client.create_collection("reasoning_episodes")
    
    def store_episode(self, task: str, reasoning: str, outcome: str):
        """Store a reasoning episode"""
        episode = {
            "task": task,
            "reasoning": reasoning,
            "outcome": outcome,
            "timestamp": datetime.now().isoformat()
        }
        
        # Embed and store
        embedding = self.encoder.encode(task)
        self.collection.add(
            embeddings=[embedding.tolist()],
            documents=[json.dumps(episode)],
            ids=[f"episode_{datetime.now().timestamp()}"]
        )
    
    def recall_similar(self, current_task: str, n: int = 3) -> List[Dict]:
        """Retrieve similar past episodes"""
        query_embedding = self.encoder.encode(current_task)
        
        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=n
        )
        
        episodes = [json.loads(doc) for doc in results['documents'][0]]
        return episodes
    
    def reason_with_memory(self, task: str) -> str:
        """Use past episodes to inform current reasoning"""
        similar_episodes = self.recall_similar(task)
        
        memory_context = "Similar problems I've solved:\n"
        for i, episode in enumerate(similar_episodes, 1):
            memory_context += f"\n{i}. Task: {episode['task']}\n"
            memory_context += f"   Approach: {episode['reasoning'][:200]}...\n"
            memory_context += f"   Outcome: {episode['outcome']}\n"
        
        prompt = f"""
{memory_context}

Current task: {task}

Based on past experience, how should I approach this?
"""
        return llm(prompt)

# Usage
memory = EpisodicMemory()

# Store past reasoning
memory.store_episode(
    task="Calculate compound interest",
    reasoning="Used formula: A = P(1+r)^t, broke down each variable...",
    outcome="Success - correct calculation"
)

# Later, use memory to help with similar task
approach = memory.reason_with_memory("Calculate loan repayment amount")
# Agent sees similar past tasks and adapts its approach

Learning from Mistakes

Agents that remember failed approaches avoid repeating mistakes:

Mistake Memory
from datetime import datetime

class MistakeMemory:
    def __init__(self):
        self.mistakes = []
    
    def record_failure(self, action: str, context: str, error: str):
        """Record a failed action"""
        self.mistakes.append({
            "action": action,
            "context": context,
            "error": error,
            "timestamp": datetime.now()
        })
    
    def check_before_action(self, proposed_action: str, context: str) -> tuple:
        """Check if this action failed before in similar context"""
        for mistake in self.mistakes:
            if mistake['action'] == proposed_action:
                # Similar context? Warn agent
                similarity = self._compute_similarity(context, mistake['context'])
                if similarity > 0.7:
                    return (False, f"Warning: This action failed before with error: {mistake['error']}")
        
        return (True, "No known issues")
    
    def _compute_similarity(self, text1: str, text2: str) -> float:
        """Compute similarity between contexts"""
        # Use embedding similarity in production
        return 0.5

Meta-Reasoning: Thinking About Thinking

Advanced agents can reason about their own reasoning process:

Meta-Reasoning Questions:
  • "Am I using the right reasoning strategy?"
  • "Is my confidence appropriate for this conclusion?"
  • "Should I explore more or commit to an answer?"
  • "What are the risks of being wrong?"
Meta-Reasoning Agent
class MetaReasoningAgent:
    def __init__(self, llm):
        self.llm = llm
    
    def reason_with_meta_awareness(self, problem: str) -> Dict:
        """Reason while monitoring reasoning quality"""
        
        # Step 1: Initial reasoning
        initial_reasoning = self.llm(f"Solve: {problem}")
        
        # Step 2: Meta-reasoning - evaluate the reasoning
        meta_prompt = f"""
Problem: {problem}
My reasoning: {initial_reasoning}

Meta-questions:
1. Is this reasoning strategy appropriate for this problem?
2. What's my confidence level (0-100)?
3. What could I be missing?
4. Should I use a different approach?
5. What are alternative solutions?

Your meta-analysis:
"""
        meta_analysis = self.llm(meta_prompt)
        
        # Step 3: Decide if reasoning is good enough
        if "low confidence" in meta_analysis.lower() or "different approach" in meta_analysis.lower():
            # Try alternative approach
            alternative_prompt = f"""
Problem: {problem}
First attempt: {initial_reasoning}
Issues identified: {meta_analysis}

Try a completely different approach:
"""
            alternative_reasoning = self.llm(alternative_prompt)
            
            return {
                "answer": alternative_reasoning,
                "confidence": "improved",
                "meta_analysis": meta_analysis,
                "attempts": 2
            }
        
        return {
            "answer": initial_reasoning,
            "confidence": "high",
            "meta_analysis": meta_analysis,
            "attempts": 1
        }

Reasoning Under Uncertainty

Real-world reasoning often involves incomplete information. Agents must handle uncertainty:

Confidence Scoring

Agents should express confidence in their reasoning:

Confidence-Aware Reasoning
def reason_with_confidence(question: str) -> Dict:
    """Reason and provide confidence score"""
    
    prompt = f"""
Question: {question}

Provide your reasoning and a confidence score (0-100).

Format:
REASONING: [your step-by-step thinking]
ANSWER: [your answer]
CONFIDENCE: [0-100]
UNCERTAINTY SOURCES: [what could make this wrong?]
"""
    
    response = llm(prompt)
    
    # Parse response (extract_section is an assumed helper, not shown,
    # that pulls the text after each "LABEL:" marker)
    reasoning = extract_section(response, "REASONING")
    answer = extract_section(response, "ANSWER")
    confidence = int(extract_section(response, "CONFIDENCE"))
    uncertainties = extract_section(response, "UNCERTAINTY SOURCES")
    
    return {
        "answer": answer,
        "reasoning": reasoning,
        "confidence": confidence,
        "uncertainties": uncertainties,
        "should_human_review": confidence < 70
    }

Probabilistic Reasoning

For decisions with uncertain outcomes, agents can reason probabilistically:

Example: "There's a 70% chance approach A works and 30% for approach B. Approach A is lower cost. Expected value suggests trying A first, with B as fallback."
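The expected-value arithmetic in that example can be made explicit. The payoff and cost figures below are illustrative assumptions added for the sketch, not numbers from the example:

```python
def expected_value_choice(options):
    """Pick the option with the highest probability-weighted payoff.

    options: list of (name, success_probability, payoff_if_success, cost)
    """
    return max(options, key=lambda o: o[1] * o[2] - o[3])

# Mirrors the example: A is more likely to succeed and cheaper
options = [
    ("approach_A", 0.7, 100, 10),  # EV = 0.7 * 100 - 10 = 60
    ("approach_B", 0.3, 100, 30),  # EV = 0.3 * 100 - 30 = 0
]
best = expected_value_choice(options)  # approach_A, with B as fallback
```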

🎯 Practical Exercise: Build a Reasoning Agent

Create a research agent that uses multiple reasoning patterns:

Challenge: Build an agent that researches a topic using:
  1. Hierarchical Planning: Break research into subgoals
  2. ReAct: Search web, read sources, synthesize
  3. Self-Refinement: Critique and improve draft
  4. Confidence Scoring: Rate answer quality
Starter Code: Combine the patterns we've learned to build a complete research agent that can handle complex queries like "Compare the energy efficiency of solar vs nuclear power, considering cost, environmental impact, and scalability."
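One possible skeleton for wiring the four patterns together is sketched below. Every name in it (`ResearchAgent`, the stub methods, the `"search"` tool key) is a hypothetical scaffold to replace with your own implementation, and `llm` is any completion function:

```python
class ResearchAgent:
    """Starter skeleton combining the four patterns from this module."""

    def __init__(self, llm, tools):
        self.llm = llm      # any completion function (hypothetical)
        self.tools = tools  # e.g. {"search": search_fn}

    def plan(self, topic: str) -> list:
        """Hierarchical planning: break research into subgoals (stub)."""
        return [f"research {topic}", f"outline {topic}", f"draft {topic}"]

    def research(self, subgoal: str) -> str:
        """ReAct: one thought -> tool call -> observation step (stub)."""
        thought = self.llm(f"What should I look up for: {subgoal}?")
        return self.tools["search"](thought)

    def refine(self, draft: str) -> str:
        """Self-refinement: critique, then improve."""
        critique = self.llm(f"Critique this draft:\n{draft}")
        return self.llm(f"Improve the draft using this critique:\n{critique}")

    def score_confidence(self, answer: str) -> int:
        """Confidence scoring: 0-100 self-rating."""
        return int(self.llm(f"Rate confidence 0-100 for:\n{answer}"))

    def run(self, topic: str) -> dict:
        drafts = [self.research(g) for g in self.plan(topic)]
        final = self.refine("\n".join(drafts))
        return {"answer": final, "confidence": self.score_confidence(final)}
```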

Choosing the Right Pattern

Different problems benefit from different reasoning patterns:

| Pattern | Best For | Example |
|---|---|---|
| Chain-of-Thought | Linear reasoning, math, logic | Calculate tax on a purchase |
| ReAct | Tasks requiring tools/APIs | Look up current stock prices, then analyze |
| Tree-of-Thought | Complex exploration, creative tasks | Design multiple solutions, pick best |
| Self-Refinement | Quality improvement, iterative work | Writing, code generation, analysis |
| Hierarchical | Large, multi-step projects | Complete a full research report |

Common Reasoning Pitfalls and Solutions

Even with good reasoning patterns, agents can fall into traps:

| Pitfall | What Happens | Solution |
|---|---|---|
| Reasoning Loops | Agent repeats same reasoning, gets stuck | Detect repeated states, force new approach after N iterations |
| Over-confidence | Agent commits to wrong answer confidently | Force confidence scoring, verify high-confidence claims |
| Premature Commitment | Agent stops reasoning too early | Require minimum reasoning steps, use verification phase |
| Reasoning Shortcuts | Agent skips important steps | Enforce step-by-step format, reject incomplete reasoning |
| Tool Over-reliance | Agent uses tools unnecessarily | Teach when to reason vs. when to use tools |

Loop Detection

Detecting Reasoning Loops
class LoopDetector:
    def __init__(self, window=5, threshold=0.8):
        self.recent_states = []
        self.window = window
        self.threshold = threshold
    
    def check_loop(self, current_state: str) -> bool:
        """Detect if agent is in a reasoning loop"""
        # Add current state
        self.recent_states.append(current_state)
        
        # Keep only recent window
        if len(self.recent_states) > self.window:
            self.recent_states.pop(0)
        
        # Check for repetition
        if len(self.recent_states) >= 3:
            # Compare current to previous states
            similarities = []
            for prev_state in self.recent_states[:-1]:
                sim = self._similarity(current_state, prev_state)
                similarities.append(sim)
            
            # If highly similar to recent states, likely a loop
            if max(similarities) > self.threshold:
                return True
        
        return False
    
    def _similarity(self, s1: str, s2: str) -> float:
        """Compute similarity between reasoning states"""
        # Simple implementation - use embeddings in production
        s1_words = set(s1.lower().split())
        s2_words = set(s2.lower().split())
        intersection = s1_words & s2_words
        union = s1_words | s2_words
        return len(intersection) / len(union) if union else 0

# Usage in agent
loop_detector = LoopDetector()

for iteration in range(max_iterations):
    reasoning = agent.think(observation)
    
    if loop_detector.check_loop(reasoning):
        logger.warning("Loop detected! Forcing new approach")
        reasoning = agent.think_differently(observation)
    
    # Continue with reasoning...

Optimizing Reasoning Performance

Production agents need to balance reasoning quality with speed and cost:

1. Adaptive Reasoning Depth

Use simple reasoning for easy problems, deep reasoning for hard ones:

Adaptive Reasoning
class AdaptiveReasoningAgent:
    def reason(self, problem: str) -> str:
        # Quick assessment of problem difficulty
        difficulty = self._assess_difficulty(problem)
        
        if difficulty < 3:
            # Simple problem - direct answer
            return self.simple_reasoning(problem)
        elif difficulty < 7:
            # Medium - Chain-of-Thought
            return self.cot_reasoning(problem)
        else:
            # Hard - Tree-of-Thought with exploration
            return self.tot_reasoning(problem)
    
    def _assess_difficulty(self, problem: str) -> int:
        """Rate problem difficulty 1-10"""
        assessment_prompt = f"""
Rate problem difficulty (1-10):
Problem: {problem}

Consider:
- Number of steps required
- Domain knowledge needed
- Ambiguity level
- Need for external information

Difficulty (1-10):
"""
        score = llm(assessment_prompt)
        return int(score.strip())

2. Caching Reasoning Patterns

Cache successful reasoning approaches for similar problems:

Reasoning Cache
import hashlib
from datetime import datetime
from typing import Optional

class ReasoningCache:
    def __init__(self):
        self.cache = {}
    
    def get_cached_reasoning(self, problem: str) -> Optional[str]:
        """Check if similar problem was solved before"""
        problem_hash = self._hash_problem(problem)
        
        if problem_hash in self.cache:
            cached = self.cache[problem_hash]
            # Verify cache freshness
            if (datetime.now() - cached['timestamp']).days < 7:
                return cached['reasoning_template']
        
        return None
    
    def cache_reasoning(self, problem: str, reasoning: str):
        """Cache successful reasoning approach"""
        problem_hash = self._hash_problem(problem)
        self.cache[problem_hash] = {
            'reasoning_template': reasoning,
            'timestamp': datetime.now()
        }
    
    def _hash_problem(self, problem: str) -> str:
        """Create hash of problem for cache key"""
        # Normalize problem
        normalized = problem.lower().strip()
        return hashlib.md5(normalized.encode()).hexdigest()

3. Parallel Reasoning Paths

For complex problems, explore multiple reasoning paths in parallel:

Parallel Reasoning
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

class ParallelReasoningAgent:
    def __init__(self, llm):
        self.llm = llm
        self.executor = ThreadPoolExecutor(max_workers=3)
    
    async def reason_parallel(self, problem: str) -> Dict:
        """Try multiple reasoning approaches simultaneously"""
        
        approaches = [
            self._approach_analytical,
            self._approach_analogical,
            self._approach_creative
        ]
        
        # Run all approaches in parallel
        tasks = [
            asyncio.create_task(self._reason_async(approach, problem))
            for approach in approaches
        ]
        
        results = await asyncio.gather(*tasks)
        
        # Evaluate and pick best result
        best = self._select_best(results)
        
        return {
            "answer": best['answer'],
            "approach": best['approach'],
            "all_results": results
        }
    
    async def _reason_async(self, approach, problem):
        """Run a blocking reasoning approach in the thread pool"""
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self.executor,
            approach,
            problem
        )
    
    def _select_best(self, results: List[Dict]) -> Dict:
        """Select best reasoning result"""
        # Score each result
        scored = []
        for result in results:
            score = self._score_reasoning(result)
            scored.append((score, result))
        
        # Return highest scoring
        return max(scored, key=lambda x: x[0])[1]

Debugging Agent Reasoning

When agents reason incorrectly, you need tools to diagnose the problem:

Reasoning Trace Visualization

Reasoning Debugger
from datetime import datetime
from typing import Dict, Optional

class ReasoningDebugger:
    def __init__(self):
        self.trace = []
    
    def log_reasoning_step(self, step: Dict):
        """Log each reasoning step"""
        self.trace.append({
            **step,
            'timestamp': datetime.now(),
            # Nesting level supplied by the caller; defaults to top level
            'stack_depth': step.get('stack_depth', 0)
        })
    
    def visualize_trace(self):
        """Create visual representation of reasoning"""
        print("\n" + "="*60)
        print("REASONING TRACE")
        print("="*60 + "\n")
        
        for i, step in enumerate(self.trace, 1):
            indent = "  " * step['stack_depth']
            print(f"{indent}{i}. {step['type']}")
            print(f"{indent}   Input: {step['input'][:50]}...")
            print(f"{indent}   Output: {step['output'][:50]}...")
            print(f"{indent}   Duration: {step.get('duration', 0):.2f}s")
            print()
    
    def export_trace(self, filename: str):
        """Export trace for analysis"""
        import json
        with open(filename, 'w') as f:
            json.dump(self.trace, f, indent=2, default=str)
    
    def find_error_point(self) -> Optional[int]:
        """Identify where reasoning went wrong"""
        for i, step in enumerate(self.trace):
            if 'error' in step.get('output', '').lower():
                return i
        return None

# Usage
debugger = ReasoningDebugger()

# In agent loop
debugger.log_reasoning_step({
    'type': 'thought',
    'input': observation,
    'output': reasoning,
    'duration': 0.5
})

# After completion
debugger.visualize_trace()
error_step = debugger.find_error_point()
if error_step is not None:  # step 0 is a valid index, so don't test truthiness
    print(f"Error detected at step {error_step}")

Reasoning Validation

Automatically verify reasoning correctness:

Reasoning Validator
from typing import Dict

class ReasoningValidator:
    def __init__(self, llm):
        self.llm = llm
    
    def validate(self, reasoning: str, answer: str) -> Dict:
        """Validate reasoning quality"""
        checks = {
            'has_steps': self._check_steps(reasoning),
            'logical_flow': self._check_logic(reasoning),
            'evidence_based': self._check_evidence(reasoning),
            'conclusion_follows': self._check_conclusion(reasoning, answer),
            'no_contradictions': self._check_consistency(reasoning)
        }
        
        score = sum(checks.values()) / len(checks)
        
        return {
            'valid': score > 0.7,
            'score': score,
            'checks': checks,
            'issues': [k for k, v in checks.items() if not v]
        }
    
    def _check_steps(self, reasoning: str) -> bool:
        """Verify reasoning has clear steps"""
        # Look for numbered steps or logical progression
        indicators = ['step', '1.', '2.', 'first', 'then', 'next']
        return any(ind in reasoning.lower() for ind in indicators)
    
    def _check_logic(self, reasoning: str) -> bool:
        """Check for logical flow"""
        # Use the LLM to verify logic
        prompt = f"""
Analyze this reasoning for logical flow:
{reasoning}

Is the logic sound? (Yes/No):
"""
        response = self.llm(prompt).strip().lower()
        return 'yes' in response

Reasoning Best Practices

Essential practices for production reasoning agents:

1. Always Log Reasoning

Store every reasoning step. When things go wrong, traces are invaluable for debugging.

2. Implement Timeouts

Don't let agents reason forever. Set hard limits: max iterations, max time, max tokens.
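A minimal sketch of such a bounded loop, assuming a hypothetical `step_fn` that performs one reasoning step and returns an answer when finished (the function and limit values are illustrative, not from any library):

```python
import time

def bounded_reasoning_loop(step_fn, max_iterations=10, max_seconds=30.0):
    """Run step_fn until it returns an answer or a hard limit is hit."""
    start = time.monotonic()
    for i in range(max_iterations):
        # Wall-clock limit: stop even if iterations remain
        if time.monotonic() - start > max_seconds:
            return {"status": "timeout", "iterations": i}
        result = step_fn(i)
        if result is not None:  # step_fn returns an answer when done
            return {"status": "done", "answer": result, "iterations": i + 1}
    # Iteration limit reached without an answer
    return {"status": "max_iterations", "iterations": max_iterations}
```

A token budget can be enforced the same way by accumulating usage reported by the LLM client after each step.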

3. Verify Critical Decisions

For high-stakes decisions, use multiple reasoning methods and compare results.
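One simple way to compare results is a majority vote across independent reasoning methods. This is a hedged sketch: `methods` is any list of callables (e.g. CoT, ToT, a second model), and the agreement threshold is an assumption you should tune:

```python
from collections import Counter

def verify_decision(problem, methods, min_agreement=2):
    """Run several reasoning methods; accept only a clear majority answer."""
    answers = [method(problem) for method in methods]
    # Most common answer and how many methods produced it
    answer, count = Counter(answers).most_common(1)[0]
    if count >= min_agreement:
        return answer
    return None  # no consensus - escalate instead of guessing
```

Returning `None` on disagreement lets the caller escalate to a stronger model or a human rather than silently picking one result.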

4. Build in Uncertainty

Force agents to express confidence. Use low confidence as a trigger for human review.
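The routing itself can be as simple as a threshold check. A minimal sketch, assuming the agent already reports a confidence in [0, 1] alongside its answer (the 0.7 cutoff is an illustrative default, not a standard):

```python
def route_by_confidence(answer: str, confidence: float, threshold: float = 0.7):
    """Send low-confidence answers to a human instead of auto-accepting."""
    if confidence < threshold:
        return {"route": "human_review", "answer": answer, "confidence": confidence}
    return {"route": "auto_accept", "answer": answer, "confidence": confidence}
```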

5. Test Edge Cases

Test reasoning with ambiguous inputs, missing data, and contradictory information.

6. Use Smaller Models for Simple Reasoning

Not every decision needs GPT-4. Use GPT-3.5 or Claude Instant for simple reasoning to save cost.

Key Takeaways

  • Chain-of-Thought: Let agents think step-by-step for more accurate reasoning
  • ReAct: Combine reasoning with external tools for grounded decisions
  • Tree-of-Thought: Explore multiple paths for complex problem-solving
  • Self-Refinement: Agents improve by critiquing their own work
  • Hierarchical Planning: Break complex goals into subgoals for tractability
  • Memory-Augmented Reasoning: Use past episodes to inform current decisions
  • Meta-Reasoning: Agents that think about their thinking are more robust
  • Adaptive Depth: Match reasoning complexity to problem difficulty
  • Always validate: Log, debug, and verify reasoning quality in production

Test Your Knowledge

Q1: What is the main benefit of Chain-of-Thought (CoT) reasoning?

It makes models run faster
It reduces token usage
It improves accuracy by breaking down problems into explicit reasoning steps
It eliminates the need for training data

Q2: What does the "ReAct" pattern stand for?

Retrieve and Compile
Reasoning and Acting
Reverse Action
Rapid Execution and Analysis

Q3: In the ReAct loop, what comes after "Thought"?

Final Answer
Next Thought
Error Message
Action and Observation

Q4: What is the key advantage of Tree-of-Thought (ToT) over Chain-of-Thought?

It explores multiple reasoning paths simultaneously and backtracks if needed
It uses less memory
It works without an LLM
It eliminates all reasoning errors

Q5: When should you use hierarchical planning in agents?

For simple, single-step tasks
Only when training new models
For complex tasks that benefit from high-level strategy and detailed execution
Never, it's deprecated