🎯 Project Overview
Research is time-consuming. You search Google, read articles, synthesize information, and write reports. What if an AI agent could do this autonomously? In this project, you'll build an intelligent research assistant that:
- Searches the web using multiple search engines (Google, DuckDuckGo, Wikipedia)
- Reads and extracts content from websites intelligently
- Summarizes findings from multiple sources
- Generates reports with citations and structured information
- Handles errors gracefully when sources are unavailable
- Tracks progress and shows reasoning steps
🌍 Real-World Impact: Companies like Perplexity.ai and Bing Chat use similar architectures. This project demonstrates enterprise-level agent design!
What You'll Build
┌───────────────────────────────────────────┐
│         Research Assistant Agent          │
│                                           │
│  User Query: "Explain quantum computing"  │
└─────────────────────┬─────────────────────┘
                      │
                      ▼
           ┌──────────────────────┐
           │   Planning Module    │
           │  (Break into steps)  │
           └──────────┬───────────┘
                      │
         ┌────────────┼────────────┐
         │            │            │
         ▼            ▼            ▼
     ┌────────┐  ┌────────┐  ┌────────┐
     │ Google │  │  Wiki  │  │ DuckD. │   ← Search Tools
     └───┬────┘  └───┬────┘  └───┬────┘
         │           │           │
         └───────────┼───────────┘
                     │
                     ▼
           ┌──────────────────────┐
           │  Content Extraction  │
           │    (Read & Parse)    │
           └──────────┬───────────┘
                      │
                      ▼
           ┌──────────────────────┐
           │    Summarization     │
           │  (Synthesize Info)   │
           └──────────┬───────────┘
                      │
                      ▼
           ┌──────────────────────┐
           │  Report Generation   │
           │   (Final Document)   │
           └──────────────────────┘
🛠️ Setup & Dependencies
1. Install Required Packages
# Core dependencies
pip install langchain langchain-community langchain-openai
pip install openai python-dotenv
# Search & web tools
pip install google-search-results wikipedia duckduckgo-search
pip install beautifulsoup4 requests html2text
# Optional: For better PDF handling
pip install pypdf pdfplumber
2. Set Up API Keys
Create a .env file in your project root:
# OpenAI API key (required)
OPENAI_API_KEY=your_openai_key_here
# SerpAPI key for Google search (optional but recommended)
SERPAPI_API_KEY=your_serpapi_key_here
# Note: Wikipedia and DuckDuckGo don't require API keys
💡 API Keys:
- OpenAI: Get from platform.openai.com
- SerpAPI: Free 100 searches/month at serpapi.com
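Once the `.env` file exists, it's worth a quick pre-flight check that the keys actually load before wiring up the agent. This is a minimal sketch; `check_keys` is an illustrative helper, not part of the project code:

```python
import os

# Load .env if python-dotenv is installed; the agent itself calls
# load_dotenv() later, so this is only a pre-flight check.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

def check_keys() -> dict:
    """Report which API keys are configured in the environment."""
    return {
        "openai": os.getenv("OPENAI_API_KEY") is not None,    # required
        "serpapi": os.getenv("SERPAPI_API_KEY") is not None,  # optional
    }

print(check_keys())
```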
3. Create Project Structure
research-assistant/
├── .env                 # API keys
├── research_agent.py    # Main agent code
├── tools/
│   ├── search_tools.py  # Search implementations
│   ├── web_tools.py     # Web scraping
│   └── report_tools.py  # Report generation
├── outputs/             # Generated reports
└── requirements.txt     # Dependencies
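The requirements.txt in the tree can simply list the packages installed earlier (unpinned here; pin whatever versions you actually install):

```text
langchain
langchain-community
langchain-openai
openai
python-dotenv
google-search-results
wikipedia
duckduckgo-search
beautifulsoup4
requests
html2text
```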
💻 Building the Research Assistant
Step 1: Create Search Tools
"""
Search tools for the research assistant
"""
from langchain.tools import Tool
from langchain_community.utilities import SerpAPIWrapper, WikipediaAPIWrapper
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
import os
from typing import List, Dict

class SearchTools:
    """Collection of search tools"""

    def __init__(self):
        # Initialize search APIs
        self.serpapi_key = os.getenv("SERPAPI_API_KEY")

        # Google results via SerpAPI (the google-search-results package);
        # SerpAPIWrapper reads SERPAPI_API_KEY from the environment
        if self.serpapi_key:
            self.google_search = SerpAPIWrapper()

        # Wikipedia (always available)
        self.wikipedia = WikipediaAPIWrapper()

        # DuckDuckGo (always available, no API key needed)
        self.ddg_search = DuckDuckGoSearchAPIWrapper()
def search_google(self, query: str) -> str:
"""Search Google and return results"""
try:
if not self.serpapi_key:
return "Google search unavailable (no API key)"
results = self.google_search.run(query)
            return f"Google Results:\n{results}"
except Exception as e:
return f"Google search failed: {str(e)}"
def search_wikipedia(self, query: str) -> str:
"""Search Wikipedia and return article summary"""
try:
results = self.wikipedia.run(query)
# Truncate to first 1000 characters
            return f"Wikipedia Summary:\n{results[:1000]}..."
except Exception as e:
return f"Wikipedia search failed: {str(e)}"
def search_duckduckgo(self, query: str) -> str:
"""Search DuckDuckGo and return results"""
try:
results = self.ddg_search.run(query)
            return f"DuckDuckGo Results:\n{results}"
except Exception as e:
return f"DuckDuckGo search failed: {str(e)}"
def get_tools(self) -> List[Tool]:
"""Get all search tools as LangChain Tools"""
tools = [
Tool(
name="Google Search",
func=self.search_google,
description="Search Google for current information. Use for recent events, news, and general web content."
),
Tool(
name="Wikipedia Search",
func=self.search_wikipedia,
description="Search Wikipedia for factual, encyclopedic information. Best for historical facts, definitions, and well-established knowledge."
),
Tool(
name="DuckDuckGo Search",
func=self.search_duckduckgo,
description="Search DuckDuckGo for web results. Good fallback when other searches fail. Privacy-focused."
)
]
return tools
# Test the tools
if __name__ == "__main__":
search_tools = SearchTools()
# Test Wikipedia
result = search_tools.search_wikipedia("Artificial Intelligence")
print(result)
# Test DuckDuckGo
result = search_tools.search_duckduckgo("latest AI developments")
print(result)
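Because each search method returns an error string instead of raising, the engines chain naturally into a fallback: try Google first, then Wikipedia, then DuckDuckGo. A sketch (`search_with_fallback` is an illustrative helper, not part of the tutorial code):

```python
def search_with_fallback(tools, query: str) -> str:
    """Try each engine in order and return the first usable result.

    Relies on the convention above: SearchTools methods return strings
    containing 'failed' or 'unavailable' instead of raising exceptions.
    """
    for method in ("search_google", "search_wikipedia", "search_duckduckgo"):
        result = getattr(tools, method)(query)
        if "failed" not in result and "unavailable" not in result:
            return result
    return "All search engines failed."
```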
Step 2: Create Web Scraping Tools
"""
Web scraping and content extraction tools
"""
import requests
from bs4 import BeautifulSoup
import html2text
from typing import Optional
from langchain.tools import Tool
class WebTools:
"""Tools for fetching and parsing web content"""
def __init__(self):
self.html2text = html2text.HTML2Text()
self.html2text.ignore_links = False
self.html2text.ignore_images = True
def fetch_webpage(self, url: str) -> str:
"""Fetch and extract text content from a webpage"""
try:
# Fetch page
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
# Parse HTML
soup = BeautifulSoup(response.content, 'html.parser')
# Remove script and style elements
for script in soup(["script", "style", "nav", "footer", "header"]):
script.decompose()
# Get text
text = self.html2text.handle(str(soup))
# Clean and truncate
text = text.strip()
if len(text) > 3000:
                text = text[:3000] + "\n\n[Content truncated...]"
return f"Content from {url}:\\n{text}"
except requests.exceptions.Timeout:
return f"Error: Timeout fetching {url}"
except requests.exceptions.RequestException as e:
return f"Error fetching {url}: {str(e)}"
except Exception as e:
return f"Error parsing {url}: {str(e)}"
def extract_links(self, url: str) -> str:
"""Extract all links from a webpage"""
try:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Extract all links
links = []
for link in soup.find_all('a', href=True):
href = link['href']
if href.startswith('http'):
links.append(href)
# Return first 10 links
links = links[:10]
            return f"Links found on {url}:\n" + "\n".join(links)
except Exception as e:
return f"Error extracting links from {url}: {str(e)}"
def get_tools(self) -> list:
"""Get web tools as LangChain Tools"""
return [
Tool(
name="Fetch Webpage",
func=self.fetch_webpage,
description="Fetch and read the full text content of a webpage given its URL. Returns cleaned text content."
),
Tool(
name="Extract Links",
func=self.extract_links,
description="Extract all hyperlinks from a webpage. Useful for finding related resources."
)
]
# Test
if __name__ == "__main__":
web_tools = WebTools()
content = web_tools.fetch_webpage("https://en.wikipedia.org/wiki/Artificial_intelligence")
print(content[:500])
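Note that `extract_links` silently drops relative links (`/about`, `../index.html`). If you want those too, `urllib.parse.urljoin` can resolve them against the page URL; a sketch, where `resolve_links` is an illustrative helper:

```python
from urllib.parse import urljoin

def resolve_links(base_url: str, hrefs: list) -> list:
    """Resolve relative hrefs against base_url, keeping only http(s) URLs."""
    absolute = []
    for href in hrefs:
        full = urljoin(base_url, href)  # absolute hrefs pass through unchanged
        if full.startswith(("http://", "https://")):
            absolute.append(full)
    return absolute
```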
Step 3: Build the Research Agent
"""
Autonomous Research Assistant Agent
"""
import os
from dotenv import load_dotenv
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from tools.search_tools import SearchTools
from tools.web_tools import WebTools
from datetime import datetime
import json
# Load environment variables
load_dotenv()
class ResearchAgent:
"""Autonomous research assistant that can search and synthesize information"""
def __init__(self, model: str = "gpt-4-turbo-preview", verbose: bool = True):
self.model = model
self.verbose = verbose
# Initialize LLM
self.llm = ChatOpenAI(
model=model,
temperature=0.7,
openai_api_key=os.getenv("OPENAI_API_KEY")
)
# Initialize tools
self.search_tools = SearchTools()
self.web_tools = WebTools()
# Combine all tools
self.tools = self.search_tools.get_tools() + self.web_tools.get_tools()
# Create agent
self.agent_executor = self._create_agent()
def _create_agent(self) -> AgentExecutor:
"""Create the research agent with tools"""
# System prompt
system_prompt = """You are an expert research assistant. Your goal is to help users research topics thoroughly by:
1. **Planning**: Break down the research question into sub-questions
2. **Searching**: Use multiple search tools (Google, Wikipedia, DuckDuckGo) to gather information
3. **Reading**: Fetch and read relevant webpages when needed
4. **Synthesizing**: Combine information from multiple sources
5. **Citing**: Always cite your sources with URLs
**Research Process:**
- Start by searching Wikipedia for foundational knowledge
- Use Google or DuckDuckGo for recent information and diverse perspectives
- Fetch full webpage content when you need detailed information
- Cross-reference facts across multiple sources
- Organize findings clearly with headings and bullet points
**Output Format:**
Your final report should include:
- Executive Summary (2-3 sentences)
- Key Findings (bullet points)
- Detailed Analysis (organized sections)
- Sources Cited (URLs with descriptions)
Be thorough but concise. Prioritize accuracy over speed.
"""
# Create prompt template
prompt = ChatPromptTemplate.from_messages([
("system", system_prompt),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad")
])
# Create agent
agent = create_openai_tools_agent(
llm=self.llm,
tools=self.tools,
prompt=prompt
)
# Create executor
return AgentExecutor(
agent=agent,
tools=self.tools,
verbose=self.verbose,
max_iterations=10,
handle_parsing_errors=True
)
def research(self, query: str) -> dict:
"""Conduct research on a topic"""
        print(f"\n🔍 Starting research on: {query}\n")
print("=" * 60)
try:
# Run agent
result = self.agent_executor.invoke({
"input": f"""Research this topic thoroughly: {query}
Please provide a comprehensive report with:
1. Executive Summary
2. Key Findings
3. Detailed Analysis
4. Sources Cited"""
})
# Extract output
report = result["output"]
            print("\n" + "=" * 60)
            print("✅ Research Complete!")
            print("=" * 60)
return {
"success": True,
"query": query,
"report": report,
"timestamp": datetime.now().isoformat()
}
except Exception as e:
            print(f"\n❌ Research failed: {str(e)}")
return {
"success": False,
"query": query,
"error": str(e),
"timestamp": datetime.now().isoformat()
}
def save_report(self, result: dict, filename: str = None):
"""Save research report to file"""
if not filename:
# Generate filename from query and timestamp
            query_slug = "".join(c if c.isalnum() else "_" for c in result["query"][:30])
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"outputs/research_{query_slug}_{timestamp}.md"
# Create outputs directory if it doesn't exist
os.makedirs("outputs", exist_ok=True)
# Format report
        content = f"""# Research Report: {result['query']}

**Generated:** {result['timestamp']}
**Status:** {'✅ Success' if result['success'] else '❌ Failed'}

---

{result.get('report', result.get('error', 'No content'))}

---

*Generated by AI Research Assistant*
"""
# Save to file
with open(filename, 'w', encoding='utf-8') as f:
f.write(content)
        print(f"\n📄 Report saved to: {filename}")
# Main execution
if __name__ == "__main__":
# Create agent
agent = ResearchAgent(model="gpt-4-turbo-preview", verbose=True)
# Example research queries
queries = [
"What is quantum computing and how does it differ from classical computing?",
"Explain the latest developments in large language models as of 2024",
"What are the environmental impacts of AI and data centers?"
]
# Research first query
result = agent.research(queries[0])
# Save report
agent.save_report(result)
# Print report
if result["success"]:
        print("\n" + "=" * 60)
print("RESEARCH REPORT")
print("=" * 60)
print(result["report"])
✅ Checkpoint: Test Your Agent
Run the agent with a simple query:
python research_agent.py
You should see the agent:
- Planning its research approach
- Searching multiple sources
- Reading relevant content
- Generating a comprehensive report
🚀 Advanced Features
Add Memory for Context
from langchain.memory import ConversationBufferMemory
class ResearchAgentWithMemory(ResearchAgent):
"""Research agent with conversation memory"""
def __init__(self, model: str = "gpt-4-turbo-preview", verbose: bool = True):
super().__init__(model, verbose)
# Add memory
self.memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True
)
def research_followup(self, query: str) -> dict:
"""Research with access to previous conversation"""
# Get conversation history
history = self.memory.load_memory_variables({})
# Add to prompt
enhanced_query = f"""Previous context: {history}
New research question: {query}"""
result = self.research(enhanced_query)
# Save to memory
self.memory.save_context(
{"input": query},
{"output": result.get("report", "")}
)
return result
# Usage
agent = ResearchAgentWithMemory()
result1 = agent.research("What is quantum computing?")
result2 = agent.research_followup("How is it used in cryptography?") # Has context!
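One caveat: `ConversationBufferMemory` keeps the entire history, so long sessions steadily inflate the prompt. A capped buffer keeps only the last k exchanges. Here is the idea in plain Python as a conceptual sketch (not LangChain's API; LangChain ships its own windowed memory classes):

```python
from collections import deque

class WindowMemory:
    """Keep only the most recent k (user, agent) exchanges."""

    def __init__(self, k: int = 3):
        self.turns = deque(maxlen=k)  # older turns fall off automatically

    def save(self, user: str, agent: str):
        self.turns.append((user, agent))

    def context(self) -> str:
        return "\n".join(f"User: {u}\nAgent: {a}" for u, a in self.turns)
```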
Add Progress Tracking
from langchain.callbacks import StdOutCallbackHandler
from typing import Any
class ProgressCallback(StdOutCallbackHandler):
"""Custom callback to track progress"""
def __init__(self):
super().__init__()
self.steps = []
self.current_step = 0
def on_tool_start(self, serialized: dict, input_str: str, **kwargs):
"""Called when tool starts"""
self.current_step += 1
tool_name = serialized.get("name", "Unknown")
        print(f"\n🔧 Step {self.current_step}: Using {tool_name}")
print(f" Input: {input_str[:100]}...")
self.steps.append({
"step": self.current_step,
"tool": tool_name,
"input": input_str
})
def on_tool_end(self, output: str, **kwargs):
"""Called when tool completes"""
        print(f"   ✅ Complete: {output[:100]}...")
# Use with agent
agent = ResearchAgent(verbose=True)
callback = ProgressCallback()
result = agent.agent_executor.invoke(
{"input": "Research quantum computing"},
config={"callbacks": [callback]}
)
# View progress
print(f"\n📊 Research completed in {len(callback.steps)} steps")
Add Cost Tracking
from langchain.callbacks import get_openai_callback
def research_with_cost_tracking(agent: ResearchAgent, query: str):
"""Research and track costs"""
with get_openai_callback() as cb:
result = agent.research(query)
# Print cost summary
    print(f"\n💰 Cost Summary:")
print(f" Tokens used: {cb.total_tokens}")
print(f" Prompt tokens: {cb.prompt_tokens}")
print(f" Completion tokens: {cb.completion_tokens}")
print(f" Total cost: ${cb.total_cost:.4f}")
# Add to result
result["cost"] = {
"tokens": cb.total_tokens,
"cost_usd": cb.total_cost
}
return result
# Usage
agent = ResearchAgent()
result = research_with_cost_tracking(agent, "Explain blockchain technology")
💪 Challenges & Extensions
🔥 Challenge 1: Multi-Topic Research
Modify the agent to research multiple related topics and compare findings.
def compare_research(topics: list) -> dict:
"""Research multiple topics and compare"""
agent = ResearchAgent()
results = {}
for topic in topics:
results[topic] = agent.research(topic)
# Generate comparison report
comparison_prompt = f"""Compare these research findings:
{json.dumps(results, indent=2)}
Provide a comparative analysis highlighting:
1. Common themes
2. Key differences
3. Unique insights from each
"""
# TODO: Use agent to generate comparison
pass
# Test
compare_research([
"Quantum computing",
"Classical computing",
"Neuromorphic computing"
])
🔥 Challenge 2: Add PDF Research
Enable the agent to read and analyze PDF documents.
import pypdf
def read_pdf_tool(file_path: str) -> str:
"""Read and extract text from PDF"""
try:
reader = pypdf.PdfReader(file_path)
text = ""
for page in reader.pages:
text += page.extract_text()
return text[:3000] # Truncate
except Exception as e:
return f"Error reading PDF: {e}"
# Add to agent tools
Tool(
name="Read PDF",
func=read_pdf_tool,
description="Read and extract text from a PDF file"
)
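Long PDFs won't fit in a single prompt, so rather than truncating at 3000 characters you can split the extracted text into overlapping chunks and summarize piece by piece. A sketch (the chunk sizes are arbitrary assumptions):

```python
def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list:
    """Split text into chunks of `size` chars, overlapping by `overlap`."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # overlap preserves context across boundaries
    return chunks
```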
🔥 Challenge 3: Build a Web Interface
Create a Streamlit web interface for the research assistant.
import streamlit as st
st.title("🔍 AI Research Assistant")
query = st.text_input("What would you like to research?")
if st.button("Start Research"):
with st.spinner("Researching..."):
agent = ResearchAgent(verbose=False)
result = agent.research(query)
if result["success"]:
st.success("Research complete!")
st.markdown(result["report"])
else:
st.error(f"Research failed: {result['error']}")
# Run with: streamlit run app.py
🚀 Production Deployment
1. Add Error Handling & Retries
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def search_with_retry(search_func, query):
"""Search with automatic retries"""
return search_func(query)
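If you'd rather not add tenacity as a dependency, the same exponential-backoff pattern is a few lines of stdlib Python (a sketch; `retry_call` is an illustrative helper):

```python
import time

def retry_call(func, *args, attempts: int = 3, base_delay: float = 1.0):
    """Call func(*args), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(min(base_delay * (2 ** attempt), 10))
```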
2. Add Rate Limiting
from ratelimit import limits, sleep_and_retry
@sleep_and_retry
@limits(calls=10, period=60) # 10 calls per minute
def rate_limited_search(query):
"""Rate-limited search"""
return search(query)
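The `ratelimit` decorators above are also third-party; a minimal stdlib throttle that sleeps between calls looks like this (a sketch; `Throttle` is an illustrative helper):

```python
import time

class Throttle:
    """Enforce a minimum interval between calls by sleeping."""

    def __init__(self, calls_per_minute: int):
        self.interval = 60.0 / calls_per_minute
        self._last = 0.0

    def wait(self):
        # Sleep off whatever remains of the interval since the last call
        delay = self.interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```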
3. Cache Results
import functools
import hashlib
@functools.lru_cache(maxsize=100)
def cached_research(query_hash: str):
"""Cache research results"""
# Implementation
pass
# Usage
query = "Explain blockchain technology"
query_hash = hashlib.md5(query.encode()).hexdigest()
result = cached_research(query_hash)
⚠️ Production Considerations:
- API Limits: Monitor OpenAI and search API usage
- Cost Control: Set max_tokens limits to control costs
- Error Handling: Implement comprehensive try-catch blocks
- Logging: Log all searches and results for debugging
- Security: Validate all URLs before fetching content
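The last bullet deserves code: fetching arbitrary URLs suggested by an LLM is an SSRF risk. A minimal validator might look like this (a sketch; `is_safe_url` is an illustrative helper, and a real deployment should also resolve hostnames to IPs before checking):

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Allow only http(s) URLs and reject literal private/loopback IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        if ipaddress.ip_address(parsed.hostname).is_private:
            return False  # covers loopback and RFC 1918 ranges
    except ValueError:
        pass  # hostname is a domain name, not an IP literal
    return True
```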
🎯 Key Takeaways
- Agent Architecture: Tools + LLM + Planning = Autonomous behavior
- Tool Design: Each tool should have clear purpose and error handling
- Search Strategy: Use multiple sources for comprehensive coverage
- Error Handling: Graceful degradation when tools fail
- Cost Management: Track tokens and implement caching
- Iterative Improvement: Start simple, add features incrementally
📚 Next Steps
- Project 2: Multi-Agent Code Review System - Learn agent collaboration
- Project 3: Business Process Automation - Build practical automation
- Agent Evaluation & Safety - Learn to evaluate and secure agents
- Production Agent Systems - Deploy at scale
💡 Share Your Project: Built something cool? Share it on Twitter/LinkedIn with #AIAgents and tag @AITutorialsSite!
🎉 Congratulations!
You've built an autonomous AI research assistant that can search, read, and synthesize information!
← Back to AI Agents Course