Use AI and NLP to automatically generate test cases, scenarios, and scripts from requirements and user stories
Writing test cases is tedious. You read requirements, imagine scenarios, consider edge cases, then manually type them into test management tools. A simple user story might need 10-20 test cases covering happy paths, error handling, and boundary conditions. What if AI could generate all of them in seconds?
In this tutorial, you'll build an AI-powered test case generator that reads requirements and automatically creates comprehensive test scenarios. You'll use OpenAI's GPT models and NLP to transform user stories into executable test scripts, complete with edge cases you might have missed.
Traditional test case creation is slow, repetitive, and error-prone: you read the requirement, enumerate scenarios by hand, and inevitably miss some edge cases.
💡 AI Solution: Large Language Models (LLMs) like GPT-4 can understand natural language requirements and generate structured test cases in seconds. Reported results suggest AI-generated tests can reach 85-95% coverage relative to manual testing, though results vary by domain.
Natural Language Processing enables AI to parse plain-English requirements, identify the actors, actions, and conditions involved, and infer which scenarios are worth testing.
AI-generated test cases should include happy-path scenarios, negative cases, boundary conditions, and, where relevant, security checks.
First, install the OpenAI library and set up your API key:
# Install the OpenAI Python library (this tutorial uses the legacy pre-1.0 SDK)
pip install "openai<1.0" python-dotenv
# Create .env file for API key
echo "OPENAI_API_KEY=your-api-key-here" > .env
import openai
import os
from dotenv import load_dotenv
import json
# Load API key
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
# Test the connection
def test_openai_connection():
    """Verify OpenAI API is working"""
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": "Say 'API connection successful!'"}
            ],
            max_tokens=50
        )
        print(response.choices[0].message.content)
        return True
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

# Run test
test_openai_connection()
⚠️ API Costs: OpenAI charges per token. At the time of writing, GPT-4 costs ~$0.03 per 1K input tokens and ~$0.06 per 1K output tokens, so expect roughly $0.10-0.50 per user story for test generation. GPT-3.5-turbo is a far cheaper alternative (~$0.001 per 1K input tokens); check OpenAI's pricing page for current rates.
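Since prices change, it helps to make the cost of a run visible before you batch hundreds of stories. A quick back-of-the-envelope helper (the per-1K-token rates below are illustrative placeholders, not authoritative):

```python
# Rough cost estimator. The per-1K-token rates are illustrative placeholders;
# check OpenAI's current pricing page before relying on them.
PRICES_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate dollar cost of one API call."""
    rates = PRICES_PER_1K[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

# A typical test-generation call: ~500 prompt tokens, ~1500 completion tokens
print(f"gpt-4: ${estimate_cost('gpt-4', 500, 1500):.3f}")
print(f"gpt-3.5-turbo: ${estimate_cost('gpt-3.5-turbo', 500, 1500):.4f}")
```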
Let's create an AI-powered test case generator:
import openai
import json
from typing import List, Dict
class AITestGenerator:
    """
    Generate comprehensive test cases from user stories using GPT
    """

    def __init__(self, model="gpt-4", temperature=0.7):
        """
        Args:
            model: OpenAI model to use (gpt-4 or gpt-3.5-turbo)
            temperature: Creativity (0=deterministic, 1=creative)
        """
        self.model = model
        self.temperature = temperature

    def generate_test_cases(self, user_story: str,
                            num_cases: int = 10,
                            include_negative: bool = True) -> List[Dict]:
        """
        Generate test cases from a user story

        Args:
            user_story: Natural language requirement
            num_cases: Number of test cases to generate
            include_negative: Include negative/error scenarios

        Returns:
            List of test case dictionaries
        """
        prompt = self._build_prompt(user_story, num_cases, include_negative)

        print(f"🤖 Generating {num_cases} test cases...")
        print(f"📝 User Story: {user_story[:100]}...")

        try:
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self._get_system_prompt()},
                    {"role": "user", "content": prompt}
                ],
                temperature=self.temperature,
                max_tokens=2000
            )

            # Parse JSON response
            content = response.choices[0].message.content

            # Extract JSON from markdown code block if present
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                content = content.split("```")[1].split("```")[0].strip()

            test_cases = json.loads(content)
            print(f"✅ Generated {len(test_cases)} test cases")
            return test_cases

        except Exception as e:
            print(f"❌ Error generating test cases: {e}")
            return []

    def _get_system_prompt(self) -> str:
        """Define the AI's role and expertise"""
        return """You are an expert QA engineer with 10+ years of experience.
You excel at creating comprehensive, detailed test cases that cover:
- Happy path scenarios
- Negative test cases (invalid inputs, errors)
- Boundary conditions (min/max values, edge cases)
- Security considerations (SQL injection, XSS, authentication)
- Performance and usability

Generate test cases in JSON format with this structure:
{
    "test_id": "TC_XXX_001",
    "title": "Clear, descriptive title",
    "priority": "Critical|High|Medium|Low",
    "category": "Functional|Regression|Smoke|Security|Performance",
    "preconditions": ["Setup step 1", "Setup step 2"],
    "steps": [
        {"step": 1, "action": "What to do", "expected": "What should happen"}
    ],
    "test_data": {"field": "value"},
    "postconditions": ["Cleanup actions"]
}

Be specific, actionable, and thorough."""

    def _build_prompt(self, user_story: str, num_cases: int,
                      include_negative: bool) -> str:
        """Build the user prompt"""
        prompt = f"""Generate {num_cases} detailed test cases for this user story:

USER STORY:
{user_story}

REQUIREMENTS:
- Include both positive (happy path) and negative (error) scenarios
- Cover boundary conditions and edge cases
- Be specific about test data to use
- Include security considerations where relevant
- Prioritize test cases appropriately
- Use clear, actionable language

Return ONLY a JSON array of test cases, no additional text."""

        if not include_negative:
            prompt += "\n- Focus only on positive scenarios"

        return prompt

    def generate_test_script(self, test_case: Dict, language: str = "python") -> str:
        """
        Convert a test case into executable code

        Args:
            test_case: Test case dictionary
            language: Target language (python, java, javascript)

        Returns:
            Executable test script code
        """
        prompt = f"""Convert this test case into executable {language} test code using pytest/selenium:

TEST CASE:
{json.dumps(test_case, indent=2)}

REQUIREMENTS:
- Use pytest framework for Python
- Use Selenium WebDriver for browser automation
- Include proper assertions
- Add comments explaining each step
- Handle waits and error cases
- Use Page Object Model if applicable

Return ONLY the code, no explanations."""

        try:
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are an expert test automation engineer."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,  # Lower for code generation
                max_tokens=1500
            )

            code = response.choices[0].message.content

            # Extract code from markdown if present
            if f"```{language}" in code:
                code = code.split(f"```{language}")[1].split("```")[0].strip()
            elif "```" in code:
                code = code.split("```")[1].split("```")[0].strip()

            return code

        except Exception as e:
            print(f"❌ Error generating test script: {e}")
            return ""
# Usage example
generator = AITestGenerator(model="gpt-4")
# User story
user_story = """
As a registered user,
I want to log in to my account using my email and password,
So that I can access my personalized dashboard.
Acceptance Criteria:
- User can log in with valid email and password
- Invalid credentials show an error message
- Account gets locked after 5 failed attempts
- User can reset password via email
- Remember me option keeps user logged in for 30 days
"""
# Generate test cases
test_cases = generator.generate_test_cases(
    user_story=user_story,
    num_cases=12,
    include_negative=True
)
# Print generated test cases
for i, tc in enumerate(test_cases, 1):
    print(f"\n{'='*60}")
    print(f"TEST CASE {i}: {tc.get('title', 'Untitled')}")
    print('='*60)
    print(f"ID: {tc.get('test_id', 'N/A')}")
    print(f"Priority: {tc.get('priority', 'N/A')}")
    print(f"Category: {tc.get('category', 'N/A')}")
    print(f"\nPreconditions:")
    for pre in tc.get('preconditions', []):
        print(f"  - {pre}")
    print(f"\nSteps:")
    for step in tc.get('steps', []):
        print(f"  {step['step']}. {step['action']}")
        print(f"     Expected: {step['expected']}")
    print(f"\nTest Data: {tc.get('test_data', {})}")
✅ Result: GPT-4 generates 12 comprehensive test cases in ~10 seconds, covering happy paths, error handling, security (account lockout), and boundary conditions!
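Notice that the markdown-stripping logic in `generate_test_cases` gets repeated in every class in this tutorial. Factoring it into one helper keeps the parsing consistent; a small refactor sketch:

```python
import json

def extract_json(content: str):
    """Parse JSON from an LLM reply, stripping a markdown code fence if present."""
    if "```json" in content:
        content = content.split("```json")[1].split("```")[0].strip()
    elif "```" in content:
        content = content.split("```")[1].split("```")[0].strip()
    return json.loads(content)

# Handles fenced and bare replies alike
cases = extract_json('```json\n[{"test_id": "TC_LOGIN_001"}]\n```')
```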
Now let's convert test cases into actual Python/Selenium code:
# Pick a test case to automate
login_test_case = test_cases[0] # Assuming first is successful login
# Generate executable code
print("\n" + "="*60)
print("GENERATING EXECUTABLE TEST SCRIPT")
print("="*60)
test_script = generator.generate_test_script(
    test_case=login_test_case,
    language="python"
)

print("\n" + test_script)

# Save to file
with open("test_login.py", "w") as f:
    f.write(test_script)

print("\n✅ Test script saved to test_login.py")
print("Run with: pytest test_login.py")
# Example of what GPT-4 might generate:
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
class TestLogin:
    """
    Test case: TC_LOGIN_001
    Verify successful login with valid credentials
    """

    @pytest.fixture
    def driver(self):
        """Setup WebDriver"""
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
        driver.implicitly_wait(10)
        yield driver
        driver.quit()

    def test_successful_login_valid_credentials(self, driver):
        """
        Test steps:
        1. Navigate to login page
        2. Enter valid email
        3. Enter valid password
        4. Click login button
        5. Verify redirect to dashboard
        """
        # Step 1: Navigate to login page
        driver.get("https://example.com/login")

        # Step 2: Enter valid email
        email_field = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "email"))
        )
        email_field.send_keys("testuser@example.com")

        # Step 3: Enter valid password
        password_field = driver.find_element(By.ID, "password")
        password_field.send_keys("SecurePass123!")

        # Step 4: Click login button
        login_button = driver.find_element(By.ID, "login-btn")
        login_button.click()

        # Step 5: Verify redirect to dashboard
        WebDriverWait(driver, 10).until(
            EC.url_contains("/dashboard")
        )

        # Additional assertions
        assert "dashboard" in driver.current_url.lower(), "Should redirect to dashboard"

        welcome_message = driver.find_element(By.CLASS_NAME, "welcome-message")
        assert welcome_message.is_displayed(), "Welcome message should be visible"

        print("✅ Test passed: User logged in successfully")
AI can find edge cases humans might miss:
class EdgeCaseDiscoverer:
    """
    Use AI to discover edge cases and boundary conditions
    """

    def discover_edge_cases(self, feature_description: str) -> List[str]:
        """
        Generate comprehensive list of edge cases
        """
        prompt = f"""You are a security and edge case expert. Analyze this feature:

FEATURE:
{feature_description}

Generate a comprehensive list of edge cases, boundary conditions, and unusual scenarios to test:
- Minimum/maximum values
- Empty inputs, null values
- Special characters, Unicode, emojis
- SQL injection, XSS attacks
- Race conditions, concurrency
- Network failures, timeouts
- Invalid data types
- Extremely large inputs
- Unexpected user behavior

Return as a JSON array of edge case descriptions."""

        try:
            response = openai.ChatCompletion.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are an expert at finding edge cases and security vulnerabilities."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.8,  # Higher for creativity
                max_tokens=1000
            )

            content = response.choices[0].message.content

            # Extract JSON
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                content = content.split("```")[1].split("```")[0].strip()

            edge_cases = json.loads(content)
            return edge_cases

        except Exception as e:
            print(f"❌ Error discovering edge cases: {e}")
            return []
# Usage
discoverer = EdgeCaseDiscoverer()
feature = """
Login form that accepts email and password.
Email must be valid format, password must be 8+ characters.
Form has client-side and server-side validation.
"""
edge_cases = discoverer.discover_edge_cases(feature)
print("\n" + "="*60)
print("DISCOVERED EDGE CASES")
print("="*60)
for i, case in enumerate(edge_cases, 1):
    print(f"{i}. {case}")
💡 AI Advantage: GPT-4 can suggest edge cases like "What if email contains Unicode characters like ñ or emojis?" or "What happens if user submits form 100 times in 1 second?" that manual testers might overlook.
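Once you have a pile of AI-suggested edge cases, it helps to bucket them before triage. The keyword-based classifier below is a naive sketch (the categories and keywords are my own choices, not part of any library):

```python
def categorize_edge_cases(edge_cases):
    """Group edge-case descriptions into rough buckets by keyword."""
    buckets = {"security": [], "boundary": [], "input": [], "other": []}
    keywords = {
        "security": ["sql", "xss", "injection", "auth"],
        "boundary": ["max", "min", "limit", "length", "boundary"],
        "input": ["empty", "null", "unicode", "emoji", "special"],
    }
    for case in edge_cases:
        lowered = case.lower()
        for bucket, words in keywords.items():
            if any(w in lowered for w in words):
                buckets[bucket].append(case)
                break
        else:
            buckets["other"].append(case)  # no keyword matched
    return buckets

buckets = categorize_edge_cases([
    "Email field with SQL injection payload",
    "Password at maximum allowed length",
    "Empty email input",
    "Submit the form twice quickly",
])
```

A real triage would be fuzzier, but even this rough grouping makes it easy to route security-flavored cases to the right reviewers first.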
AI can also generate realistic test data:
class TestDataGenerator:
    """
    Generate realistic test data using AI
    """

    def generate_test_data(self, data_type: str, count: int = 10,
                           constraints: str = "") -> List:
        """
        Generate test data matching specified type and constraints

        Args:
            data_type: Type of data (email, phone, address, etc.)
            count: Number of samples to generate
            constraints: Additional requirements
        """
        prompt = f"""Generate {count} realistic test data samples for: {data_type}

CONSTRAINTS:
{constraints if constraints else "None - use realistic, diverse data"}

REQUIREMENTS:
- Data should be realistic and varied
- Include edge cases (min/max lengths, special characters)
- Include both valid and invalid samples if appropriate
- Return as JSON array

Examples:
- For emails: valid formats, invalid formats, edge cases
- For names: different cultures, special characters, very long names
- For addresses: various countries, apartment numbers, PO boxes"""

        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",  # Cheaper for data generation
                messages=[
                    {"role": "user", "content": prompt}
                ],
                temperature=0.9,  # High creativity for varied data
                max_tokens=800
            )

            content = response.choices[0].message.content

            # Extract JSON
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                content = content.split("```")[1].split("```")[0].strip()

            test_data = json.loads(content)
            return test_data

        except Exception as e:
            print(f"❌ Error generating test data: {e}")
            return []
# Usage
data_gen = TestDataGenerator()

# Generate email test data
emails = data_gen.generate_test_data(
    data_type="email addresses",
    count=15,
    constraints="Include valid emails, invalid formats, SQL injection attempts, XSS payloads"
)
print("\n" + "="*60)
print("GENERATED TEST DATA: Emails")
print("="*60)
for i, email in enumerate(emails, 1):
    print(f"{i}. {email}")
# Generate phone numbers
phones = data_gen.generate_test_data(
    data_type="phone numbers",
    count=10,
    constraints="Include US, international, invalid formats, edge cases"
)
print("\n" + "="*60)
print("GENERATED TEST DATA: Phone Numbers")
print("="*60)
for i, phone in enumerate(phones, 1):
    print(f"{i}. {phone}")
Let's put it all together in an automated pipeline:
class EndToEndTestGenerator:
    """
    Complete pipeline: Requirement → Test Cases → Test Scripts → Test Data
    """

    def __init__(self):
        self.test_gen = AITestGenerator(model="gpt-4")
        self.edge_discoverer = EdgeCaseDiscoverer()
        self.data_gen = TestDataGenerator()

    def generate_complete_test_suite(self, user_story: str,
                                     output_dir: str = "generated_tests"):
        """
        Generate entire test suite from user story
        """
        import os
        os.makedirs(output_dir, exist_ok=True)

        print("🚀 Starting end-to-end test generation pipeline...\n")

        # Step 1: Generate test cases
        print("📝 Step 1: Generating test cases...")
        test_cases = self.test_gen.generate_test_cases(
            user_story=user_story,
            num_cases=15,
            include_negative=True
        )

        # Save test cases as JSON
        with open(f"{output_dir}/test_cases.json", "w") as f:
            json.dump(test_cases, f, indent=2)
        print(f"✅ Saved {len(test_cases)} test cases to test_cases.json\n")

        # Step 2: Discover edge cases
        print("🔍 Step 2: Discovering edge cases...")
        edge_cases = self.edge_discoverer.discover_edge_cases(user_story)

        with open(f"{output_dir}/edge_cases.json", "w") as f:
            json.dump(edge_cases, f, indent=2)
        print(f"✅ Discovered {len(edge_cases)} edge cases\n")

        # Step 3: Generate test scripts
        print("💻 Step 3: Generating executable test scripts...")
        for i, test_case in enumerate(test_cases[:5], 1):  # Generate scripts for first 5
            print(f"  Generating script {i}/5...")
            script = self.test_gen.generate_test_script(
                test_case=test_case,
                language="python"
            )

            # Save script (sanitize the test ID for use as a filename)
            test_id = test_case.get('test_id', f'test_{i}').lower().replace(' ', '_')
            filename = f"{output_dir}/{test_id}.py"
            with open(filename, "w") as f:
                f.write(script)
        print("✅ Generated 5 executable test scripts\n")

        # Step 4: Generate test data
        print("📊 Step 4: Generating test data...")
        # Extract data types from test cases
        data_types = self._extract_data_types(test_cases)

        all_test_data = {}
        for data_type in data_types:
            data = self.data_gen.generate_test_data(
                data_type=data_type,
                count=20
            )
            all_test_data[data_type] = data

        with open(f"{output_dir}/test_data.json", "w") as f:
            json.dump(all_test_data, f, indent=2)
        print(f"✅ Generated test data for {len(data_types)} data types\n")

        # Step 5: Generate summary report
        self._generate_report(output_dir, test_cases, edge_cases, data_types)

        print("="*60)
        print("✅ TEST SUITE GENERATION COMPLETE!")
        print("="*60)
        print(f"Output directory: {output_dir}/")
        print(f"  - test_cases.json ({len(test_cases)} cases)")
        print(f"  - edge_cases.json ({len(edge_cases)} cases)")
        print(f"  - 5 executable Python test scripts")
        print(f"  - test_data.json (test data sets)")
        print(f"  - test_suite_summary.txt (report)")

    def _extract_data_types(self, test_cases: List[Dict]) -> List[str]:
        """Extract data types from test cases"""
        data_types = set()
        for tc in test_cases:
            test_data = tc.get('test_data', {})
            for key in test_data.keys():
                if 'email' in key.lower():
                    data_types.add('email')
                elif 'password' in key.lower():
                    data_types.add('password')
                elif 'phone' in key.lower():
                    data_types.add('phone')
                elif 'name' in key.lower():
                    data_types.add('name')
        return list(data_types) or ['email', 'password']  # Default

    def _generate_report(self, output_dir: str, test_cases: List[Dict],
                         edge_cases: List, data_types: List[str]):
        """Generate summary report"""
        report = f"""
TEST SUITE GENERATION SUMMARY
{'='*60}
Generated: {len(test_cases)} test cases
Edge Cases: {len(edge_cases)} scenarios
Test Scripts: 5 executable Python files
Test Data: {len(data_types)} data type sets

TEST CASE BREAKDOWN:
"""

        # Count by priority
        priorities = {}
        categories = {}
        for tc in test_cases:
            priority = tc.get('priority', 'Unknown')
            category = tc.get('category', 'Unknown')
            priorities[priority] = priorities.get(priority, 0) + 1
            categories[category] = categories.get(category, 0) + 1

        report += "\nBy Priority:\n"
        for priority, count in sorted(priorities.items()):
            report += f"  {priority}: {count}\n"

        report += "\nBy Category:\n"
        for category, count in sorted(categories.items()):
            report += f"  {category}: {count}\n"

        report += f"\n{'='*60}\n"

        with open(f"{output_dir}/test_suite_summary.txt", "w") as f:
            f.write(report)
        print(report)
# Complete workflow example
pipeline = EndToEndTestGenerator()
user_story = """
As a user,
I want to register for an account,
So that I can access premium features.
Acceptance Criteria:
- User provides email, password, and full name
- Email must be unique and valid format
- Password must be 8+ characters with 1 uppercase, 1 number
- User receives confirmation email
- User can log in immediately after registration
- Failed registrations show appropriate error messages
"""
# Generate everything!
pipeline.generate_complete_test_suite(
    user_story=user_story,
    output_dir="registration_tests"
)
✅ Complete Automation: From a single user story, you now have 15 test cases, edge case scenarios, 5 executable test scripts, and test data sets, all generated in under 60 seconds!
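One practical note: a pipeline this chatty makes many API calls in a row, and some will fail intermittently (rate limits, timeouts). Production pipelines usually wrap calls in retries; a minimal, dependency-free sketch:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

Usage would look like `with_retries(lambda: generator.generate_test_cases(user_story))`. Libraries such as tenacity offer the same pattern with more controls, but this is enough for a tutorial pipeline.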
⚠️ Limitations: AI can hallucinate element locators or APIs that don't exist. Always validate that generated code runs successfully against your actual application before trusting it.
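A cheap first gate before human review is a syntax check: a generated script that doesn't even parse should never be saved or committed. A sketch using only the standard library:

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python (catches truncated/garbled output)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Gate generated scripts before writing them to disk
script = "def test_login():\n    assert True"
broken = "def test_login(:\n    assert"
```

Passing this gate only proves the code parses; hallucinated locators and made-up APIs still require a real run against the application.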
# .github/workflows/ai-test-generation.yml
name: AI Test Generation

on:
  pull_request:
    paths:
      - 'requirements/**'  # Trigger on requirement changes

jobs:
  generate-tests:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install openai python-dotenv

      - name: Generate test cases
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python generate_tests.py --input requirements/new_feature.txt

      - name: Create PR with generated tests
        uses: peter-evans/create-pull-request@v5
        with:
          commit-message: 'chore: AI-generated test cases for new feature'
          title: '[AI] Generated Test Cases'
          body: |
            🤖 This PR contains AI-generated test cases.
            Please review for accuracy and completeness.
          branch: ai-generated-tests
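The workflow above assumes a `generate_tests.py` entry point that isn't shown in this tutorial, so here is one hypothetical shape for it (the flag names are my own, adjust to taste):

```python
# Hypothetical CLI wrapper for the pipeline; flag names are illustrative
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI for the test generation pipeline."""
    parser = argparse.ArgumentParser(description="Generate tests from a requirements file")
    parser.add_argument("--input", required=True, help="Path to the user story / requirement file")
    parser.add_argument("--output-dir", default="generated_tests", help="Where to write the suite")
    parser.add_argument("--num-cases", type=int, default=15)
    return parser

args = build_parser().parse_args(["--input", "requirements/new_feature.txt"])
# In generate_tests.py you would then read args.input and pass its contents to
# EndToEndTestGenerator().generate_complete_test_suite(...) from earlier sections.
```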
Challenge: Build an AI test generator that:
Bonus: Add support for API test generation using requests/pytest!
In the next tutorial, AI-Powered Test Data Generation, you'll dive deeper into creating sophisticated test data using GANs, synthetic data generation, and privacy-safe techniques.
✅ Tutorial Complete! You now have the power to generate comprehensive test suites automatically using AI, so you can say goodbye to tedious manual test case writing!
Check your understanding of intelligent test case generation
1. What percentage of QA time is typically spent just writing test cases manually?
2. Which OpenAI model provides the best quality for test case generation?
3. What is a key advantage of AI-generated edge case discovery?
4. What should you always do with AI-generated test code before using it?