Intermediate

Intelligent Test Case Generation

Use AI and NLP to automatically generate test cases, scenarios, and scripts from requirements and user stories

Writing test cases is tedious. You read requirements, imagine scenarios, consider edge cases, then manually type them into test management tools. A simple user story might need 10-20 test cases covering happy paths, error handling, and boundary conditions. What if AI could generate all of them in seconds?

In this tutorial, you'll build an AI-powered test case generator that reads requirements and automatically creates comprehensive test scenarios. You'll use OpenAI's GPT models and NLP to transform user stories into executable test scripts, complete with edge cases you might have missed.

The Manual Testing Problem

Traditional test case creation is slow, repetitive, and limited by each tester's imagination: you read the requirement, enumerate scenarios by hand, and still risk missing boundary and error conditions.

πŸ’‘ AI Solution: Large Language Models (LLMs) like GPT-4 can understand natural language requirements and generate structured test cases in seconds. Published studies report AI-generated tests reaching 85-95% coverage, comparable to manually written suites.

Understanding NLP for Test Generation

Natural Language Processing enables AI to parse requirements written in plain English, pick out the actors, actions, and conditions they describe, and map acceptance criteria onto concrete test scenarios.

Anatomy of a Good Test Case

AI-generated test cases should include the following fields (a filled-in example follows the list):

  1. Test ID: Unique identifier (TC_LOGIN_001)
  2. Title: Descriptive name ("Verify successful login with valid credentials")
  3. Preconditions: Setup requirements ("User is registered")
  4. Test Steps: Detailed actions to perform
  5. Test Data: Specific inputs to use
  6. Expected Results: What should happen
  7. Priority: Critical, High, Medium, Low
  8. Category: Functional, Regression, Smoke, etc.
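
Put together, a generated case for a login story might look like this (illustrative values, following the JSON structure the generator is asked to produce later in this tutorial):

{
  "test_id": "TC_LOGIN_001",
  "title": "Verify successful login with valid credentials",
  "priority": "Critical",
  "category": "Functional",
  "preconditions": ["User is registered", "User is logged out"],
  "steps": [
    {"step": 1, "action": "Navigate to the login page", "expected": "Login form is displayed"},
    {"step": 2, "action": "Enter a valid email and password and click Login", "expected": "User is redirected to the dashboard"}
  ],
  "test_data": {"email": "testuser@example.com", "password": "SecurePass123!"},
  "postconditions": ["User session is active"]
}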

Setting Up OpenAI GPT Integration

First, install the OpenAI library and set up your API key:

# Install the OpenAI Python library
# (these examples use the pre-1.0 ChatCompletion interface, so pin the version;
# openai>=1.0 replaced it with a new client API)
pip install "openai<1.0" python-dotenv

# Create .env file for API key
echo "OPENAI_API_KEY=your-api-key-here" > .env

Then load the key and verify the connection:

import openai
import os
from dotenv import load_dotenv
import json

# Load API key
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Test the connection
def test_openai_connection():
    """Verify OpenAI API is working"""
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": "Say 'API connection successful!'"}
            ],
            max_tokens=50
        )
        print(response.choices[0].message.content)
        return True
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

# Run test
test_openai_connection()

⚠️ API Costs: OpenAI charges per token. GPT-4 costs ~$0.03 per 1K input tokens and ~$0.06 per 1K output tokens, so expect roughly $0.10-0.50 per user story for test generation. GPT-3.5-turbo is a cheaper alternative (~$0.001 per 1K tokens). Prices change, so check OpenAI's current pricing page before budgeting.
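
To keep spending visible, you can estimate the cost of each call from the usage statistics included in every API response. A minimal sketch; the rates are the figures quoted above and should be treated as assumptions:

# Rough per-call cost estimate from the usage stats the API returns
# (rates mirror the figures above; verify against OpenAI's current pricing)
GPT4_INPUT_RATE = 0.03 / 1000   # dollars per input token
GPT4_OUTPUT_RATE = 0.06 / 1000  # dollars per output token

def estimate_cost(response) -> float:
    """Estimate the dollar cost of one ChatCompletion response"""
    usage = response["usage"]
    return (usage["prompt_tokens"] * GPT4_INPUT_RATE
            + usage["completion_tokens"] * GPT4_OUTPUT_RATE)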

Building a Test Case Generator

Let's create an AI-powered test case generator:

import openai
import json
from typing import List, Dict

class AITestGenerator:
    """
    Generate comprehensive test cases from user stories using GPT
    """
    
    def __init__(self, model="gpt-4", temperature=0.7):
        """
        Args:
            model: OpenAI model to use (gpt-4 or gpt-3.5-turbo)
            temperature: Creativity (0=deterministic, 1=creative)
        """
        self.model = model
        self.temperature = temperature
    
    def generate_test_cases(self, user_story: str, 
                           num_cases: int = 10,
                           include_negative: bool = True) -> List[Dict]:
        """
        Generate test cases from a user story
        
        Args:
            user_story: Natural language requirement
            num_cases: Number of test cases to generate
            include_negative: Include negative/error scenarios
        
        Returns:
            List of test case dictionaries
        """
        prompt = self._build_prompt(user_story, num_cases, include_negative)
        
        print(f"πŸ€– Generating {num_cases} test cases...")
        print(f"πŸ“ User Story: {user_story[:100]}...")
        
        try:
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self._get_system_prompt()},
                    {"role": "user", "content": prompt}
                ],
                temperature=self.temperature,
                max_tokens=2000
            )
            
            # Parse JSON response
            content = response.choices[0].message.content
            
            # Extract JSON from markdown code block if present
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                content = content.split("```")[1].split("```")[0].strip()
            
            test_cases = json.loads(content)
            
            print(f"βœ… Generated {len(test_cases)} test cases")
            return test_cases
            
        except Exception as e:
            print(f"❌ Error generating test cases: {e}")
            return []
    
    def _get_system_prompt(self) -> str:
        """Define the AI's role and expertise"""
        return """You are an expert QA engineer with 10+ years of experience. 
You excel at creating comprehensive, detailed test cases that cover:
- Happy path scenarios
- Negative test cases (invalid inputs, errors)
- Boundary conditions (min/max values, edge cases)
- Security considerations (SQL injection, XSS, authentication)
- Performance and usability

Generate test cases in JSON format with this structure:
{
  "test_id": "TC_XXX_001",
  "title": "Clear, descriptive title",
  "priority": "Critical|High|Medium|Low",
  "category": "Functional|Regression|Smoke|Security|Performance",
  "preconditions": ["Setup step 1", "Setup step 2"],
  "steps": [
    {"step": 1, "action": "What to do", "expected": "What should happen"}
  ],
  "test_data": {"field": "value"},
  "postconditions": ["Cleanup actions"]
}

Be specific, actionable, and thorough."""
    
    def _build_prompt(self, user_story: str, num_cases: int, 
                     include_negative: bool) -> str:
        """Build the user prompt"""
        if include_negative:
            scenario_line = "- Include both positive (happy path) and negative (error) scenarios"
        else:
            scenario_line = "- Focus only on positive (happy path) scenarios"

        return f"""Generate {num_cases} detailed test cases for this user story:

USER STORY:
{user_story}

REQUIREMENTS:
{scenario_line}
- Cover boundary conditions and edge cases
- Be specific about test data to use
- Include security considerations where relevant
- Prioritize test cases appropriately
- Use clear, actionable language

Return ONLY a JSON array of test cases, no additional text."""
    
    def generate_test_script(self, test_case: Dict, language: str = "python") -> str:
        """
        Convert a test case into executable code
        
        Args:
            test_case: Test case dictionary
            language: Target language (python, java, javascript)
        
        Returns:
            Executable test script code
        """
        prompt = f"""Convert this test case into executable {language} test code using pytest/selenium:

TEST CASE:
{json.dumps(test_case, indent=2)}

REQUIREMENTS:
- Use pytest framework for Python
- Use Selenium WebDriver for browser automation
- Include proper assertions
- Add comments explaining each step
- Handle waits and error cases
- Use Page Object Model if applicable

Return ONLY the code, no explanations."""

        try:
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are an expert test automation engineer."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,  # Lower for code generation
                max_tokens=1500
            )
            
            code = response.choices[0].message.content
            
            # Extract code from markdown if present
            if f"```{language}" in code:
                code = code.split(f"```{language}")[1].split("```")[0].strip()
            elif "```" in code:
                code = code.split("```")[1].split("```")[0].strip()
            
            return code
            
        except Exception as e:
            print(f"❌ Error generating test script: {e}")
            return ""

# Usage example
generator = AITestGenerator(model="gpt-4")

# User story
user_story = """
As a registered user,
I want to log in to my account using my email and password,
So that I can access my personalized dashboard.

Acceptance Criteria:
- User can log in with valid email and password
- Invalid credentials show an error message
- Account gets locked after 5 failed attempts
- User can reset password via email
- Remember me option keeps user logged in for 30 days
"""

# Generate test cases
test_cases = generator.generate_test_cases(
    user_story=user_story,
    num_cases=12,
    include_negative=True
)

# Print generated test cases
for i, tc in enumerate(test_cases, 1):
    print(f"\n{'='*60}")
    print(f"TEST CASE {i}: {tc.get('title', 'Untitled')}")
    print('='*60)
    print(f"ID: {tc.get('test_id', 'N/A')}")
    print(f"Priority: {tc.get('priority', 'N/A')}")
    print(f"Category: {tc.get('category', 'N/A')}")
    print(f"\nPreconditions:")
    for pre in tc.get('preconditions', []):
        print(f"  - {pre}")
    print(f"\nSteps:")
    for step in tc.get('steps', []):
        print(f"  {step['step']}. {step['action']}")
        print(f"     Expected: {step['expected']}")
    print(f"\nTest Data: {tc.get('test_data', {})}")

βœ… Result: GPT-4 generates 12 comprehensive test cases in ~10 seconds, covering happy paths, error handling, security (account lockout), and boundary conditions!
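
Many teams will want these cases inside a test management tool rather than the console. A minimal sketch that flattens the generated cases to CSV for import; the column names are assumptions, so adjust them to your tool's import template:

import csv

def export_to_csv(test_cases, path="test_cases.csv"):
    """Flatten generated test cases into rows for test-management import"""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Title", "Priority", "Category", "Steps", "Expected Results"])
        for tc in test_cases:
            steps = "; ".join(s.get("action", "") for s in tc.get("steps", []))
            expected = "; ".join(s.get("expected", "") for s in tc.get("steps", []))
            writer.writerow([
                tc.get("test_id", ""),
                tc.get("title", ""),
                tc.get("priority", ""),
                tc.get("category", ""),
                steps,
                expected,
            ])

export_to_csv(test_cases)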

Generating Executable Test Scripts

Now let's convert test cases into actual Python/Selenium code:

# Pick a test case to automate
login_test_case = test_cases[0]  # Assuming first is successful login

# Generate executable code
print("\n" + "="*60)
print("GENERATING EXECUTABLE TEST SCRIPT")
print("="*60)

test_script = generator.generate_test_script(
    test_case=login_test_case,
    language="python"
)

print("\n" + test_script)

# Save to file
with open("test_login.py", "w") as f:
    f.write(test_script)

print("\nβœ… Test script saved to test_login.py")
print("Run with: pytest test_login.py")

Example Generated Test Script

# Example of what GPT-4 might generate:

import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

class TestLogin:
    """
    Test case: TC_LOGIN_001
    Verify successful login with valid credentials
    """
    
    @pytest.fixture
    def driver(self):
        """Setup WebDriver"""
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
        driver.implicitly_wait(10)
        yield driver
        driver.quit()
    
    def test_successful_login_valid_credentials(self, driver):
        """
        Test steps:
        1. Navigate to login page
        2. Enter valid email
        3. Enter valid password
        4. Click login button
        5. Verify redirect to dashboard
        """
        # Step 1: Navigate to login page
        driver.get("https://example.com/login")
        
        # Step 2: Enter valid email
        email_field = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "email"))
        )
        email_field.send_keys("testuser@example.com")
        
        # Step 3: Enter valid password
        password_field = driver.find_element(By.ID, "password")
        password_field.send_keys("SecurePass123!")
        
        # Step 4: Click login button
        login_button = driver.find_element(By.ID, "login-btn")
        login_button.click()
        
        # Step 5: Verify redirect to dashboard
        WebDriverWait(driver, 10).until(
            EC.url_contains("/dashboard")
        )
        
        # Additional assertions
        assert "dashboard" in driver.current_url.lower(), "Should redirect to dashboard"
        
        welcome_message = driver.find_element(By.CLASS_NAME, "welcome-message")
        assert welcome_message.is_displayed(), "Welcome message should be visible"
        
        print("βœ… Test passed: User logged in successfully")

Advanced: Edge Case Discovery

AI can find edge cases humans might miss:

class EdgeCaseDiscoverer:
    """
    Use AI to discover edge cases and boundary conditions
    """
    
    def discover_edge_cases(self, feature_description: str) -> List[str]:
        """
        Generate comprehensive list of edge cases
        """
        prompt = f"""You are a security and edge case expert. Analyze this feature:

FEATURE:
{feature_description}

Generate a comprehensive list of edge cases, boundary conditions, and unusual scenarios to test:
- Minimum/maximum values
- Empty inputs, null values
- Special characters, Unicode, emojis
- SQL injection, XSS attacks
- Race conditions, concurrency
- Network failures, timeouts
- Invalid data types
- Extremely large inputs
- Unexpected user behavior

Return as a JSON array of edge case descriptions."""

        try:
            response = openai.ChatCompletion.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are an expert at finding edge cases and security vulnerabilities."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.8,  # Higher for creativity
                max_tokens=1000
            )
            
            content = response.choices[0].message.content
            
            # Extract JSON
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                content = content.split("```")[1].split("```")[0].strip()
            
            edge_cases = json.loads(content)
            return edge_cases
            
        except Exception as e:
            print(f"❌ Error discovering edge cases: {e}")
            return []

# Usage
discoverer = EdgeCaseDiscoverer()

feature = """
Login form that accepts email and password.
Email must be valid format, password must be 8+ characters.
Form has client-side and server-side validation.
"""

edge_cases = discoverer.discover_edge_cases(feature)

print("\n" + "="*60)
print("DISCOVERED EDGE CASES")
print("="*60)

for i, case in enumerate(edge_cases, 1):
    print(f"{i}. {case}")

πŸ’‘ AI Advantage: GPT-4 can suggest edge cases like "What if email contains Unicode characters like Γ± or emojis?" or "What happens if user submits form 100 times in 1 second?" that manual testers might overlook.
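
Because the discovered cases are plain-text descriptions, a practical next step is turning them into skipped pytest stubs so the automation backlog is visible in the test suite itself. A sketch, using the edge_cases list from above:

def write_edge_case_stubs(edge_cases, path="test_edge_case_stubs.py"):
    """Emit one skipped pytest stub per discovered edge case"""
    lines = ["import pytest", ""]
    for i, case in enumerate(edge_cases, 1):
        one_line = " ".join(str(case).split())  # keep the description on one line
        lines.append('@pytest.mark.skip(reason="TODO: automate this edge case")')
        lines.append(f"def test_edge_case_{i:02d}():")
        lines.append(f"    # {one_line}")
        lines.append("    ...")
        lines.append("")
    with open(path, "w") as f:
        f.write("\n".join(lines))

write_edge_case_stubs(edge_cases)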

Generating Test Data

AI can also generate realistic test data:

class TestDataGenerator:
    """
    Generate realistic test data using AI
    """
    
    def generate_test_data(self, data_type: str, count: int = 10, 
                          constraints: str = "") -> List:
        """
        Generate test data matching specified type and constraints
        
        Args:
            data_type: Type of data (email, phone, address, etc.)
            count: Number of samples to generate
            constraints: Additional requirements
        """
        prompt = f"""Generate {count} realistic test data samples for: {data_type}

CONSTRAINTS:
{constraints if constraints else "None - use realistic, diverse data"}

REQUIREMENTS:
- Data should be realistic and varied
- Include edge cases (min/max lengths, special characters)
- Include both valid and invalid samples if appropriate
- Return as JSON array

Examples:
- For emails: valid formats, invalid formats, edge cases
- For names: different cultures, special characters, very long names
- For addresses: various countries, apartment numbers, PO boxes"""

        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",  # Cheaper for data generation
                messages=[
                    {"role": "user", "content": prompt}
                ],
                temperature=0.9,  # High creativity for varied data
                max_tokens=800
            )
            
            content = response.choices[0].message.content
            
            # Extract JSON
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                content = content.split("```")[1].split("```")[0].strip()
            
            test_data = json.loads(content)
            return test_data
            
        except Exception as e:
            print(f"❌ Error generating test data: {e}")
            return []

# Usage
data_gen = TestDataGenerator()

# Generate email test data
emails = data_gen.generate_test_data(
    data_type="email addresses",
    count=15,
    constraints="Include valid emails, invalid formats, SQL injection attempts, XSS payloads"
)

print("\n" + "="*60)
print("GENERATED TEST DATA: Emails")
print("="*60)

for i, email in enumerate(emails, 1):
    print(f"{i}. {email}")

# Generate phone numbers
phones = data_gen.generate_test_data(
    data_type="phone numbers",
    count=10,
    constraints="Include US, international, invalid formats, edge cases"
)

print("\n" + "="*60)
print("GENERATED TEST DATA: Phone Numbers")
print("="*60)

for i, phone in enumerate(phones, 1):
    print(f"{i}. {phone}")

Complete Workflow: Requirement to Executable Test

Let's put it all together in an automated pipeline:

class EndToEndTestGenerator:
    """
    Complete pipeline: Requirement β†’ Test Cases β†’ Test Scripts β†’ Test Data
    """
    
    def __init__(self):
        self.test_gen = AITestGenerator(model="gpt-4")
        self.edge_discoverer = EdgeCaseDiscoverer()
        self.data_gen = TestDataGenerator()
    
    def generate_complete_test_suite(self, user_story: str, 
                                     output_dir: str = "generated_tests"):
        """
        Generate entire test suite from user story
        """
        import os
        os.makedirs(output_dir, exist_ok=True)
        
        print("πŸš€ Starting end-to-end test generation pipeline...\n")
        
        # Step 1: Generate test cases
        print("πŸ“ Step 1: Generating test cases...")
        test_cases = self.test_gen.generate_test_cases(
            user_story=user_story,
            num_cases=15,
            include_negative=True
        )
        
        # Save test cases as JSON
        with open(f"{output_dir}/test_cases.json", "w") as f:
            json.dump(test_cases, f, indent=2)
        print(f"βœ… Saved {len(test_cases)} test cases to test_cases.json\n")
        
        # Step 2: Discover edge cases
        print("πŸ” Step 2: Discovering edge cases...")
        edge_cases = self.edge_discoverer.discover_edge_cases(user_story)
        
        with open(f"{output_dir}/edge_cases.json", "w") as f:
            json.dump(edge_cases, f, indent=2)
        print(f"βœ… Discovered {len(edge_cases)} edge cases\n")
        
        # Step 3: Generate test scripts
        print("πŸ’» Step 3: Generating executable test scripts...")
        
        for i, test_case in enumerate(test_cases[:5], 1):  # Generate scripts for first 5
            print(f"   Generating script {i}/5...")
            
            script = self.test_gen.generate_test_script(
                test_case=test_case,
                language="python"
            )
            
            # Save script (lowercase the generated test_id for a filename)
            test_id = test_case.get('test_id', f'test_{i}').lower()
            filename = f"{output_dir}/{test_id}.py"
            
            with open(filename, "w") as f:
                f.write(script)
        
        print(f"βœ… Generated 5 executable test scripts\n")
        
        # Step 4: Generate test data
        print("πŸ“Š Step 4: Generating test data...")
        
        # Extract data types from test cases
        data_types = self._extract_data_types(test_cases)
        
        all_test_data = {}
        for data_type in data_types:
            data = self.data_gen.generate_test_data(
                data_type=data_type,
                count=20
            )
            all_test_data[data_type] = data
        
        with open(f"{output_dir}/test_data.json", "w") as f:
            json.dump(all_test_data, f, indent=2)
        print(f"βœ… Generated test data for {len(data_types)} data types\n")
        
        # Step 5: Generate summary report
        self._generate_report(output_dir, test_cases, edge_cases, data_types)
        
        print("="*60)
        print("βœ… TEST SUITE GENERATION COMPLETE!")
        print("="*60)
        print(f"Output directory: {output_dir}/")
        print(f"  - test_cases.json ({len(test_cases)} cases)")
        print(f"  - edge_cases.json ({len(edge_cases)} cases)")
        print(f"  - 5 executable Python test scripts")
        print(f"  - test_data.json (test data sets)")
        print(f"  - test_suite_summary.txt (report)")
    
    def _extract_data_types(self, test_cases: List[Dict]) -> List[str]:
        """Extract data types from test cases"""
        data_types = set()
        
        for tc in test_cases:
            test_data = tc.get('test_data', {})
            for key in test_data.keys():
                if 'email' in key.lower():
                    data_types.add('email')
                elif 'password' in key.lower():
                    data_types.add('password')
                elif 'phone' in key.lower():
                    data_types.add('phone')
                elif 'name' in key.lower():
                    data_types.add('name')
        
        return list(data_types) or ['email', 'password']  # Default
    
    def _generate_report(self, output_dir: str, test_cases: List[Dict],
                        edge_cases: List, data_types: List[str]):
        """Generate summary report"""
        report = f"""
TEST SUITE GENERATION SUMMARY
{'='*60}

Generated: {len(test_cases)} test cases
Edge Cases: {len(edge_cases)} scenarios
Test Scripts: 5 executable Python files
Test Data: {len(data_types)} data type sets

TEST CASE BREAKDOWN:
"""
        
        # Count by priority
        priorities = {}
        categories = {}
        
        for tc in test_cases:
            priority = tc.get('priority', 'Unknown')
            category = tc.get('category', 'Unknown')
            
            priorities[priority] = priorities.get(priority, 0) + 1
            categories[category] = categories.get(category, 0) + 1
        
        report += "\nBy Priority:\n"
        for priority, count in sorted(priorities.items()):
            report += f"  {priority}: {count}\n"
        
        report += "\nBy Category:\n"
        for category, count in sorted(categories.items()):
            report += f"  {category}: {count}\n"
        
        report += f"\n{'='*60}\n"
        
        with open(f"{output_dir}/test_suite_summary.txt", "w") as f:
            f.write(report)
        
        print(report)

# Complete workflow example
pipeline = EndToEndTestGenerator()

user_story = """
As a user,
I want to register for an account,
So that I can access premium features.

Acceptance Criteria:
- User provides email, password, and full name
- Email must be unique and valid format
- Password must be 8+ characters with 1 uppercase, 1 number
- User receives confirmation email
- User can log in immediately after registration
- Failed registrations show appropriate error messages
"""

# Generate everything!
pipeline.generate_complete_test_suite(
    user_story=user_story,
    output_dir="registration_tests"
)

βœ… Complete Automation: From a single user story, you now have 15 test cases, edge case scenarios, 5 executable test scripts, and test data setsβ€”all generated in under 60 seconds!
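
The resulting output directory might look like this (illustrative; script filenames depend on the IDs GPT assigns):

registration_tests/
β”œβ”€β”€ test_cases.json          # 15 structured test cases
β”œβ”€β”€ edge_cases.json          # discovered edge case descriptions
β”œβ”€β”€ tc_reg_001.py            # executable scripts, named after generated test IDs
β”œβ”€β”€ tc_reg_002.py
β”œβ”€β”€ ...
β”œβ”€β”€ test_data.json           # generated data sets
└── test_suite_summary.txt   # summary report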

Best Practices for AI Test Generation

  1. Review AI-generated tests: Don't blindly trustβ€”validate logic and coverage
  2. Start with GPT-4: Better quality than GPT-3.5, worth the cost for critical tests
  3. Provide context: Give AI acceptance criteria, constraints, and domain knowledge
  4. Iterate and refine: Regenerate if first output isn't perfect
  5. Combine with human expertise: Use AI for breadth, humans for critical thinking
  6. Version control prompts: Save effective prompts for reuse
  7. Monitor costs: Track API usage, use GPT-3.5 for non-critical generation
  8. Validate generated code: Run and test scripts before committing

⚠️ Limitations: AI can hallucinate element locators or APIs that don't exist. Always validate generated code runs successfully against your actual application.
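
A cheap first gate before committing generated scripts is a syntax and collection check; it catches broken imports and typos, though not hallucinated locators. A minimal sketch:

import pathlib
import py_compile
import subprocess

def sanity_check_generated_tests(directory="generated_tests"):
    """Syntax-compile each generated script, then have pytest collect (import) them"""
    for script in pathlib.Path(directory).glob("*.py"):
        py_compile.compile(str(script), doraise=True)  # raises on syntax errors
    # --collect-only imports every test module without executing the tests
    subprocess.run(["pytest", "--collect-only", directory], check=True)

sanity_check_generated_tests()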

Integrating into CI/CD

# .github/workflows/ai-test-generation.yml
name: AI Test Generation

on:
  pull_request:
    paths:
      - 'requirements/**'  # Trigger on requirement changes

jobs:
  generate-tests:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install openai python-dotenv
    
    - name: Generate test cases
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      run: |
        python generate_tests.py --input requirements/new_feature.txt
    
    - name: Create PR with generated tests
      uses: peter-evans/create-pull-request@v6
      with:
        commit-message: 'chore: AI-generated test cases for new feature'
        title: '[AI] Generated Test Cases'
        body: |
          πŸ€– This PR contains AI-generated test cases.
          
          Please review for accuracy and completeness.
        branch: ai-generated-tests
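
The workflow's generate step assumes a generate_tests.py entry point. A minimal sketch of what it could contain; the --input flag matches the step above, and the module name ai_test_pipeline is a placeholder for wherever you keep the pipeline class:

# generate_tests.py - CLI entry point assumed by the workflow above (sketch)
import argparse

from dotenv import load_dotenv

# Hypothetical module holding the EndToEndTestGenerator class defined earlier
from ai_test_pipeline import EndToEndTestGenerator

def main():
    parser = argparse.ArgumentParser(description="Generate a test suite from a requirements file")
    parser.add_argument("--input", required=True, help="Path to the user story / requirements text")
    parser.add_argument("--output-dir", default="generated_tests")
    args = parser.parse_args()

    load_dotenv()  # locally reads .env; in CI the key arrives via secrets

    with open(args.input) as f:
        user_story = f.read()

    pipeline = EndToEndTestGenerator()
    pipeline.generate_complete_test_suite(user_story, output_dir=args.output_dir)

if __name__ == "__main__":
    main()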

Practice Exercise

Challenge: Build an AI test generator that:

  1. Reads user stories from a file or Jira API (see the hint after this list)
  2. Generates 20+ comprehensive test cases
  3. Discovers 15+ edge cases and security scenarios
  4. Converts 5 test cases into executable Selenium scripts
  5. Generates realistic test data (100+ samples)
  6. Creates an HTML report with all artifacts

Bonus: Add support for API test generation using requests/pytest!
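
A hint for step 1: Jira Cloud exposes issues over REST. A sketch against the /rest/api/2/issue endpoint; the domain, credentials, and issue key are placeholders:

import requests

def fetch_user_story(issue_key: str) -> str:
    """Fetch a Jira issue's summary and description as user story text"""
    # Placeholders: substitute your site URL, account email, and API token
    url = f"https://your-domain.atlassian.net/rest/api/2/issue/{issue_key}"
    resp = requests.get(url, auth=("you@example.com", "your-api-token"))
    resp.raise_for_status()
    fields = resp.json()["fields"]
    return f"{fields['summary']}\n\n{fields.get('description') or ''}"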

Key Takeaways

  1. LLMs like GPT-4 can turn a single user story into structured, prioritized test cases in seconds
  2. AI's strength is breadth: edge cases, security scenarios, and varied test data that manual passes often miss
  3. Use a lower temperature for code generation and a higher one for creative edge-case discovery
  4. Always review, run, and validate AI output; models can hallucinate locators and APIs that don't exist
  5. Watch token costs: reserve GPT-4 for critical generation and use GPT-3.5-turbo elsewhere

What's Next?

In the next tutorial, AI-Powered Test Data Generation, you'll dive deeper into creating sophisticated test data using GANs, synthetic data generation, and privacy-safe techniques.

βœ… Tutorial Complete! You now have the power to generate comprehensive test suites automatically using AIβ€”say goodbye to tedious manual test case writing!

🎯 Test Your Knowledge: AI Test Generation

Check your understanding of intelligent test case generation

1. What percentage of QA time is typically spent just writing test cases manually?

10-20%
30-40%
50-60%
70-80%

2. Which OpenAI model provides the best quality for test case generation?

GPT-2
GPT-3.5-turbo (cheapest option)
GPT-4 (higher quality, worth the cost)
DALL-E

3. What is a key advantage of AI-generated edge case discovery?

It's completely free
AI can suggest unusual scenarios like Unicode attacks or race conditions that humans might overlook
AI never makes mistakes
It replaces all manual testing

4. What should you always do with AI-generated test code before using it?

Use it immediately without changes
Delete it and start over
Review and validate itβ€”AI can hallucinate locators or APIs that don't exist
Only run it in production