Integrate all AI testing tools into production CI/CD with Docker, monitoring, and complete automation
You've built self-healing tests, bug predictors, visual AI, test generators, and performance analyzers. Now it's time to bring them together. A production AI testing pipeline runs automatically on every commit, catches issues before deployment, and provides actionable insights—without manual intervention.
In this final tutorial, you'll build a complete CI/CD pipeline that orchestrates all your AI testing tools, runs them in Docker containers, monitors results, and generates comprehensive reports. This is the capstone that makes AI testing truly scalable.
┌─────────────────────────────────────────────────────────────┐
│ DEVELOPER WORKFLOW │
└─────────────────────────────────────────────────────────────┘
↓ (git push)
┌─────────────────────────────────────────────────────────────┐
│ CI/CD TRIGGER (GitHub Actions) │
│ • Detect code changes │
│ • Trigger AI testing pipeline │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 1: STATIC ANALYSIS │
│ ├─ Bug Prediction (ML) [5 min] │
│ ├─ Code Complexity Analysis [2 min] │
│ └─ Test Generation (AI) [3 min] │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 2: FUNCTIONAL TESTING │
│ ├─ Self-Healing Tests [15 min] │
│ ├─ AI-Generated Test Suites [10 min] │
│ └─ Test Data Generation [5 min] │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 3: VISUAL & API TESTING │
│ ├─ Visual Regression (AI) [20 min] │
│ ├─ Cross-Browser Testing [25 min] │
│ └─ API Contract Tests [10 min] │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 4: PERFORMANCE TESTING │
│ ├─ Load Pattern Generation [5 min] │
│ ├─ AI-Guided Load Testing [30 min] │
│ ├─ Bottleneck Prediction [5 min] │
│ └─ Anomaly Detection [3 min] │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 5: REPORTING & ANALYSIS │
│ ├─ Aggregate Results │
│ ├─ Generate Dashboards │
│ ├─ AI Insights & Recommendations │
│ └─ Notify Team (Slack/Email) │
└─────────────────────────────────────────────────────────────┘
↓
✅ DEPLOY or 🚫 BLOCK
💡 Pipeline Philosophy: Fail fast on cheap tests (static analysis), then run expensive tests (performance) only if needed. Total time: ~30 minutes for fast feedback, ~2 hours for comprehensive validation.
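The DEPLOY/BLOCK decision at the end of the diagram can be automated with a small quality gate that reads the aggregated results and exits non-zero to block deployment. Below is a minimal sketch, assuming the reports/pipeline_results.json layout written by the orchestrator later in this tutorial; the quality_gate.py name and the set of blocking stages are illustrative.
# quality_gate.py (illustrative) - decide DEPLOY vs BLOCK from aggregated pipeline results
import json
import sys
from pathlib import Path

# Stages whose failures should block a deploy; names match pipeline_config.json shown later
BLOCKING_STAGES = {"Static Analysis", "Functional Testing"}

def main() -> int:
    results_file = Path("reports/pipeline_results.json")
    if not results_file.exists():
        print("No pipeline results found - blocking deploy by default")
        return 1
    report = json.loads(results_file.read_text())
    failures = []
    for stage_name, stage_data in report.get("stages", {}).items():
        for cmd in stage_data.get("commands", []):
            if cmd["status"] != "SUCCESS" and stage_name in BLOCKING_STAGES:
                failures.append(f"{stage_name} / {cmd['name']}: {cmd['status']}")
    if failures:
        print("🚫 BLOCK - blocking failures detected:")
        for failure in failures:
            print(f"  - {failure}")
        return 1
    print("✅ DEPLOY - no blocking failures")
    return 0

if __name__ == "__main__":
    sys.exit(main())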
First, containerize all AI testing tools for consistent environments:
# Dockerfile.ai-testing
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
chromium \
chromium-driver \
firefox-esr \
wget \
curl \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy requirements
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Install additional AI/ML libraries
RUN pip install --no-cache-dir \
scikit-learn==1.3.0 \
opencv-python-headless==4.8.0 \
pillow==10.0.0 \
pandas==2.1.0 \
numpy==1.25.0 \
openai==0.28.0 \
locust==2.15.0 \
selenium==4.12.0 \
pytest==7.4.0 \
faker==19.6.0 \
radon==6.0.1
# Copy test framework
COPY ai_testing_framework/ /app/ai_testing_framework/
COPY tests/ /app/tests/
COPY models/ /app/models/
# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV DISPLAY=:99
# Default command
CMD ["pytest", "tests/", "-v", "--html=report.html"]
Create a docker-compose file to orchestrate the services:
# docker-compose.yml
version: '3.8'
services:
# Bug Prediction Service
bug-predictor:
build:
context: .
dockerfile: Dockerfile.ai-testing
volumes:
- ./reports:/app/reports
- ./models:/app/models
environment:
- SERVICE_NAME=bug-predictor
- MODEL_PATH=/app/models/bug_predictor.pkl
command: python -m ai_testing_framework.bug_prediction
# Self-Healing Test Runner
self-healing-tests:
build:
context: .
dockerfile: Dockerfile.ai-testing
volumes:
- ./reports:/app/reports
- ./screenshots:/app/screenshots
environment:
- SERVICE_NAME=self-healing-tests
- HEADLESS=true
command: pytest tests/self_healing/ -v --html=/app/reports/self_healing.html
# Visual AI Testing
visual-testing:
build:
context: .
dockerfile: Dockerfile.ai-testing
volumes:
- ./reports:/app/reports
- ./screenshots:/app/screenshots
- ./baselines:/app/baselines
environment:
- SERVICE_NAME=visual-testing
- APPLITOOLS_API_KEY=${APPLITOOLS_API_KEY}
command: pytest tests/visual/ -v --html=/app/reports/visual.html
# Performance Testing
performance-testing:
build:
context: .
dockerfile: Dockerfile.ai-testing
volumes:
- ./reports:/app/reports
- ./load_profiles:/app/load_profiles
environment:
- SERVICE_NAME=performance-testing
- TARGET_URL=${TARGET_URL}
command: locust -f tests/performance/ai_load_test.py --headless -u 100 -r 10 -t 10m --html=/app/reports/performance.html
# Test Generation Service
test-generator:
build:
context: .
dockerfile: Dockerfile.ai-testing
volumes:
- ./reports:/app/reports
- ./generated_tests:/app/generated_tests
environment:
- SERVICE_NAME=test-generator
- OPENAI_API_KEY=${OPENAI_API_KEY}
command: python -m ai_testing_framework.test_generation
# Monitoring Dashboard
monitoring:
image: grafana/grafana:latest
ports:
- "3000:3000"
volumes:
- ./grafana/dashboards:/etc/grafana/provisioning/dashboards
- ./grafana/datasources:/etc/grafana/provisioning/datasources
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_USERS_ALLOW_SIGN_UP=false
# Prometheus for metrics
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
command:
- '--config.file=/etc/prometheus/prometheus.yml'
volumes:
reports:
screenshots:
baselines:
load_profiles:
generated_tests:
models:
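Before tests start publishing metrics, it helps to confirm the monitoring stack is actually up. Here is a small sketch that polls the standard health endpoints Grafana (/api/health) and Prometheus (/-/healthy) expose on the ports mapped above; the script name and timeout values are illustrative.
# scripts/check_monitoring.py (illustrative) - wait for Grafana and Prometheus to become healthy
import sys
import time
import urllib.request

ENDPOINTS = {
    "grafana": "http://localhost:3000/api/health",
    "prometheus": "http://localhost:9090/-/healthy",
}

def wait_healthy(name, url, timeout=60):
    """Poll an HTTP health endpoint until it returns 200 or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    print(f"✅ {name} is healthy")
                    return True
        except OSError:
            pass  # service not reachable yet, keep polling
        time.sleep(2)
    print(f"❌ {name} did not become healthy within {timeout}s")
    return False

if __name__ == "__main__":
    healthy = all(wait_healthy(name, url) for name, url in ENDPOINTS.items())
    sys.exit(0 if healthy else 1)
Run it right after docker compose up -d, before kicking off the test services.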
Create the complete GitHub Actions workflow:
# .github/workflows/ai-testing-pipeline.yml
name: AI Testing Pipeline
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main ]
schedule:
# Run nightly comprehensive tests
- cron: '0 2 * * *'
env:
PYTHON_VERSION: '3.11'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
APPLITOOLS_API_KEY: ${{ secrets.APPLITOOLS_API_KEY }}
jobs:
# Stage 1: Static Analysis & Bug Prediction
static-analysis:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0 # Full history for git analysis
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Cache dependencies
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install scikit-learn pandas radon
- name: Run Bug Prediction
id: bug_prediction
run: |
python ai_testing_framework/bug_prediction.py --mode=ci
echo "high_risk_files=$(cat reports/high_risk_files.txt | wc -l)" >> $GITHUB_OUTPUT
- name: Analyze Code Complexity
run: |
python ai_testing_framework/complexity_analysis.py
- name: Comment PR with Risk Assessment
if: github.event_name == 'pull_request'
uses: actions/github-script@v6
with:
script: |
const fs = require('fs');
const riskReport = fs.readFileSync('reports/risk_assessment.md', 'utf8');
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: riskReport
});
- name: Upload Bug Prediction Report
uses: actions/upload-artifact@v3
with:
name: bug-prediction-report
path: reports/bug_prediction/
# Stage 2: AI Test Generation
test-generation:
runs-on: ubuntu-latest
needs: static-analysis
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install dependencies
run: pip install openai pytest faker
- name: Generate AI Tests
run: |
python ai_testing_framework/test_generation.py \
--requirements-dir=requirements/ \
--output-dir=generated_tests/
- name: Upload Generated Tests
uses: actions/upload-artifact@v3
with:
name: generated-tests
path: generated_tests/
# Stage 3: Self-Healing Tests
self-healing-tests:
runs-on: ubuntu-latest
needs: test-generation
timeout-minutes: 30
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Download Generated Tests
uses: actions/download-artifact@v3
with:
name: generated-tests
path: generated_tests/
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install Chrome
uses: browser-actions/setup-chrome@latest
- name: Install dependencies
run: |
pip install selenium pytest webdriver-manager scikit-learn
- name: Run Self-Healing Tests
run: |
pytest tests/self_healing/ -v \
--html=reports/self_healing.html \
--self-contained-html \
--junit-xml=reports/self_healing_junit.xml
- name: Upload Test Results
if: always()
uses: actions/upload-artifact@v3
with:
name: self-healing-test-results
path: reports/self_healing*
- name: Publish Test Results
if: always()
uses: EnricoMi/publish-unit-test-result-action@v2
with:
files: reports/self_healing_junit.xml
# Stage 4: Visual AI Testing
visual-testing:
runs-on: ubuntu-latest
needs: self-healing-tests
timeout-minutes: 40
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install dependencies
run: |
pip install selenium opencv-python pillow scikit-image pytest
pip install eyes-selenium # Applitools
- name: Run Visual Tests
run: |
pytest tests/visual/ -v \
--html=reports/visual.html \
--self-contained-html
- name: Upload Visual Diff Images
if: failure()
uses: actions/upload-artifact@v3
with:
name: visual-diffs
path: screenshots/diffs/
- name: Upload Visual Test Results
if: always()
uses: actions/upload-artifact@v3
with:
name: visual-test-results
path: reports/visual*
# Stage 5: Performance Testing (Nightly only)
performance-testing:
runs-on: ubuntu-latest
needs: visual-testing
# Only run on schedule or manual trigger
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
timeout-minutes: 60
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install dependencies
run: |
pip install locust pandas numpy scikit-learn
- name: Generate AI Load Profile
run: |
python ai_testing_framework/load_pattern_generator.py \
--production-logs=logs/production.csv \
--output=load_profiles/ai_generated.json
- name: Run Performance Tests
run: |
locust -f tests/performance/ai_load_test.py \
--headless \
--users 100 \
--spawn-rate 10 \
--run-time 30m \
--html reports/performance.html \
--csv reports/performance
- name: Detect Performance Anomalies
run: |
python ai_testing_framework/anomaly_detection.py \
--results=reports/performance_stats.csv
- name: Upload Performance Results
if: always()
uses: actions/upload-artifact@v3
with:
name: performance-results
path: reports/performance*
# Stage 6: Comprehensive Reporting
reporting:
runs-on: ubuntu-latest
needs: [static-analysis, self-healing-tests, visual-testing]
if: always()
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Download All Artifacts
uses: actions/download-artifact@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install dependencies
run: pip install jinja2 pandas matplotlib
- name: Generate Comprehensive Report
run: |
python ai_testing_framework/report_generator.py \
--artifacts-dir=. \
--output=reports/comprehensive_report.html
- name: Upload Comprehensive Report
uses: actions/upload-artifact@v3
with:
name: comprehensive-report
path: reports/comprehensive_report.html
- name: Send Slack Notification
if: always()
uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
text: |
AI Testing Pipeline Complete
Results: ${{ job.status }}
Branch: ${{ github.ref }}
Commit: ${{ github.sha }}
Report: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
env:
  SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
- name: Comment on PR with Summary
if: github.event_name == 'pull_request'
uses: actions/github-script@v6
with:
script: |
const fs = require('fs');
const summary = fs.readFileSync('reports/pr_summary.md', 'utf8');
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: summary
});
✅ Complete Pipeline: Your CI/CD now runs bug prediction, test generation, self-healing tests, and visual AI checks on every commit, plus AI-guided performance testing on the nightly schedule!
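The Comment on PR with Summary step in the reporting job expects reports/pr_summary.md to exist. Here is a minimal sketch of a script that builds that file from the orchestrator's JSON output; the module path and table layout are assumptions consistent with the rest of this tutorial.
# ai_testing_framework/pr_summary.py (illustrative) - turn pipeline results into a PR comment
import json
from pathlib import Path

def build_summary(results_path="reports/pipeline_results.json",
                  output_path="reports/pr_summary.md"):
    """Render the orchestrator's JSON report as a Markdown summary for the PR comment."""
    report = json.loads(Path(results_path).read_text())
    lines = [
        "## 🤖 AI Testing Pipeline Summary",
        "",
        f"**Overall status:** {report['overall_status']}",
        f"**Total duration:** {report['pipeline_duration'] / 60:.1f} min",
        "",
        "| Stage | Command | Status | Duration |",
        "|---|---|---|---|",
    ]
    for stage, data in report.get("stages", {}).items():
        for cmd in data.get("commands", []):
            icon = "✅" if cmd["status"] == "SUCCESS" else "❌"
            lines.append(
                f"| {stage} | {cmd['name']} | {icon} {cmd['status']} | {cmd['duration']:.0f}s |"
            )
    Path(output_path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    build_summary()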
Create a central orchestrator to manage all AI testing tools:
# ai_testing_framework/orchestrator.py
import asyncio
import json
from datetime import datetime
from pathlib import Path
import os
class AITestingOrchestrator:
"""
Orchestrate all AI testing tools in the correct sequence
"""
def __init__(self, config_file='pipeline_config.json'):
with open(config_file, 'r') as f:
self.config = json.load(f)
self.results = {}
self.start_time = None
self.reports_dir = Path('reports')
self.reports_dir.mkdir(exist_ok=True)
async def run_stage(self, stage_name, commands):
"""Run a pipeline stage with multiple commands"""
print(f"\n{'='*60}")
print(f"🚀 STAGE: {stage_name}")
print(f"{'='*60}")
stage_start = datetime.now()
stage_results = []
for cmd_config in commands:
cmd_name = cmd_config['name']
cmd = cmd_config['command']
timeout = cmd_config.get('timeout', 300)
print(f"\n▶️ Running: {cmd_name}")
try:
result = await asyncio.wait_for(
self._run_command(cmd),
timeout=timeout
)
stage_results.append({
'name': cmd_name,
'status': 'SUCCESS' if result['returncode'] == 0 else 'FAILED',
'duration': result['duration'],
'output': result['output'][:500] # Truncate
})
if result['returncode'] == 0:
print(f"✅ {cmd_name} completed successfully")
else:
print(f"❌ {cmd_name} failed with code {result['returncode']}")
# Check if this is a blocking failure
if cmd_config.get('blocking', False):
print(f"🚫 Blocking failure detected. Stopping pipeline.")
self.results[stage_name] = {'duration': (datetime.now() - stage_start).total_seconds(), 'commands': stage_results}
return False
except asyncio.TimeoutError:
print(f"⏱️ {cmd_name} timed out after {timeout}s")
stage_results.append({
'name': cmd_name,
'status': 'TIMEOUT',
'duration': timeout,
'output': ''
})
if cmd_config.get('blocking', False):
self.results[stage_name] = {'duration': (datetime.now() - stage_start).total_seconds(), 'commands': stage_results}
return False
stage_duration = (datetime.now() - stage_start).total_seconds()
print(f"\n✅ Stage '{stage_name}' completed in {stage_duration:.1f}s")
self.results[stage_name] = {
'duration': stage_duration,
'commands': stage_results
}
return True
async def _run_command(self, cmd):
"""Execute a shell command asynchronously"""
start = datetime.now()
process = await asyncio.create_subprocess_shell(
cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT
)
stdout, _ = await process.communicate()
duration = (datetime.now() - start).total_seconds()
return {
'returncode': process.returncode,
'output': stdout.decode('utf-8'),
'duration': duration
}
async def run_pipeline(self):
"""Execute the complete AI testing pipeline"""
self.start_time = datetime.now()
print("="*60)
print("🤖 AI TESTING PIPELINE STARTED")
print("="*60)
print(f"Timestamp: {self.start_time}")
print(f"Configuration: {len(self.config['stages'])} stages")
for stage_config in self.config['stages']:
stage_name = stage_config['name']
# Check if stage should run
run_condition = stage_config.get('run_if', 'always')
if run_condition == 'nightly' and not self._is_nightly():
print(f"\n⏭️ Skipping '{stage_name}' (nightly only)")
continue
# Run stage
success = await self.run_stage(stage_name, stage_config['commands'])
if not success:
print(f"\n🚫 Pipeline stopped due to failure in '{stage_name}'")
break
# Generate final report
await self.generate_final_report()
def _is_nightly(self):
"""Check if this is a nightly run"""
return 'NIGHTLY' in os.environ or datetime.now().hour < 6
async def generate_final_report(self):
"""Generate comprehensive pipeline report"""
total_duration = (datetime.now() - self.start_time).total_seconds()
report = {
'pipeline_start': str(self.start_time),
'pipeline_duration': total_duration,
'stages': self.results,
'overall_status': self._get_overall_status()
}
# Save JSON report
report_file = self.reports_dir / 'pipeline_results.json'
with open(report_file, 'w') as f:
json.dump(report, f, indent=2)
# Generate HTML report
html_report = self._generate_html_report(report)
html_file = self.reports_dir / 'pipeline_report.html'
with open(html_file, 'w') as f:
f.write(html_report)
print("\n" + "="*60)
print("📊 PIPELINE COMPLETE")
print("="*60)
print(f"Status: {report['overall_status']}")
print(f"Duration: {total_duration:.1f}s ({total_duration/60:.1f} minutes)")
print(f"Report: {html_file}")
print("="*60)
def _get_overall_status(self):
"""Determine overall pipeline status"""
for stage_name, stage_result in self.results.items():
for cmd in stage_result.get('commands', []):
if cmd['status'] in ['FAILED', 'TIMEOUT']:
return 'FAILED'
return 'SUCCESS'
def _generate_html_report(self, report):
"""Generate HTML report (simplified)"""
html = f"""
<!DOCTYPE html>
<html>
<head>
<title>AI Testing Pipeline Report</title>
<style>
/* braces doubled so the f-string does not treat them as placeholders */
body {{ font-family: Arial, sans-serif; margin: 20px; }}
.header {{ background: linear-gradient(135deg, #3b82f6, #14b8a6); color: white; padding: 20px; }}
.stage {{ margin: 20px 0; padding: 15px; border: 1px solid #ddd; border-radius: 8px; }}
.success {{ background-color: #d1fae5; }}
.failed {{ background-color: #fee2e2; }}
.command {{ margin: 10px 0; padding: 10px; background: #f8fafc; }}
</style>
</head>
<body>
<div class="header">
<h1>AI Testing Pipeline Report</h1>
<p>Status: {report['overall_status']}</p>
<p>Duration: {report['pipeline_duration']:.1f}s</p>
</div>
"""
for stage_name, stage_data in report['stages'].items():
stage_class = 'success' if all(
c['status'] == 'SUCCESS' for c in stage_data.get('commands', [])
) else 'failed'
html += f"""
<div class="stage {stage_class}">
<h2>{stage_name}</h2>
<p>Duration: {stage_data['duration']:.1f}s</p>
"""
for cmd in stage_data.get('commands', []):
html += f"""
<div class="command">
<strong>{cmd['name']}</strong>: {cmd['status']} ({cmd['duration']:.1f}s)
</div>
"""
html += " </div>\n"
html += """
</body>
</html>
"""
return html
# Pipeline configuration
pipeline_config = {
"stages": [
{
"name": "Static Analysis",
"commands": [
{
"name": "Bug Prediction",
"command": "python ai_testing_framework/bug_prediction.py --mode=ci",
"timeout": 300,
"blocking": True
},
{
"name": "Complexity Analysis",
"command": "python ai_testing_framework/complexity_analysis.py",
"timeout": 120,
"blocking": False
}
]
},
{
"name": "Test Generation",
"commands": [
{
"name": "AI Test Generation",
"command": "python ai_testing_framework/test_generation.py",
"timeout": 600,
"blocking": False
}
]
},
{
"name": "Functional Testing",
"commands": [
{
"name": "Self-Healing Tests",
"command": "pytest tests/self_healing/ -v --html=reports/self_healing.html",
"timeout": 1800,
"blocking": True
}
]
},
{
"name": "Visual Testing",
"commands": [
{
"name": "Visual Regression",
"command": "pytest tests/visual/ -v --html=reports/visual.html",
"timeout": 2400,
"blocking": False
}
]
},
{
"name": "Performance Testing",
"run_if": "nightly",
"commands": [
{
"name": "Load Testing",
"command": "locust -f tests/performance/ai_load_test.py --headless -u 100 -r 10 -t 30m",
"timeout": 2400,
"blocking": False
}
]
}
]
}
# Save config
with open('pipeline_config.json', 'w') as f:
json.dump(pipeline_config, f, indent=2)
# Usage
if __name__ == '__main__':
orchestrator = AITestingOrchestrator('pipeline_config.json')
asyncio.run(orchestrator.run_pipeline())
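To feed the Grafana dashboard below, the orchestrator can publish its results as Prometheus metrics. Here is a sketch using prometheus_client and a Pushgateway; note that a Pushgateway service is not part of the compose file above, so adding one (for example prom/pushgateway on port 9091) is an assumption, and the metric names follow the dashboard queries.
# ai_testing_framework/metrics.py (illustrative) - publish pipeline metrics for Prometheus/Grafana
# Requires: pip install prometheus-client, plus a Pushgateway reachable at PUSHGATEWAY_URL
import os
from prometheus_client import CollectorRegistry, Counter, Gauge, push_to_gateway

PUSHGATEWAY_URL = os.environ.get("PUSHGATEWAY_URL", "localhost:9091")

def publish_pipeline_metrics(report):
    """Push overall status and duration from the orchestrator's report dict."""
    registry = CollectorRegistry()
    runs = Counter("pipeline_runs", "Pipeline runs by status",  # exposed as pipeline_runs_total
                   ["status"], registry=registry)
    duration = Gauge("pipeline_duration_seconds", "Total pipeline duration",
                     registry=registry)

    status = "success" if report["overall_status"] == "SUCCESS" else "failed"
    runs.labels(status=status).inc()
    duration.set(report["pipeline_duration"])

    push_to_gateway(PUSHGATEWAY_URL, job="ai_testing_pipeline", registry=registry)

# Example: call publish_pipeline_metrics(report) at the end of generate_final_report()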
Create the Grafana dashboard configuration:
// grafana/dashboards/ai-testing-dashboard.json
{
"dashboard": {
"title": "AI Testing Pipeline Dashboard",
"panels": [
{
"title": "Pipeline Success Rate",
"targets": [
{
"expr": "rate(pipeline_runs_total{status='success'}[1h]) / rate(pipeline_runs_total[1h]) * 100"
}
],
"type": "graph"
},
{
"title": "Test Execution Time",
"targets": [
{
"expr": "histogram_quantile(0.95, pipeline_duration_seconds)"
}
],
"type": "graph"
},
{
"title": "Bug Prediction Accuracy",
"targets": [
{
"expr": "bug_predictor_accuracy"
}
],
"type": "gauge"
},
{
"title": "Self-Healing Events",
"targets": [
{
"expr": "increase(self_healing_events_total[24h])"
}
],
"type": "stat"
},
{
"title": "Visual Regression Failures",
"targets": [
{
"expr": "visual_regression_failures_total"
}
],
"type": "table"
},
{
"title": "Performance Anomalies",
"targets": [
{
"expr": "rate(performance_anomalies_total[1h])"
}
],
"type": "graph"
}
]
}
}
⚠️ Cost Management: AI testing can get expensive! Monitor API usage, set budgets, and use caching. Estimate: $50-200/month for an active project using GPT-4 test generation.
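One practical way to keep LLM spend down is to cache generated output keyed by a hash of the prompt, so unchanged requirements never hit the API twice. A minimal sketch; the cache location and the generate_fn hook are illustrative.
# ai_testing_framework/llm_cache.py (illustrative) - disk cache keyed by prompt hash to cut API spend
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".llm_cache")  # illustrative location; persist it between CI runs to maximize hits
CACHE_DIR.mkdir(exist_ok=True)

def cached_completion(prompt, generate_fn):
    """Return a cached LLM response for this prompt, calling generate_fn only on a cache miss."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())["response"]
    response = generate_fn(prompt)  # e.g. a thin wrapper around the OpenAI client
    cache_file.write_text(json.dumps({"prompt": prompt, "response": response}))
    return response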
Challenge: Build a complete end-to-end AI testing pipeline.
Bonus: Add AI-powered root cause analysis that analyzes failures and suggests fixes!
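As a starting point for that bonus, here is a hedged sketch that sends a failing test's output to the OpenAI chat API (using the openai==0.28 interface pinned in the Dockerfile) and asks for a likely root cause; the model choice and prompt are illustrative.
# ai_testing_framework/root_cause.py (illustrative) - ask an LLM for a likely root cause of a failure
import os
import openai  # pinned to openai==0.28 in the Dockerfile above

openai.api_key = os.environ["OPENAI_API_KEY"]

def suggest_root_cause(test_name, failure_output):
    """Return an LLM-generated root-cause hypothesis and suggested fix for a failed test."""
    prompt = (
        f"Test '{test_name}' failed with the following output:\n\n"
        f"{failure_output[:4000]}\n\n"  # truncate to keep the prompt (and the cost) bounded
        "Give the most likely root cause and one concrete suggested fix, in under 150 words."
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",  # illustrative; any chat-capable model works
        messages=[
            {"role": "system", "content": "You are a senior QA engineer doing failure triage."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.2,
    )
    return response["choices"][0]["message"]["content"]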
🎉 Course Complete! You've built a production-grade AI testing system from scratch. You can now detect bugs before they're written, heal tests automatically, validate visual changes with AI, generate realistic load tests, and orchestrate everything in CI/CD. Welcome to the future of quality engineering!
Continue your AI testing journey:
💼 Career Impact: AI testing engineers earn $115k-160k+ and are in high demand. Companies like Google, Meta, Microsoft, Netflix, and Spotify are actively hiring. Your new skills are valuable!
Check your understanding of production AI testing architecture
1. What is the recommended testing strategy for CI/CD pipelines?
2. Why use Docker containers for AI testing tools?
3. What is a "blocking failure" in a testing pipeline?
4. What is the estimated monthly cost for active AI testing with GPT-4?