Project: Object Detection with YOLOv8

🎯 Project Overview

Object detection underpins autonomous vehicles, surveillance systems, robotics, and augmented reality. YOLO (You Only Look Once) is one of the most widely used model families for real-time detection. In this project, you'll run and train YOLOv8 to detect objects in images and videos, producing bounding boxes and confidence scores.

Real-World Applications

Autonomous Vehicles: Detect pedestrians, cars, traffic signs, obstacles
Security & Surveillance: Real-time threat detection, crowd monitoring
Retail Analytics: Customer behavior tracking, inventory monitoring
Manufacturing: Quality control, defect detection on production lines
Healthcare: Medical imaging, tumor detection, surgical assistance
Agriculture: Crop health monitoring, pest detection, yield estimation

What You'll Build

Pre-trained Model Inference: Detect 80 COCO classes (person, car, dog, etc.)
Custom Dataset Training: Train on your own annotated images
Real-Time Detection: Process webcam feed at 30+ FPS
Performance Evaluation: mAP@50, mAP@50-95, precision/recall curves
Model Optimization: Export to ONNX/TensorRT for faster inference
Deployment Pipeline: Production-ready detection API

🚀 In-Demand Skill: YOLO-family detectors are widely used in security cameras, robotics, and many other computer vision products. This project gives you hands-on experience with a complete, modern detection workflow!

📊 Setup & Installation

1 Install Ultralytics YOLOv8

pip install ultralytics opencv-python matplotlib numpy pillow

2 Verify Installation

from ultralytics import YOLO
import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Check versions
import ultralytics
print("Ultralytics version:", ultralytics.__version__)
print("OpenCV version:", cv2.__version__)

# Download pre-trained model (automatic on first run)
model = YOLO('yolov8n.pt')  # nano model (fastest)
print("\n✅ YOLOv8 model loaded successfully!")
print(f"Weights: {model.ckpt_path}")
print(f"Task: {model.task}")

💡 YOLO Model Variants:

YOLOv8n (Nano): Fastest, 3.2M params, 80+ FPS
YOLOv8s (Small): Balanced, 11.2M params, 60+ FPS
YOLOv8m (Medium): Accurate, 25.9M params, 40+ FPS
YOLOv8l (Large): Very accurate, 43.7M params, 30+ FPS
YOLOv8x (Extra Large): Best accuracy, 68.2M params, 20+ FPS

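The FPS figures above are rough GPU estimates and vary widely with hardware, image size, and batch size. Start with YOLOv8n for faster training, then move to a larger variant if you need more accuracy!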

🖼️ Part 1: Pre-trained Model Inference

Detect Objects in Images

# Load pre-trained model (COCO dataset: 80 classes)
model = YOLO('yolov8n.pt')

# Download sample image (or use your own)
import urllib.request
urllib.request.urlretrieve(
    'https://ultralytics.com/images/bus.jpg',
    'sample_image.jpg'
)

# Run inference
results = model('sample_image.jpg')

# Display results
result = results[0]
print(f"\n📊 Detection Results:")
print(f"Number of objects detected: {len(result.boxes)}")

# Access detection details
boxes = result.boxes
for box in boxes:
    class_id = int(box.cls[0])
    confidence = float(box.conf[0])
    bbox = box.xyxy[0].cpu().numpy()
    class_name = result.names[class_id]
    
    print(f"  - {class_name}: {confidence:.2f} at [{bbox[0]:.0f}, {bbox[1]:.0f}, {bbox[2]:.0f}, {bbox[3]:.0f}]")

# Visualize with bounding boxes
annotated_image = result.plot()
plt.figure(figsize=(12, 8))
plt.imshow(cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB))
plt.title('YOLOv8 Object Detection')
plt.axis('off')
plt.show()

Batch Processing

# Process multiple images
image_paths = ['sample_image.jpg']  # Add more paths

results = model(image_paths, stream=True)

for i, result in enumerate(results):
    print(f"\nImage {i+1}: {len(result.boxes)} objects detected")
    
    # Get class distribution
    class_counts = {}
    for box in result.boxes:
        class_name = result.names[int(box.cls[0])]
        class_counts[class_name] = class_counts.get(class_name, 0) + 1
    
    for class_name, count in class_counts.items():
        print(f"  - {class_name}: {count}")
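
The class-count loop above can also be written with collections.Counter. This is a minimal sketch, assuming the class names have already been pulled out of the results into a plain list; the count_classes helper and the example list are illustrative stand-ins, not model output.

```python
from collections import Counter


def count_classes(class_names):
    """Return {class_name: count}, ordered from most to least frequent."""
    return dict(Counter(class_names).most_common())


# Hypothetical stand-in for:
#   [result.names[int(box.cls[0])] for box in result.boxes]
example_names = ["person", "bus", "person", "person"]
print(count_classes(example_names))
```

Counter keeps the counting logic in one call and makes it easy to sort classes by frequency when an image contains many detections.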

Video Detection (Real-Time)

# Detect in video
def detect_video(video_path, output_path='output_video.mp4'):
    """
    Run detection on video file or webcam
    
    Parameters:
    -----------
    video_path : str or int
        Path to video file, or 0 for webcam
    output_path : str
        Path to save output video
    """
    model = YOLO('yolov8n.pt')
    
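    # Open video and fail early if the source cannot be read
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video source: {video_path}")
    
    # Get video properties
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Webcams often report 0 FPS; fall back to 30 so the writer stays valid
    fps = cap.get(cv2.CAP_PROP_FPS) or 30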
    
    # Video writer
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    
    frame_count = 0
    
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        
        # Run detection
        results = model(frame)
        annotated_frame = results[0].plot()
        
        # Write frame
        out.write(annotated_frame)
        
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Processed {frame_count} frames...")
    
    cap.release()
    out.release()
    print(f"\n✅ Video saved to: {output_path}")

# Example: Detect in webcam (use 0) or video file
# detect_video(0)  # Webcam
# detect_video('your_video.mp4')  # Video file
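
To check whether the video loop really reaches real-time speed, you can measure throughput yourself instead of relying on a quoted figure. The sketch below is one possible approach: FPSMeter is a hypothetical helper (not part of Ultralytics or OpenCV) that keeps a rolling window of frame timestamps.

```python
import time
from collections import deque


class FPSMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate.

        Returns 0.0 until at least two frames have been recorded.
        `now` can be supplied for testing; otherwise a monotonic clock is used.
        """
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Inside detect_video, you would create one meter before the loop and call meter.tick() once per iteration, for example right after out.write(annotated_frame), then print the returned value alongside the frame count.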

✅ Checkpoint 1: Pre-trained Inference Working

Successfully running detection:

YOLOv8n loaded and detecting 80 COCO classes
Image detection with bounding boxes
Real-time video processing capability
30+ FPS on modern hardware

🏗️ Part 2: Train on Custom Dataset

Dataset Preparation

# YOLO dataset structure:
# dataset/
#   ├── images/
#   │   ├── train/
#   │   │   ├── img1.jpg
#   │   │   └── img2.jpg
#   │   └── val/
#   │       ├── img3.jpg
#   │       └── img4.jpg
#   └── labels/
#       ├── train/
#       │   ├── img1.txt
#       │   └── img2.txt
#       └── val/
#           ├── img3.txt
#           └── img4.txt

# Label format (one line per object):
# class_id center_x center_y width height
# Example: 0 0.5 0.5 0.3 0.4

# Create data.yaml configuration file
data_yaml = """
# Dataset configuration
path: /path/to/dataset  # dataset root
train: images/train  # train images (relative to 'path')
val: images/val  # val images (relative to 'path')

# Classes
names:
  0: person
  1: car
  2: bicycle
  3: motorcycle
"""

with open('data.yaml', 'w') as f:
    f.write(data_yaml)

print("✅ Dataset configuration created!")
print("\n💡 Use LabelImg or Roboflow to annotate your images.")
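
The label format described above stores each box as a normalized center point plus width and height. If your annotations are in pixel corner coordinates instead, a conversion along these lines may help. The xyxy_to_yolo function name and the 640x480 image size are assumptions chosen to reproduce the example line "0 0.5 0.5 0.3 0.4".

```python
def xyxy_to_yolo(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    cx = (x1 + x2) / 2 / img_w   # box center x, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center y, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, as a fraction of image width
    h = (y2 - y1) / img_h        # box height, as a fraction of image height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 192x192 px box centered in a hypothetical 640x480 image
print(xyxy_to_yolo(0, 224, 144, 416, 336, 640, 480))
# 0 0.500000 0.500000 0.300000 0.400000
```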

Training YOLOv8

# Train on custom dataset
model = YOLO('yolov8n.pt')  # Start from pre-trained weights

# Training parameters
results = model.train(
    data='data.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    name='custom_yolo_run',
    patience=20,  # Early stopping
    save=True,
    device=0,  # GPU 0 (use 'cpu' for CPU training)
    workers=8,
    lr0=0.01  # Initial learning rate
    # Training-time augmentation (mosaic, flips, HSV jitter) is enabled by default
)

print("\n✅ Training complete!")
print(f"Best model saved at: {model.trainer.best}")  # e.g. runs/detect/custom_yolo_run/weights/best.pt

⚠️ Training Tips:

Minimum 100 images per class recommended
Balanced dataset crucial (similar samples per class)
Data augmentation helps: flip, rotate, scale, color jitter
Training takes 2-6 hours on GPU (depends on dataset size)
Monitor validation metrics to avoid overfitting
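
To check the balance tip above before training, you can count how many labeled objects each class has. This is a minimal sketch that assumes the labels/train layout and the one-object-per-line label format from the Dataset Preparation section; count_label_instances is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def count_label_instances(labels_dir):
    """Count labeled objects per class id across all YOLO .txt files in a folder."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1  # first field is the class id
    return dict(sorted(counts.items()))
```

Calling count_label_instances('dataset/labels/train') returns a dictionary such as {0: 412, 1: 97}, which makes under-represented classes easy to spot (the numbers here are only illustrative).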

Training on COCO Dataset (Example)

# Train on COCO128 (small COCO subset for quick testing)
model = YOLO('yolov8n.pt')

# COCO128 automatically downloads
results = model.train(
    data='coco128.yaml',
    epochs=30,
    imgsz=640,
    batch=16,
    name='coco128_test'
)

print("✅ COCO128 training complete!")

✅ Checkpoint 2: Custom Training Setup

Training pipeline ready:

Dataset structure configured (images + labels)
data.yaml configuration file created
Training command ready to run
Understand training parameters

📊 Part 3: Evaluation & Metrics

Model Validation

# Load trained model
model = YOLO('runs/detect/custom_yolo_run/weights/best.pt')

# Validate on the 'val' split from data.yaml (pass split='test' if you defined a test split)
metrics = model.val()

# Print metrics
print("\n📊 VALIDATION METRICS")
print("="*60)
print(f"mAP@50: {metrics.box.map50:.4f}")        # Mean Average Precision at IoU=0.50
print(f"mAP@50-95: {metrics.box.map:.4f}")      # mAP at IoU=0.50:0.95
print(f"Precision: {metrics.box.mp:.4f}")       # Mean precision over classes
print(f"Recall: {metrics.box.mr:.4f}")          # Mean recall over classes
p, r = metrics.box.mp, metrics.box.mr
f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0  # Guard against division by zero
print(f"F1 Score: {f1:.4f}")

# Per-class metrics
print("\n📊 PER-CLASS METRICS")
print("="*60)
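# ap50 has one entry per class present in the validation data,
# so map each entry back to its class id via ap_class_index
for i, class_id in enumerate(metrics.box.ap_class_index):
    class_name = model.names[int(class_id)]
    print(f"{class_name:.<20} AP@50: {metrics.box.ap50[i]:.3f}")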
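
The "@50" in mAP@50 refers to an Intersection over Union (IoU) threshold of 0.50: a prediction counts as correct only if its box overlaps a ground-truth box of the same class by at least that much. The sketch below shows the IoU calculation for two corner-format boxes; box_iou is an illustrative helper, not the function Ultralytics uses internally.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) in the same coordinate system."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by half their width share 50 of 150 total units
print(f"{box_iou((0, 0, 10, 10), (5, 0, 15, 10)):.3f}")  # 0.333
```

At IoU 0.333 this pair would not count as a match for mAP@50. mAP@50-95 averages the result over thresholds from 0.50 to 0.95, which is why it is always the stricter number.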

Visualize Training Curves

# Training results are saved as results.csv and plots
import pandas as pd

# Load training results
results_df = pd.read_csv('runs/detect/custom_yolo_run/results.csv')
results_df.columns = results_df.columns.str.strip()  # Remove whitespace

# Plot training curves
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss curves
axes[0, 0].plot(results_df['epoch'], results_df['train/box_loss'], label='Box Loss')
axes[0, 0].plot(results_df['epoch'], results_df['train/cls_loss'], label='Class Loss')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Training Loss')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# mAP
axes[0, 1].plot(results_df['epoch'], results_df['metrics/mAP50(B)'], label='mAP@50', linewidth=2)
axes[0, 1].plot(results_df['epoch'], results_df['metrics/mAP50-95(B)'], label='mAP@50-95', linewidth=2)
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('mAP')
axes[0, 1].set_title('Mean Average Precision')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

# Precision & Recall
axes[1, 0].plot(results_df['epoch'], results_df['metrics/precision(B)'], label='Precision', linewidth=2)
axes[1, 0].plot(results_df['epoch'], results_df['metrics/recall(B)'], label='Recall', linewidth=2)
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Score')
axes[1, 0].set_title('Precision & Recall')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Learning rate
axes[1, 1].plot(results_df['epoch'], results_df['lr/pg0'], linewidth=2)
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Learning Rate')
axes[1, 1].set_title('Learning Rate Schedule')
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Best epoch (read the epoch value from the CSV instead of assuming row numbering)
best_row = results_df['metrics/mAP50(B)'].idxmax()
print(f"\n🏆 Best mAP@50 at epoch {int(results_df.loc[best_row, 'epoch'])}: {results_df.loc[best_row, 'metrics/mAP50(B)']:.4f}")

Confusion Matrix

# Confusion matrix is automatically generated
from PIL import Image

# Load confusion matrix image
conf_matrix = Image.open('runs/detect/custom_yolo_run/confusion_matrix.png')
plt.figure(figsize=(10, 10))
plt.imshow(conf_matrix)
plt.title('Confusion Matrix')
plt.axis('off')
plt.show()

Inference Speed Benchmark

import time

# Benchmark inference speed
model = YOLO('yolov8n.pt')
image = 'sample_image.jpg'

# Warmup
for _ in range(10):
    model(image, verbose=False)

# Benchmark
num_runs = 100
start_time = time.perf_counter()  # Monotonic, high-resolution timer

for _ in range(num_runs):
    results = model(image, verbose=False)

end_time = time.perf_counter()

avg_time = (end_time - start_time) / num_runs
fps = 1 / avg_time

print(f"\n⚡ INFERENCE SPEED")
print("="*60)
print(f"Average inference time: {avg_time*1000:.2f} ms")
print(f"FPS: {fps:.1f}")
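print("Model: YOLOv8n")
print("Image size: imgsz=640 (default; letterboxed to keep the aspect ratio)")
print("Note: timing includes image loading, preprocessing, and postprocessing")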
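
The benchmark above reports only the mean, but real-time systems usually care about tail latency as well. One way to extend it is to record each run's duration in a list and summarize it. The sketch below assumes you have collected per-run timings in milliseconds; summarize_latencies is a hypothetical helper that uses the nearest-rank definition of the 95th percentile.

```python
import statistics


def summarize_latencies(latencies_ms):
    """Return mean, median, and 95th-percentile latency in milliseconds."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic
    rank = max(1, -(-95 * len(ordered) // 100))
    return {
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
    }


# Illustrative timings only, not measured results
print(summarize_latencies([12.0, 11.5, 12.3, 30.1, 11.9]))
```

To collect real timings, wrap each model call in the benchmark loop with time.perf_counter() and append (end - start) * 1000 to a list.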

✅ Checkpoint 3: Evaluation Complete

Model performance analyzed:

mAP@50 typically 40-70% (depends on dataset)
Training curves show convergence
Confusion matrix reveals class confusions
Inference speed: 30-80 FPS on modern GPU

🚀 Part 4: Model Optimization & Export

Export to ONNX (Cross-Platform)

# Export to ONNX format
model = YOLO('yolov8n.pt')
model.export(format='onnx', dynamic=True, simplify=True)

print("✅ Model exported to ONNX format!")
print("Use with ONNX Runtime for cross-platform deployment")
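
When you run the exported ONNX model outside Ultralytics, the preprocessing that the library normally handles becomes your responsibility. A common approach is letterboxing: resize while preserving aspect ratio, then pad to a square. The sketch below covers only the geometry and assumes a fixed 640x640 input with centered padding; both function names are hypothetical, and the exact preprocessing your export expects should be verified against the Ultralytics documentation.

```python
def letterbox_params(img_w, img_h, target=640):
    """Scale factor, resized size, and per-side padding for a target x target input."""
    scale = min(target / img_w, target / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def to_original_coords(x, y, scale, pad):
    """Map a point from letterboxed model space back to original image pixels."""
    return (x - pad[0]) / scale, (y - pad[1]) / scale


# A hypothetical 810x1080 portrait image
scale, size, pad = letterbox_params(810, 1080)
print(size, pad)  # (480, 640) (80.0, 0.0)
```

The same scale and padding are needed in reverse after inference, which is what to_original_coords does for each box corner.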

Export to TensorRT (NVIDIA GPUs)

# Export to TensorRT for maximum speed on NVIDIA GPUs (requires a CUDA GPU with TensorRT installed)
model.export(format='engine', half=True)  # FP16 precision

print("✅ Model exported to TensorRT format!")
print("Speedup varies by GPU and model; 2-3x with FP16 is a common ballpark")

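Choosing a Smaller Model (Reduce Size)

# This is model selection rather than pruning: pick a smaller variant
# (file sizes and GPU speeds below are approximate)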
models = {
    'yolov8n.pt': '6.3 MB - 80 FPS',
    'yolov8s.pt': '22 MB - 60 FPS',
    'yolov8m.pt': '52 MB - 40 FPS'
}

for model_name, specs in models.items():
    print(f"{model_name}: {specs}")
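
The file sizes listed above follow roughly from the parameter counts given in the model variants list. As a back-of-the-envelope check, multiply the number of parameters by the bytes used per parameter. The sketch assumes the released checkpoints store weights in FP16 (2 bytes each) and ignores file metadata, so treat the results as estimates; estimate_weights_mb is a hypothetical helper.

```python
def estimate_weights_mb(num_params, bytes_per_param=2):
    """Rough weight-file size in MB (10^6 bytes), ignoring metadata overhead."""
    return num_params * bytes_per_param / 1e6


# Parameter counts from the model variants list
for name, params in [("yolov8n", 3.2e6), ("yolov8s", 11.2e6), ("yolov8m", 25.9e6)]:
    print(f"{name}: ~{estimate_weights_mb(params):.1f} MB at FP16")
```

The same arithmetic explains why FP32 exports are about twice as large and why INT8 quantization roughly halves the size again.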

💡 Deployment Options:

PyTorch (.pt): Native format, full features
ONNX (.onnx): Cross-platform, CPU/GPU; speed depends on the runtime and hardware
TensorRT (.engine): NVIDIA GPUs only, often 2-3x faster with FP16
CoreML (.mlmodel): Apple devices (iOS/macOS)
TFLite (.tflite): Mobile/edge devices, considerably smaller when quantized

💾 Part 5: Production Deployment

Detection API Function

def detect_objects(image_source, model_path='yolov8n.pt', conf_threshold=0.5):
    """
    Production-ready object detection API
    
    Parameters:
    -----------
    image_source : str, np.array, PIL.Image
        Image file path, numpy array, or PIL image
    model_path : str
        Path to trained YOLO model
    conf_threshold : float
        Confidence threshold (0-1)
    
    Returns:
    --------
    dict with detections, annotated image, and metadata
    """
    # Load model (for repeated calls, load it once outside the function and reuse it)
    model = YOLO(model_path)
    
    # Run detection
    results = model(image_source, conf=conf_threshold)
    result = results[0]
    
    # Extract detections
    detections = []
    for box in result.boxes:
        detection = {
            'class': result.names[int(box.cls[0])],
            'confidence': float(box.conf[0]),
            'bbox': box.xyxy[0].cpu().numpy().tolist(),
            'bbox_normalized': box.xywhn[0].cpu().numpy().tolist()  # [x_center, y_center, width, height]
        }
        detections.append(detection)
    
    # Get annotated image
    annotated_image = result.plot()
    
    return {
        'detections': detections,
        'count': len(detections),
        'annotated_image': annotated_image,
        'image_shape': result.orig_shape,
        'inference_time': result.speed['inference']  # ms
    }

# Example usage
result = detect_objects('sample_image.jpg', conf_threshold=0.5)

print(f"\n📊 DETECTION RESULTS")
print("="*60)
print(f"Objects detected: {result['count']}")
print(f"Inference time: {result['inference_time']:.2f} ms")
print(f"\nDetections:")
for i, det in enumerate(result['detections'], 1):
    print(f"{i}. {det['class']}: {det['confidence']:.2f}")

Flask REST API (Production)

# Save as app.py and run: python app.py

from flask import Flask, request, jsonify, send_file
from ultralytics import YOLO
import cv2
import numpy as np
from PIL import Image
import io
import base64

app = Flask(__name__)
model = YOLO('yolov8n.pt')

@app.route('/detect', methods=['POST'])
def detect():
    """
    POST /detect
    Body: multipart/form-data with 'image' file
    """
    if 'image' not in request.files:
        return jsonify({'error': 'No image provided'}), 400
    
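    # Read image and force 3-channel RGB (handles RGBA and grayscale uploads)
    file = request.files['image']
    image_bytes = file.read()
    image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
    
    # Detect: pass the PIL image directly. Ultralytics treats NumPy arrays as
    # BGR, so np.array(image) would swap the red and blue channels.
    results = model(image)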
    result = results[0]
    
    # Extract detections
    detections = []
    for box in result.boxes:
        detections.append({
            'class': result.names[int(box.cls[0])],
            'confidence': float(box.conf[0]),
            'bbox': box.xyxy[0].cpu().numpy().tolist()
        })
    
    # Annotate image
    annotated = result.plot()
    
    # Encode image to base64
    _, buffer = cv2.imencode('.jpg', annotated)
    img_base64 = base64.b64encode(buffer).decode('utf-8')
    
    return jsonify({
        'detections': detections,
        'count': len(detections),
        'annotated_image': img_base64,
        'inference_time_ms': result.speed['inference']
    })

@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy', 'model': 'yolov8n'})

if __name__ == '__main__':
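    # Keep debug off on any reachable host; for real production traffic,
    # run the app behind a WSGI server such as gunicorn instead of app.run
    app.run(host='0.0.0.0', port=5000, debug=False)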

# Test API:
# curl -X POST -F "image=@sample_image.jpg" http://localhost:5000/detect
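
On the client side, the annotated_image field of the /detect response has to be base64-decoded before it can be viewed. The sketch below shows one way to do that with the standard library only. It assumes the response body matches the JSON returned by the endpoint above; save_annotated_image is a hypothetical helper name.

```python
import base64
import json


def save_annotated_image(response_text, out_path):
    """Decode the base64 JPEG from a /detect JSON response and write it to disk.

    Returns (number of detections, number of image bytes written).
    """
    payload = json.loads(response_text)
    image_bytes = base64.b64decode(payload["annotated_image"])
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return payload["count"], len(image_bytes)
```

With the curl command above, you could save the response to a file (for example by adding -o response.json) and then pass its contents to save_annotated_image along with an output path such as 'annotated.jpg'.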

Streamlit Web App (Interactive)

# Save as streamlit_app.py and run: streamlit run streamlit_app.py

import streamlit as st
from ultralytics import YOLO
from PIL import Image
import cv2
import numpy as np

st.title("🎯 YOLOv8 Object Detection")
st.write("Upload an image to detect objects")

# Model selection
model_choice = st.selectbox(
    "Select Model",
    ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']
)

# Confidence threshold
conf_threshold = st.slider("Confidence Threshold", 0.0, 1.0, 0.5, 0.05)

# Upload image
uploaded_file = st.file_uploader("Choose an image", type=['jpg', 'jpeg', 'png'])

if uploaded_file:
    # Load image
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded Image', use_column_width=True)
    
    # Detect button
    if st.button('Detect Objects'):
        with st.spinner('Detecting...'):
            # Load model
            model = YOLO(model_choice)
            
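            # Run detection on the PIL image directly (Ultralytics treats NumPy
            # arrays as BGR, so np.array(image) would swap red and blue)
            results = model(image.convert('RGB'), conf=conf_threshold)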
            result = results[0]
            
            # Display results
            annotated = result.plot()
            st.image(cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB),
                    caption='Detection Results',
                    use_column_width=True)
            
            # Show detections
            st.subheader(f"Detected {len(result.boxes)} objects")
            
            for i, box in enumerate(result.boxes, 1):
                class_name = result.names[int(box.cls[0])]
                confidence = float(box.conf[0])
                st.write(f"{i}. **{class_name}**: {confidence:.2%}")

🎯 Project Summary

🎉 Phenomenal Achievement!

You've mastered YOLOv8 object detection - the industry standard for real-time computer vision!

🏆 Key Accomplishments

✅ Deployed pre-trained YOLOv8: Detecting 80 COCO classes in real-time
✅ Custom dataset training: Annotated and trained on your own data
✅ Achieved 30-80 FPS: Real-time detection on modern hardware
✅ Comprehensive evaluation: mAP@50, precision, recall, confusion matrix
✅ Model optimization: Exported to ONNX, TensorRT for deployment
✅ Production API: Flask REST API + Streamlit web app

🚀 Next Level Enhancements

Instance Segmentation: Use YOLOv8-seg for pixel-level masks
Pose Estimation: Detect human keypoints with YOLOv8-pose
Object Tracking: Add ByteTrack or DeepSORT for multi-object tracking
Edge Deployment: Deploy on Raspberry Pi, Jetson Nano, or smartphones
Cloud Deployment: AWS Lambda, Google Cloud Functions, Azure
Active Learning: Continuously improve with user feedback

💼 Interview Talking Points (substitute the numbers you actually measured):

"Trained YOLOv8 object detector achieving 60+ mAP@50 on custom dataset"
"Optimized model for real-time inference at 50+ FPS using TensorRT"
"Deployed production REST API handling 100+ requests/sec"
"Built end-to-end pipeline: data annotation → training → deployment"
"System powers real-world applications in security/autonomous vehicles"

📚 Further Learning

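Papers: "You Only Look Once: Unified, Real-Time Object Detection" (Redmon et al., 2016) and "YOLOv3: An Incremental Improvement" (Redmon & Farhadi, 2018); YOLOv8 has no official paper and is described in the Ultralytics documentation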
Resources: Ultralytics documentation, Roboflow tutorials
Datasets: COCO, Open Images, Pascal VOC, custom via Roboflow
Community: Ultralytics GitHub, YOLO Discord, r/computervision