🎯 Object Detection with YOLOv8

Build a real-time object detection system with the YOLOv8 architecture

🚀 Advanced ⏱️ 8 hours 💻 Python + Ultralytics 📷 Computer Vision Project

🎯 Project Overview

Object detection underpins autonomous vehicles, surveillance systems, robotics, and augmented reality. YOLO (You Only Look Once) is one of the most widely used model families for real-time detection. In this project, you'll use YOLOv8 to detect objects in images and videos, producing a bounding box, class label, and confidence score for each object.

Real-World Applications

  • Autonomous Vehicles: Detect pedestrians, cars, traffic signs, obstacles
  • Security & Surveillance: Real-time threat detection, crowd monitoring
  • Retail Analytics: Customer behavior tracking, inventory monitoring
  • Manufacturing: Quality control, defect detection on production lines
  • Healthcare: Medical imaging, tumor detection, surgical assistance
  • Agriculture: Crop health monitoring, pest detection, yield estimation

What You'll Build

  • Pre-trained Model Inference: Detect 80 COCO classes (person, car, dog, etc.)
  • Custom Dataset Training: Train on your own annotated images
  • Real-Time Detection: Process webcam feed at 30+ FPS
  • Performance Evaluation: mAP@50, mAP@50-95, precision/recall curves
  • Model Optimization: Export to ONNX/TensorRT for faster inference
  • Deployment Pipeline: Production-ready detection API

🚀 In-Demand Skill: YOLO-family detectors are widely used in security cameras, robotics, and many commercial AI products. This project gives you hands-on experience with a full detection workflow!

📊 Setup & Installation

1 Install Ultralytics YOLOv8

pip install ultralytics opencv-python matplotlib numpy pillow

2 Verify Installation

from ultralytics import YOLO
import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Check versions
print("Ultralytics YOLO installed successfully!")
print("OpenCV version:", cv2.__version__)

# Download pre-trained model (automatic on first run)
model = YOLO('yolov8n.pt')  # nano model (fastest)
print("\n✅ YOLOv8 model loaded successfully!")
print(f"Model: {model.model_name}")
print(f"Task: {model.task}")

💡 YOLO Model Variants:

  • YOLOv8n (Nano): Fastest, 3.2M params, 80+ FPS
  • YOLOv8s (Small): Balanced, 11.2M params, 60+ FPS
  • YOLOv8m (Medium): Accurate, 25.9M params, 40+ FPS
  • YOLOv8l (Large): Very accurate, 43.7M params, 30+ FPS
  • YOLOv8x (Extra Large): Best accuracy, 68.2M params, 20+ FPS

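Start with YOLOv8n for faster training, then move to a larger variant if you need more accuracy. The FPS figures are rough guides and depend heavily on the GPU, image size, and batch size.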

🖼️ Part 1: Pre-trained Model Inference

Detect Objects in Images

# Load pre-trained model (COCO dataset: 80 classes)
model = YOLO('yolov8n.pt')

# Download sample image (or use your own)
import urllib.request
urllib.request.urlretrieve(
    'https://ultralytics.com/images/bus.jpg',
    'sample_image.jpg'
)

# Run inference
results = model('sample_image.jpg')

# Display results
result = results[0]
print(f"\n📊 Detection Results:")
print(f"Number of objects detected: {len(result.boxes)}")

# Access detection details
boxes = result.boxes
for box in boxes:
    class_id = int(box.cls[0])
    confidence = float(box.conf[0])
    bbox = box.xyxy[0].cpu().numpy()
    class_name = result.names[class_id]
    
    print(f"  - {class_name}: {confidence:.2f} at [{bbox[0]:.0f}, {bbox[1]:.0f}, {bbox[2]:.0f}, {bbox[3]:.0f}]")

# Visualize with bounding boxes
annotated_image = result.plot()
plt.figure(figsize=(12, 8))
plt.imshow(cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB))
plt.title('YOLOv8 Object Detection')
plt.axis('off')
plt.show()

Batch Processing

# Process multiple images
image_paths = ['sample_image.jpg']  # Add more paths

results = model(image_paths, stream=True)

for i, result in enumerate(results):
    print(f"\nImage {i+1}: {len(result.boxes)} objects detected")
    
    # Get class distribution
    class_counts = {}
    for box in result.boxes:
        class_name = result.names[int(box.cls[0])]
        class_counts[class_name] = class_counts.get(class_name, 0) + 1
    
    for class_name, count in class_counts.items():
        print(f"  - {class_name}: {count}")

Video Detection (Real-Time)

# Detect in video
def detect_video(video_path, output_path='output_video.mp4'):
    """
    Run detection on video file or webcam
    
    Parameters:
    -----------
    video_path : str or int
        Path to video file, or 0 for webcam
    output_path : str
        Path to save output video
    """
    model = YOLO('yolov8n.pt')
    
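    # Open video (a file path, or an integer index such as 0 for a webcam)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video source: {video_path}")
    
    # Get video properties
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Webcams often report 0 FPS; fall back to 30 so the writer gets a valid rate
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    
    # Video writer
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))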
    
    frame_count = 0
    
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        
        # Run detection
        results = model(frame)
        annotated_frame = results[0].plot()
        
        # Write frame
        out.write(annotated_frame)
        
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Processed {frame_count} frames...")
    
    cap.release()
    out.release()
    print(f"\n✅ Video saved to: {output_path}")

# Example: Detect in webcam (use 0) or video file
# detect_video(0)  # Webcam
# detect_video('your_video.mp4')  # Video file

✅ Checkpoint 1: Pre-trained Inference Working

Successfully running detection:

  • YOLOv8n loaded and detecting 80 COCO classes
  • Image detection with bounding boxes
  • Real-time video processing capability
  • 30+ FPS on modern hardware

🏗️ Part 2: Train on Custom Dataset

Dataset Preparation

# YOLO dataset structure:
# dataset/
#   ├── images/
#   │   ├── train/
#   │   │   ├── img1.jpg
#   │   │   └── img2.jpg
#   │   └── val/
#   │       ├── img3.jpg
#   │       └── img4.jpg
#   └── labels/
#       ├── train/
#       │   ├── img1.txt
#       │   └── img2.txt
#       └── val/
#           ├── img3.txt
#           └── img4.txt

# Label format (one line per object):
# class_id center_x center_y width height
# Example: 0 0.5 0.5 0.3 0.4

# Create data.yaml configuration file
data_yaml = """
# Dataset configuration
path: /path/to/dataset  # dataset root
train: images/train  # train images (relative to 'path')
val: images/val  # val images (relative to 'path')

# Classes
names:
  0: person
  1: car
  2: bicycle
  3: motorcycle
"""

with open('data.yaml', 'w') as f:
    f.write(data_yaml)

print("✅ Dataset configuration created!")
print("\n💡 Use LabelImg or Roboflow to annotate your images.")
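
The label format described above stores each box as a class ID followed by the box center, width, and height, all divided by the image dimensions. As a minimal sketch (the helper names `xyxy_to_yolo` and `yolo_to_xyxy` are hypothetical and not part of Ultralytics), converting pixel corner coordinates to a label line and back could look like this:

```python
def xyxy_to_yolo(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corner coordinates to a normalized YOLO label line."""
    cx = (x1 + x2) / 2 / img_w   # box center x, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center y, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, as a fraction of image width
    h = (y2 - y1) / img_h        # box height, as a fraction of image height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def yolo_to_xyxy(line, img_w, img_h):
    """Convert a YOLO label line back to (class_id, x1, y1, x2, y2) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# A 192x192 px box centered in a 640x480 image (illustrative numbers)
label = xyxy_to_yolo(0, 224, 144, 416, 336, img_w=640, img_h=480)
print(label)
print(yolo_to_xyxy(label, img_w=640, img_h=480))
```

Annotation tools normally write these files for you, so a helper like this is mainly useful when converting labels from another format or checking a suspicious label by hand.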

Training YOLOv8

# Train on custom dataset
model = YOLO('yolov8n.pt')  # Start from pre-trained weights

# Training parameters
results = model.train(
    data='data.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    name='custom_yolo_run',
    patience=20,  # Early stopping
    save=True,
    device=0,  # GPU 0 (use 'cpu' for CPU training)
    workers=8,
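    lr0=0.01,  # Initial learning rate
    # Training-time augmentation (flips, scaling, HSV jitter, mosaic) is enabled by default.
    # The augment flag controls test-time augmentation for prediction/validation, so it is omitted here.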
)

print("\n✅ Training complete!")
print(f"Best model saved at: runs/detect/custom_yolo_run/weights/best.pt")

⚠️ Training Tips:

  • Minimum 100 images per class recommended
  • Balanced dataset crucial (similar samples per class)
  • Data augmentation helps: flip, rotate, scale, color jitter
  • Training takes 2-6 hours on GPU (depends on dataset size)
  • Monitor validation metrics to avoid overfitting
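
To check the first two tips before training, it can help to count how many boxes each class has in your label files. The sketch below uses only the standard library; the function names and the `dataset/labels/train` path are assumptions based on the dataset layout shown earlier, so adjust them to your project.

```python
from collections import Counter
from pathlib import Path


def count_labels(label_dir):
    """Count how many boxes of each class ID appear in a folder of YOLO .txt labels."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts


def imbalance_ratio(counts):
    """Ratio of the most frequent class count to the least frequent one."""
    return max(counts.values()) / min(counts.values()) if counts else 0.0


label_dir = Path("dataset/labels/train")  # assumed location of training labels
if label_dir.is_dir():
    counts = count_labels(label_dir)
    for class_id, n in sorted(counts.items()):
        print(f"class {class_id}: {n} boxes")
    print(f"Imbalance ratio (max/min): {imbalance_ratio(counts):.1f}")
```

Note that this counts boxes, not images. A ratio far above 1 suggests collecting more examples of the rare classes before spending GPU time.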

Training on COCO Dataset (Example)

# Train on COCO128 (small COCO subset for quick testing)
model = YOLO('yolov8n.pt')

# COCO128 automatically downloads
results = model.train(
    data='coco128.yaml',
    epochs=30,
    imgsz=640,
    batch=16,
    name='coco128_test'
)

print("✅ COCO128 training complete!")

✅ Checkpoint 2: Custom Training Setup

Training pipeline ready:

  • Dataset structure configured (images + labels)
  • data.yaml configuration file created
  • Training command ready to run
  • Training parameters understood
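
If your annotated images start out in one flat folder, the images/train, images/val, labels/train, labels/val layout from Dataset Preparation can be produced with a short script. This is a sketch under assumptions: `split_dataset` is a hypothetical helper, the 80/20 split is a common default rather than a requirement, and each label file is assumed to share its image's file name stem.

```python
import random
import shutil
from pathlib import Path


def split_dataset(image_dir, label_dir, out_dir, val_fraction=0.2, seed=0):
    """Copy image/label pairs into out_dir/images/{train,val} and out_dir/labels/{train,val}."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    n_val = int(len(images) * val_fraction)
    splits = {"val": images[:n_val], "train": images[n_val:]}

    for split, files in splits.items():
        img_out = Path(out_dir, "images", split)
        lbl_out = Path(out_dir, "labels", split)
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, img_out / img.name)
            label = Path(label_dir, img.stem + ".txt")
            if label.exists():  # an image with no label file has no annotated objects
                shutil.copy2(label, lbl_out / label.name)
    return {split: len(files) for split, files in splits.items()}


# Example call (folder names are placeholders):
# print(split_dataset("raw_images", "raw_labels", "dataset", val_fraction=0.2))
```

Files are copied rather than moved, so the original folder stays intact if you want to re-split with a different seed or fraction.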

📊 Part 3: Evaluation & Metrics

Model Validation

# Load trained model
model = YOLO('runs/detect/custom_yolo_run/weights/best.pt')

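# Validate on the 'val' split from the data.yaml used for training (pass split='test' if your data.yaml defines a test set)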
metrics = model.val()

# Print metrics
print("\n📊 VALIDATION METRICS")
print("="*60)
print(f"mAP@50: {metrics.box.map50:.4f}")        # Mean Average Precision at IoU=0.50
print(f"mAP@50-95: {metrics.box.map:.4f}")      # mAP averaged over IoU=0.50:0.95
print(f"Precision: {metrics.box.mp:.4f}")       # Mean precision over classes
print(f"Recall: {metrics.box.mr:.4f}")          # Mean recall over classes
precision, recall = metrics.box.mp, metrics.box.mr
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0  # Avoid division by zero
print(f"F1 Score: {f1:.4f}")

# Per-class metrics
print("\n📊 PER-CLASS METRICS")
print("="*60)
# ap50 is ordered by ap_class_index (classes present in the validation set), not by raw class ID
for i, class_id in enumerate(metrics.box.ap_class_index):
    print(f"{model.names[int(class_id)]:.<20} AP@50: {metrics.box.ap50[i]:.3f}")
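
The IoU thresholds in these metrics refer to Intersection over Union: the overlap area of a predicted box and a ground-truth box divided by the area of their union. For mAP@50, a prediction counts as a match when its IoU with a ground-truth box of the same class is at least 0.50; mAP@50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. A minimal, dependency-free sketch of the IoU calculation (the function name and the example boxes are illustrative):

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prediction = (50, 50, 150, 150)
ground_truth = (100, 50, 200, 150)
iou = box_iou(prediction, ground_truth)
print(f"IoU = {iou:.3f}, match at IoU threshold 0.50: {iou >= 0.5}")
```

Here the two boxes share half of each box's area, yet the IoU is only about one third, which shows why mAP@50-95 is a much stricter measure of localization quality than mAP@50.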

Visualize Training Curves

# Training results are saved as results.csv and plots
import pandas as pd

# Load training results
results_df = pd.read_csv('runs/detect/custom_yolo_run/results.csv')
results_df.columns = results_df.columns.str.strip()  # Remove whitespace

# Plot training curves
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss curves
axes[0, 0].plot(results_df['epoch'], results_df['train/box_loss'], label='Box Loss')
axes[0, 0].plot(results_df['epoch'], results_df['train/cls_loss'], label='Class Loss')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Training Loss')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# mAP
axes[0, 1].plot(results_df['epoch'], results_df['metrics/mAP50(B)'], label='mAP@50', linewidth=2)
axes[0, 1].plot(results_df['epoch'], results_df['metrics/mAP50-95(B)'], label='mAP@50-95', linewidth=2)
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('mAP')
axes[0, 1].set_title('Mean Average Precision')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

# Precision & Recall
axes[1, 0].plot(results_df['epoch'], results_df['metrics/precision(B)'], label='Precision', linewidth=2)
axes[1, 0].plot(results_df['epoch'], results_df['metrics/recall(B)'], label='Recall', linewidth=2)
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Score')
axes[1, 0].set_title('Precision & Recall')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Learning rate
axes[1, 1].plot(results_df['epoch'], results_df['lr/pg0'], linewidth=2)
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Learning Rate')
axes[1, 1].set_title('Learning Rate Schedule')
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Best epoch (idxmax returns the row index, so read the epoch number from the 'epoch' column)
best_row = results_df['metrics/mAP50(B)'].idxmax()
print(f"\n🏆 Best mAP@50 at epoch {int(results_df['epoch'][best_row])}: {results_df['metrics/mAP50(B)'][best_row]:.4f}")

Confusion Matrix

# Confusion matrix is automatically generated
from PIL import Image

# Load confusion matrix image
conf_matrix = Image.open('runs/detect/custom_yolo_run/confusion_matrix.png')
plt.figure(figsize=(10, 10))
plt.imshow(conf_matrix)
plt.title('Confusion Matrix')
plt.axis('off')
plt.show()
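
The confusion matrix image is easier to act on once its counts are turned into per-class precision and recall. The sketch below assumes a plain nested list where `matrix[t][p]` is the number of objects of true class `t` predicted as class `p`; that orientation, the function name, and the counts are all made up for illustration, so check the axis labels on your own plot before transcribing numbers. The extra "background" row and column that detection confusion matrices include is left out for brevity.

```python
def per_class_precision_recall(matrix, class_names):
    """Return {class_name: (precision, recall)} from a square confusion matrix.

    matrix[t][p] = number of objects of true class t predicted as class p.
    """
    stats = {}
    n = len(class_names)
    for k, name in enumerate(class_names):
        tp = matrix[k][k]
        predicted_k = sum(matrix[t][k] for t in range(n))  # column sum: everything predicted as k
        actual_k = sum(matrix[k])                           # row sum: everything that truly is k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        stats[name] = (precision, recall)
    return stats


# Made-up counts for illustration only
names = ["person", "car", "bicycle"]
matrix = [
    [40, 2, 8],   # true person
    [1, 45, 4],   # true car
    [9, 3, 38],   # true bicycle
]

for name, (p, r) in per_class_precision_recall(matrix, names).items():
    print(f"{name:<10} precision={p:.2f} recall={r:.2f}")
```

Large off-diagonal counts between two classes (person and bicycle in this toy example) usually point to ambiguous annotations or too few examples separating the two.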

Inference Speed Benchmark

import time

# Benchmark inference speed
model = YOLO('yolov8n.pt')
image = 'sample_image.jpg'

# Warmup
for _ in range(10):
    model(image, verbose=False)

# Benchmark (perf_counter is a monotonic, high-resolution timer)
num_runs = 100
start_time = time.perf_counter()

for _ in range(num_runs):
    results = model(image, verbose=False)

end_time = time.perf_counter()

# Each call includes reading the image from disk, pre-processing, inference, and post-processing
avg_time = (end_time - start_time) / num_runs
fps = 1 / avg_time

print("\n⚡ INFERENCE SPEED")
print("="*60)
print(f"Average time per image: {avg_time*1000:.2f} ms")
print(f"FPS: {fps:.1f}")
print("Model: YOLOv8n")
print("Inference size (imgsz): 640")
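
A single average can hide occasional slow frames, which matter for real-time use. As a rough sketch, the helper below (a hypothetical `benchmark` function, standard library only) times any callable and reports the mean, the median, and an approximate 95th percentile. A stand-in workload is used here so the snippet runs without a model.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Time fn() repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not measured
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    mean_ms = statistics.mean(latencies)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],  # approximate 95th percentile
        "fps": 1000 / mean_ms,
    }


# Stand-in workload; with a loaded model you could pass
# lambda: model('sample_image.jpg', verbose=False) instead.
stats = benchmark(lambda: sum(i * i for i in range(10_000)), warmup=2, runs=20)
print({key: round(value, 3) for key, value in stats.items()})
```

If the 95th percentile is far above the median, look for other processes competing for the GPU or CPU before concluding the model is too slow.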

✅ Checkpoint 3: Evaluation Complete

Model performance analyzed:

  • mAP@50 typically 40-70% (depends on dataset)
  • Training curves show convergence
  • Confusion matrix reveals class confusions
  • Inference speed: 30-80 FPS on modern GPU

🚀 Part 4: Model Optimization & Export

Export to ONNX (Cross-Platform)

# Export to ONNX format
model = YOLO('yolov8n.pt')
model.export(format='onnx', dynamic=True, simplify=True)

print("✅ Model exported to ONNX format!")
print("Use with ONNX Runtime for cross-platform deployment")

Export to TensorRT (NVIDIA GPUs)

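# Export to TensorRT for maximum speed on NVIDIA GPUs
# (requires a CUDA-capable GPU with TensorRT installed; the engine is tied to that GPU and TensorRT version)
model.export(format='engine', half=True)  # FP16 precision

print("✅ Model exported to TensorRT format!")
print("A 2-3x speedup is typical on NVIDIA GPUs, but measure on your own hardware")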

Reduce Model Size (Smaller Variant)

# Simplest size reduction: pick a smaller model variant (this is not weight pruning)
models = {
    'yolov8n.pt': '6.3 MB - 80 FPS',
    'yolov8s.pt': '22 MB - 60 FPS',
    'yolov8m.pt': '52 MB - 40 FPS'
}

for model_name, specs in models.items():
    print(f"{model_name}: {specs}")

💡 Deployment Options:

  • PyTorch (.pt): Native format, full features
  • ONNX (.onnx): Cross-platform, CPU/GPU, roughly 1.2x slower than PyTorch on GPU (varies by runtime and hardware)
  • TensorRT (.engine): NVIDIA GPUs only, roughly 2-3x faster than PyTorch
  • CoreML (.mlmodel): Apple devices (iOS/macOS)
  • TFLite (.tflite): Mobile/edge devices, 3-5x smaller

💾 Part 5: Production Deployment

Detection API Function

def detect_objects(image_source, model_path='yolov8n.pt', conf_threshold=0.5):
    """
    Production-ready object detection API
    
    Parameters:
    -----------
    image_source : str, np.array, PIL.Image
        Image file path, numpy array, or PIL image
    model_path : str
        Path to trained YOLO model
    conf_threshold : float
        Confidence threshold (0-1)
    
    Returns:
    --------
    dict with detections, annotated image, and metadata
    """
    # Load model (a long-running service should load it once at startup and reuse it)
    model = YOLO(model_path)
    
    # Run detection
    results = model(image_source, conf=conf_threshold)
    result = results[0]
    
    # Extract detections
    detections = []
    for box in result.boxes:
        detection = {
            'class': result.names[int(box.cls[0])],
            'confidence': float(box.conf[0]),
            'bbox': box.xyxy[0].cpu().numpy().tolist(),
            'bbox_normalized': box.xywhn[0].cpu().numpy().tolist()  # [x_center, y_center, width, height]
        }
        detections.append(detection)
    
    # Get annotated image
    annotated_image = result.plot()
    
    return {
        'detections': detections,
        'count': len(detections),
        'annotated_image': annotated_image,
        'image_shape': result.orig_shape,
        'inference_time': result.speed['inference']  # ms
    }

# Example usage
result = detect_objects('sample_image.jpg', conf_threshold=0.5)

print(f"\n📊 DETECTION RESULTS")
print("="*60)
print(f"Objects detected: {result['count']}")
print(f"Inference time: {result['inference_time']:.2f} ms")
print(f"\nDetections:")
for i, det in enumerate(result['detections'], 1):
    print(f"{i}. {det['class']}: {det['confidence']:.2f}")
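
The dictionary returned by `detect_objects` contains the annotated image as a NumPy array, which `json.dumps` cannot serialize. If you want to log or return the detections as JSON, one option is to build a summary without the image. The sketch below is an illustration: `to_json_summary` is a hypothetical helper, and the `fake_result` values are hand-written stand-ins that use the same keys as the function above.

```python
import json


def to_json_summary(result, keep_classes=None):
    """Return a JSON string with detections only (no image array), optionally filtered by class."""
    detections = [
        d for d in result["detections"]
        if keep_classes is None or d["class"] in keep_classes
    ]
    payload = {
        "count": len(detections),
        "image_shape": list(result["image_shape"]),
        "inference_time_ms": result["inference_time"],
        "detections": detections,
    }
    return json.dumps(payload, indent=2)


# Hand-written stand-in with the same keys that detect_objects returns
fake_result = {
    "detections": [
        {"class": "person", "confidence": 0.91, "bbox": [10.0, 20.0, 110.0, 220.0]},
        {"class": "bus", "confidence": 0.87, "bbox": [0.0, 50.0, 600.0, 400.0]},
    ],
    "count": 2,
    "annotated_image": object(),  # placeholder for the NumPy array, which json cannot serialize
    "image_shape": (1080, 810),
    "inference_time": 12.5,
}

print(to_json_summary(fake_result, keep_classes={"person"}))
```

Filtering by class name after detection is convenient for small experiments, since the same model can serve several use cases without retraining.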

Flask REST API (Production)

# Save as app.py and run: python app.py

from flask import Flask, request, jsonify, send_file
from ultralytics import YOLO
import cv2
import numpy as np
from PIL import Image
import io
import base64

app = Flask(__name__)
model = YOLO('yolov8n.pt')

@app.route('/detect', methods=['POST'])
def detect():
    """
    POST /detect
    Body: multipart/form-data with 'image' file
    """
    if 'image' not in request.files:
        return jsonify({'error': 'No image provided'}), 400
    
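    # Read image (convert to RGB so PNGs with alpha and grayscale images also work)
    file = request.files['image']
    image_bytes = file.read()
    image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
    
    # Detect: pass the PIL image directly. Ultralytics treats NumPy arrays as
    # BGR (OpenCV order), so np.array(image) would swap the red and blue channels.
    results = model(image)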
    result = results[0]
    
    # Extract detections
    detections = []
    for box in result.boxes:
        detections.append({
            'class': result.names[int(box.cls[0])],
            'confidence': float(box.conf[0]),
            'bbox': box.xyxy[0].cpu().numpy().tolist()
        })
    
    # Annotate image
    annotated = result.plot()
    
    # Encode image to base64
    _, buffer = cv2.imencode('.jpg', annotated)
    img_base64 = base64.b64encode(buffer).decode('utf-8')
    
    return jsonify({
        'detections': detections,
        'count': len(detections),
        'annotated_image': img_base64,
        'inference_time_ms': result.speed['inference']
    })

@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy', 'model': 'yolov8n'})

if __name__ == '__main__':
    # Keep debug off: debug=True exposes an interactive debugger, which is unsafe on 0.0.0.0
    app.run(host='0.0.0.0', port=5000, debug=False)

# Test API:
# curl -X POST -F "image=@sample_image.jpg" http://localhost:5000/detect
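
On the client side, the `annotated_image` field of the `/detect` response is a base64-encoded JPEG that has to be decoded before it can be viewed. A minimal sketch using only the standard library follows; `save_annotated_image` is a hypothetical helper, and the response here is a hand-made stand-in so the snippet runs without the server.

```python
import base64
import json
import os
import tempfile


def save_annotated_image(response_text, output_path):
    """Decode the base64 JPEG in a /detect response and write it to disk."""
    payload = json.loads(response_text)
    image_bytes = base64.b64decode(payload["annotated_image"])
    with open(output_path, "wb") as f:
        f.write(image_bytes)
    return payload["count"], len(image_bytes)


# Stand-in response: a few placeholder bytes instead of a real JPEG
fake_jpeg = b"\xff\xd8 stand-in bytes \xff\xd9"
fake_response = json.dumps({
    "count": 3,
    "annotated_image": base64.b64encode(fake_jpeg).decode("utf-8"),
})

out_path = os.path.join(tempfile.gettempdir(), "annotated_demo.jpg")
count, size = save_annotated_image(fake_response, out_path)
print(f"{count} detections, wrote {size} bytes to {out_path}")
```

Base64 adds roughly a third to the payload size, so for large images it may be worth returning only the detections and drawing the boxes on the client.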

Streamlit Web App (Interactive)

# Save as streamlit_app.py and run: streamlit run streamlit_app.py

import streamlit as st
from ultralytics import YOLO
from PIL import Image
import cv2
import numpy as np

st.title("🎯 YOLOv8 Object Detection")
st.write("Upload an image to detect objects")

# Model selection
model_choice = st.selectbox(
    "Select Model",
    ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']
)

# Confidence threshold
conf_threshold = st.slider("Confidence Threshold", 0.0, 1.0, 0.5, 0.05)

# Upload image
uploaded_file = st.file_uploader("Choose an image", type=['jpg', 'jpeg', 'png'])

if uploaded_file:
    # Load image
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded Image', use_column_width=True)
    
    # Detect button
    if st.button('Detect Objects'):
        with st.spinner('Detecting...'):
            # Load model
            model = YOLO(model_choice)
            
            # Run detection on the PIL image (RGB); a raw NumPy array would be read as BGR
            results = model(image.convert('RGB'), conf=conf_threshold)
            result = results[0]
            
            # Display results
            annotated = result.plot()
            st.image(cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB),
                    caption='Detection Results',
                    use_column_width=True)
            
            # Show detections
            st.subheader(f"Detected {len(result.boxes)} objects")
            
            for i, box in enumerate(result.boxes, 1):
                class_name = result.names[int(box.cls[0])]
                confidence = float(box.conf[0])
                st.write(f"{i}. **{class_name}**: {confidence:.2%}")

🎯 Project Summary

🎉 Phenomenal Achievement!

You've built a complete YOLOv8 object detection workflow, from pre-trained inference through training, evaluation, export, and deployment!

🏆 Key Accomplishments

  • ✅ Deployed pre-trained YOLOv8: Detecting 80 COCO classes in real-time
  • ✅ Custom dataset training: Annotated and trained on your own data
  • ✅ Achieved 30-80 FPS: Real-time detection on modern hardware
  • ✅ Comprehensive evaluation: mAP@50, precision, recall, confusion matrix
  • ✅ Model optimization: Exported to ONNX, TensorRT for deployment
  • ✅ Production API: Flask REST API + Streamlit web app

🚀 Next Level Enhancements

  • Instance Segmentation: Use YOLOv8-seg for pixel-level masks
  • Pose Estimation: Detect human keypoints with YOLOv8-pose
  • Object Tracking: Add ByteTrack or DeepSORT for multi-object tracking
  • Edge Deployment: Deploy on Raspberry Pi, Jetson Nano, or smartphones
  • Cloud Deployment: AWS Lambda, Google Cloud Functions, Azure
  • Active Learning: Continuously improve with user feedback

💼 Interview Talking Points (substitute your own measured results for the example figures):

  • "Trained a YOLOv8 object detector reaching 60+ mAP@50 on a custom dataset"
  • "Optimized the model for real-time inference at 50+ FPS using TensorRT"
  • "Deployed a REST API for detection and load-tested its throughput"
  • "Built an end-to-end pipeline: data annotation → training → deployment"
  • "Designed the system around real-world use cases such as security and autonomous vehicles"

📚 Further Learning

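  • Papers: "You Only Look Once: Unified, Real-Time Object Detection" (Redmon et al., 2016) and "YOLOv3: An Incremental Improvement" (Redmon & Farhadi, 2018); YOLOv8 has no official paper, so rely on the Ultralytics documentation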
  • Resources: Ultralytics documentation, Roboflow tutorials
  • Datasets: COCO, Open Images, Pascal VOC, custom via Roboflow
  • Community: Ultralytics GitHub, Ultralytics Discord, r/computervision