Project Overview
Object detection underpins autonomous vehicles, surveillance systems, robotics, and augmented reality. YOLO (You Only Look Once) is one of the most widely used model families for real-time detection. In this project, you'll use YOLOv8 to detect objects in images and videos, producing bounding boxes and confidence scores, and then train it on your own data.
Real-World Applications
- Autonomous Vehicles: Detect pedestrians, cars, traffic signs, obstacles
- Security & Surveillance: Real-time threat detection, crowd monitoring
- Retail Analytics: Customer behavior tracking, inventory monitoring
- Manufacturing: Quality control, defect detection on production lines
- Healthcare: Medical imaging, tumor detection, surgical assistance
- Agriculture: Crop health monitoring, pest detection, yield estimation
What You'll Build
- Pre-trained Model Inference: Detect 80 COCO classes (person, car, dog, etc.)
- Custom Dataset Training: Train on your own annotated images
- Real-Time Detection: Process webcam feed at 30+ FPS
- Performance Evaluation: mAP@50, mAP@50-95, precision/recall curves
- Model Optimization: Export to ONNX/TensorRT for faster inference
- Deployment Pipeline: Production-ready detection API
Industry-Relevant Skill: YOLO-family detectors are widely used in security cameras, robotics, and many other computer vision products. This project demonstrates practical, in-demand computer vision skills!
Setup & Installation
1. Install Ultralytics YOLOv8
pip install ultralytics opencv-python matplotlib numpy pillow
2. Verify Installation
import ultralytics
from ultralytics import YOLO
import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# Check versions
print("Ultralytics version:", ultralytics.__version__)
print("OpenCV version:", cv2.__version__)
# Download pre-trained model (automatic on first run)
weights = 'yolov8n.pt'  # nano model (fastest)
model = YOLO(weights)
print("\nYOLOv8 model loaded successfully!")
print(f"Model: {weights}")
print(f"Task: {model.task}")
YOLO Model Variants (FPS figures are rough estimates and depend heavily on your GPU):
- YOLOv8n (Nano): Fastest, 3.2M params, 80+ FPS
- YOLOv8s (Small): Balanced, 11.2M params, 60+ FPS
- YOLOv8m (Medium): Accurate, 25.9M params, 40+ FPS
- YOLOv8l (Large): Very accurate, 43.7M params, 30+ FPS
- YOLOv8x (Extra Large): Best accuracy, 68.2M params, 20+ FPS
Start with YOLOv8n for faster training, then move to a larger variant if you need more accuracy!
Part 1: Pre-trained Model Inference
Detect Objects in Images
# Load pre-trained model (COCO dataset: 80 classes)
model = YOLO('yolov8n.pt')
# Download sample image (or use your own)
import urllib.request
urllib.request.urlretrieve(
'https://ultralytics.com/images/bus.jpg',
'sample_image.jpg'
)
# Run inference
results = model('sample_image.jpg')
# Display results
result = results[0]
print("\nDetection Results:")
print(f"Number of objects detected: {len(result.boxes)}")
# Access detection details
boxes = result.boxes
for box in boxes:
    class_id = int(box.cls[0])
    confidence = float(box.conf[0])
    bbox = box.xyxy[0].cpu().numpy()  # [x_min, y_min, x_max, y_max] in pixels
    class_name = result.names[class_id]
    print(f"  - {class_name}: {confidence:.2f} at [{bbox[0]:.0f}, {bbox[1]:.0f}, {bbox[2]:.0f}, {bbox[3]:.0f}]")
# Visualize with bounding boxes
annotated_image = result.plot()
plt.figure(figsize=(12, 8))
plt.imshow(cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB))
plt.title('YOLOv8 Object Detection')
plt.axis('off')
plt.show()
Batch Processing
# Process multiple images
image_paths = ['sample_image.jpg'] # Add more paths
results = model(image_paths, stream=True)
for i, result in enumerate(results):
    print(f"\nImage {i+1}: {len(result.boxes)} objects detected")
    # Get class distribution
    class_counts = {}
    for box in result.boxes:
        class_name = result.names[int(box.cls[0])]
        class_counts[class_name] = class_counts.get(class_name, 0) + 1
    for class_name, count in class_counts.items():
        print(f"  - {class_name}: {count}")
Video Detection (Real-Time)
# Detect in video
def detect_video(video_path, output_path='output_video.mp4'):
    """
    Run detection on a video file or webcam.

    Parameters:
    -----------
    video_path : str or int
        Path to video file, or 0 for webcam
    output_path : str
        Path to save output video
    """
    model = YOLO('yolov8n.pt')
    # Open video
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video source: {video_path}")
    # Get video properties
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # some webcams report 0; fall back to 30
    # Video writer
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    frame_count = 0
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            # Run detection
            results = model(frame, verbose=False)
            annotated_frame = results[0].plot()
            # Write frame
            out.write(annotated_frame)
            frame_count += 1
            if frame_count % 30 == 0:
                print(f"Processed {frame_count} frames...")
    except KeyboardInterrupt:
        print("Stopped by user.")  # Ctrl+C ends a webcam session
    finally:
        cap.release()
        out.release()
    print(f"\nVideo saved to: {output_path}")
# Example: Detect in webcam (use 0) or video file
# detect_video(0) # Webcam
# detect_video('your_video.mp4') # Video file
Checkpoint 1: Pre-trained Inference Working
Successfully running detection:
- YOLOv8n loaded and detecting 80 COCO classes
- Image detection with bounding boxes
- Real-time video processing capability
- 30+ FPS achievable on a modern GPU (CPU-only machines will be slower)
Part 2: Train on Custom Dataset
Dataset Preparation
# YOLO dataset structure:
# dataset/
# ├── images/
# │   ├── train/
# │   │   ├── img1.jpg
# │   │   └── img2.jpg
# │   └── val/
# │       ├── img3.jpg
# │       └── img4.jpg
# └── labels/
#     ├── train/
#     │   ├── img1.txt
#     │   └── img2.txt
#     └── val/
#         ├── img3.txt
#         └── img4.txt
# Label format (one line per object; coordinates are normalized to 0-1 by image width/height):
# class_id center_x center_y width height
# Example: 0 0.5 0.5 0.3 0.4
# Create data.yaml configuration file
data_yaml = """
# Dataset configuration
path: /path/to/dataset  # dataset root
train: images/train  # train images (relative to 'path')
val: images/val  # val images (relative to 'path')
# Classes
names:
  0: person
  1: car
  2: bicycle
  3: motorcycle
"""
with open('data.yaml', 'w') as f:
    f.write(data_yaml)
print("Dataset configuration created!")
print("\nUse LabelImg or Roboflow to annotate your images.")
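Annotation tools often give boxes in pixel corners, while the label format above expects normalized center coordinates. The conversion can be sketched as follows; the helper names `xyxy_to_yolo` and `format_label_line` are illustrative and not part of Ultralytics.

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to normalized
    YOLO format (center_x, center_y, width, height)."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return center_x, center_y, width, height


def format_label_line(class_id, box, img_w, img_h):
    """Build one line of a YOLO .txt label file."""
    values = xyxy_to_yolo(box, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# A 192x192-pixel box centered in a 640x480 image, class 0
print(format_label_line(0, (224, 144, 416, 336), 640, 480))
```

For this box the line reproduces the example label from the dataset notes: class 0 at the image center, 0.3 wide and 0.4 tall.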
Training YOLOv8
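# Train on custom dataset
model = YOLO('yolov8n.pt')  # Start from pre-trained weights
# Training parameters
results = model.train(
    data='data.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    name='custom_yolo_run',
    patience=20,  # Early stopping: stop after 20 epochs without improvement
    save=True,
    device=0,  # GPU 0 (use 'cpu' for CPU training)
    workers=8,
    lr0=0.01,  # Initial learning rate
    # Training-time augmentation is on by default and is tuned through
    # hyperparameters such as fliplr, hsv_h, and mosaic (the 'augment'
    # flag applies to prediction, not training)
)
print("\nTraining complete!")
# If a run with this name already exists, Ultralytics appends a number (e.g. custom_yolo_run2)
print("Best model saved at: runs/detect/custom_yolo_run/weights/best.pt")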
Training Tips:
- Aim for at least 100 images per class
- Keep the dataset balanced (a similar number of samples per class)
- Data augmentation helps: flip, rotate, scale, color jitter
- Training can take 2-6 hours on a GPU, depending on dataset size and hardware
- Monitor validation metrics to catch overfitting
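The balance tip above is easy to check before training. A minimal sketch, assuming labels are stored as YOLO `.txt` files in a single folder; the function name and the `dataset/labels/train` path are placeholders for your own setup.

```python
from collections import Counter
from pathlib import Path


def count_labels_per_class(labels_dir):
    """Count annotated objects per class ID across YOLO .txt label files."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path; point this at your own labels/train folder
    counts = count_labels_per_class("dataset/labels/train")
    for class_id, n in sorted(counts.items()):
        print(f"class {class_id}: {n} objects")
```

A class with far fewer objects than the others is a candidate for more annotation before you start a long training run.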
Training on COCO Dataset (Example)
# Train on COCO128 (small COCO subset for quick testing)
model = YOLO('yolov8n.pt')
# COCO128 automatically downloads
results = model.train(
    data='coco128.yaml',
    epochs=30,
    imgsz=640,
    batch=16,
    name='coco128_test'
)
print("COCO128 training complete!")
Checkpoint 2: Custom Training Setup
Training pipeline ready:
- Dataset structure configured (images + labels)
- data.yaml configuration file created
- Training command ready to run
- Training parameters understood
Part 3: Evaluation & Metrics
Model Validation
# Load trained model
model = YOLO('runs/detect/custom_yolo_run/weights/best.pt')
# Validate on the 'val' split from data.yaml (pass split='test' if you defined a test split)
metrics = model.val()
# Print metrics
print("\nVALIDATION METRICS")
print("="*60)
precision = metrics.box.mp  # Mean precision across classes
recall = metrics.box.mr  # Mean recall across classes
# Guard against division by zero when nothing was detected
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
print(f"mAP@50: {metrics.box.map50:.4f}")  # Mean Average Precision at IoU=0.50
print(f"mAP@50-95: {metrics.box.map:.4f}")  # mAP averaged over IoU=0.50:0.95
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
# Per-class metrics (ap50 follows ap_class_index, which lists only classes present in the validation data)
print("\nPER-CLASS METRICS")
print("="*60)
for i, class_id in enumerate(metrics.box.ap_class_index):
    class_name = model.names[int(class_id)]
    print(f"{class_name:.<20} AP@50: {metrics.box.ap50[i]:.3f}")
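The mAP thresholds above are defined in terms of Intersection over Union (IoU): at mAP@50, a prediction counts as a match only if its IoU with a ground-truth box is at least 0.5. The following is a plain-Python sketch of that overlap measure, not the implementation Ultralytics uses internally.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x_min, y_min, x_max, y_max) format."""
    # Corners of the overlapping rectangle (if any)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width/height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prediction = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
score = iou(prediction, ground_truth)
print(f"IoU = {score:.3f}, match at IoU >= 0.5: {score >= 0.5}")
```

Here the two 100x100 boxes share a 50x50 corner, so the IoU is 2500 / 17500 (about 0.143) and the prediction would not count as a match at the 0.5 threshold.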
Visualize Training Curves
# Training results are saved as results.csv and plots
import pandas as pd
# Load training results
results_df = pd.read_csv('runs/detect/custom_yolo_run/results.csv')
results_df.columns = results_df.columns.str.strip() # Remove whitespace
# Plot training curves
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Loss curves
axes[0, 0].plot(results_df['epoch'], results_df['train/box_loss'], label='Box Loss')
axes[0, 0].plot(results_df['epoch'], results_df['train/cls_loss'], label='Class Loss')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Training Loss')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)
# mAP
axes[0, 1].plot(results_df['epoch'], results_df['metrics/mAP50(B)'], label='mAP@50', linewidth=2)
axes[0, 1].plot(results_df['epoch'], results_df['metrics/mAP50-95(B)'], label='mAP@50-95', linewidth=2)
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('mAP')
axes[0, 1].set_title('Mean Average Precision')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)
# Precision & Recall
axes[1, 0].plot(results_df['epoch'], results_df['metrics/precision(B)'], label='Precision', linewidth=2)
axes[1, 0].plot(results_df['epoch'], results_df['metrics/recall(B)'], label='Recall', linewidth=2)
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Score')
axes[1, 0].set_title('Precision & Recall')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)
# Learning rate
axes[1, 1].plot(results_df['epoch'], results_df['lr/pg0'], linewidth=2)
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Learning Rate')
axes[1, 1].set_title('Learning Rate Schedule')
axes[1, 1].grid(alpha=0.3)
plt.tight_layout()
plt.show()
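# Best epoch (use the value from the 'epoch' column so it matches the plots' x-axis)
best_idx = results_df['metrics/mAP50(B)'].idxmax()
best_epoch = int(results_df.loc[best_idx, 'epoch'])
print(f"\nBest mAP@50 at epoch {best_epoch}: {results_df.loc[best_idx, 'metrics/mAP50(B)']:.4f}")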
Confusion Matrix
# Confusion matrix is automatically generated
from PIL import Image
# Load confusion matrix image
conf_matrix = Image.open('runs/detect/custom_yolo_run/confusion_matrix.png')
plt.figure(figsize=(10, 10))
plt.imshow(conf_matrix)
plt.title('Confusion Matrix')
plt.axis('off')
plt.show()
Inference Speed Benchmark
import time
# Benchmark inference speed
model = YOLO('yolov8n.pt')
image = 'sample_image.jpg'
# Warmup (the first calls include model and GPU initialization)
for _ in range(10):
    model(image, verbose=False)
# Benchmark (timing covers image loading, preprocessing, inference, and postprocessing)
num_runs = 100
start_time = time.perf_counter()
for _ in range(num_runs):
    results = model(image, verbose=False)
end_time = time.perf_counter()
avg_time = (end_time - start_time) / num_runs
fps = 1 / avg_time
print("\nINFERENCE SPEED")
print("="*60)
print(f"Average end-to-end time: {avg_time*1000:.2f} ms")
print(f"FPS: {fps:.1f}")
print("Model: YOLOv8n")
print(f"Original image size (h, w): {results[0].orig_shape}")
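A single average can hide occasional slow runs. As a hedged variation on the benchmark above, the sketch below times each call separately and reports both mean and median latency. The `benchmark` helper is illustrative and works with any zero-argument callable, for example `lambda: model(image, verbose=False)`.

```python
import statistics
import time


def benchmark(fn, num_runs=100, warmup=10):
    """Time repeated calls to fn() and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup calls are not timed
    timings_ms = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    mean_ms = statistics.mean(timings_ms)
    return {
        "runs": num_runs,
        "mean_ms": mean_ms,
        "median_ms": statistics.median(timings_ms),
        "fps": 1000 / mean_ms if mean_ms > 0 else float("inf"),
    }


# Stand-in workload; swap in a model call to benchmark inference
stats = benchmark(lambda: sum(range(10_000)), num_runs=20, warmup=2)
print(f"mean {stats['mean_ms']:.3f} ms, median {stats['median_ms']:.3f} ms")
```

If the mean is much higher than the median, a few outlier runs (for example disk reads or GPU clock changes) are skewing the average.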
Checkpoint 3: Evaluation Complete
Model performance analyzed:
- mAP@50 typically 40-70% (depends heavily on dataset size and quality)
- Training curves show convergence
- Confusion matrix reveals class confusions
- Inference speed: roughly 30-80 FPS on a modern GPU, depending on model size
Part 4: Model Optimization & Export
Export to ONNX (Cross-Platform)
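# Export to ONNX format
model = YOLO('yolov8n.pt')
# dynamic=True allows variable input sizes; simplify=True simplifies the exported graph
onnx_path = model.export(format='onnx', dynamic=True, simplify=True)
print(f"Model exported to ONNX format: {onnx_path}")
print("Use with ONNX Runtime for cross-platform deployment")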
Export to TensorRT (NVIDIA GPUs)
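# Export to TensorRT for maximum speed on NVIDIA GPUs
# (requires an NVIDIA GPU with TensorRT installed; run the export on the target device)
model.export(format='engine', half=True)  # FP16 precision
print("Model exported to TensorRT format!")
print("Expect roughly 2-3x speedup on NVIDIA GPUs, depending on hardware")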
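Choosing a Smaller Model (Reduce Size)
# Picking a smaller variant is the simplest way to cut size (this is not weight pruning)
# Sizes are approximate weight-file sizes; FPS depends on hardware
models = {
    'yolov8n.pt': '6.3 MB - 80 FPS',
    'yolov8s.pt': '22 MB - 60 FPS',
    'yolov8m.pt': '52 MB - 40 FPS'
}
for model_name, specs in models.items():
    print(f"{model_name}: {specs}")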
Deployment Options:
- PyTorch (.pt): Native format, full features
- ONNX (.onnx): Cross-platform, CPU/GPU; speed relative to PyTorch depends on the runtime and hardware
- TensorRT (.engine): NVIDIA GPUs only, often 2-3x faster
- CoreML (.mlmodel, or .mlpackage in current Ultralytics releases): Apple devices (iOS/macOS)
- TFLite (.tflite): Mobile/edge devices, considerably smaller when quantized
Part 5: Production Deployment
Detection API Function
_MODEL_CACHE = {}  # Reuse loaded models so weights are not reloaded on every call


def detect_objects(image_source, model_path='yolov8n.pt', conf_threshold=0.5):
    """
    Object detection API function

    Parameters:
    -----------
    image_source : str, np.array, PIL.Image
        Image file path, numpy array (BGR channel order), or PIL image
    model_path : str
        Path to trained YOLO model
    conf_threshold : float
        Confidence threshold (0-1)

    Returns:
    --------
    dict with detections, annotated image, and metadata
    """
    # Load model (cached after the first call)
    if model_path not in _MODEL_CACHE:
        _MODEL_CACHE[model_path] = YOLO(model_path)
    model = _MODEL_CACHE[model_path]
    # Run detection
    results = model(image_source, conf=conf_threshold, verbose=False)
    result = results[0]
    # Extract detections
    detections = []
    for box in result.boxes:
        detection = {
            'class': result.names[int(box.cls[0])],
            'confidence': float(box.conf[0]),
            'bbox': box.xyxy[0].cpu().numpy().tolist(),  # [x_min, y_min, x_max, y_max] in pixels
            'bbox_normalized': box.xywhn[0].cpu().numpy().tolist()  # [x_center, y_center, width, height]
        }
        detections.append(detection)
    # Get annotated image (BGR numpy array)
    annotated_image = result.plot()
    return {
        'detections': detections,
        'count': len(detections),
        'annotated_image': annotated_image,
        'image_shape': result.orig_shape,
        'inference_time': result.speed['inference']  # ms
    }


# Example usage
result = detect_objects('sample_image.jpg', conf_threshold=0.5)
print("\nDETECTION RESULTS")
print("="*60)
print(f"Objects detected: {result['count']}")
print(f"Inference time: {result['inference_time']:.2f} ms")
print("\nDetections:")
for i, det in enumerate(result['detections'], 1):
    print(f"{i}. {det['class']}: {det['confidence']:.2f}")
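Downstream code usually needs only some of the detections, for example people above a stricter confidence. A small sketch of post-filtering the `detections` list returned above; the helper names and sample values are illustrative, and the dict keys mirror the ones built in `detect_objects`.

```python
from collections import Counter


def filter_detections(detections, min_confidence=0.5, allowed_classes=None):
    """Keep detections at or above min_confidence, optionally restricted to some classes."""
    kept = []
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        if allowed_classes is not None and det["class"] not in allowed_classes:
            continue
        kept.append(det)
    return kept


def count_by_class(detections):
    """Return a Counter mapping class name to number of detections."""
    return Counter(det["class"] for det in detections)


# Made-up detections in the same shape as detect_objects() output
sample = [
    {"class": "person", "confidence": 0.91, "bbox": [10, 20, 110, 300]},
    {"class": "bus", "confidence": 0.87, "bbox": [5, 100, 600, 450]},
    {"class": "person", "confidence": 0.42, "bbox": [300, 40, 360, 200]},
]
confident_people = filter_detections(sample, min_confidence=0.5, allowed_classes={"person"})
print(len(confident_people), dict(count_by_class(sample)))
```

Filtering after inference lets you run the model once with a low threshold and apply different thresholds per use case without re-running detection.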
Flask REST API (Production)
# Save as app.py and run: python app.py
# Requires: pip install flask
from flask import Flask, request, jsonify
from ultralytics import YOLO
import cv2
from PIL import Image
import io
import base64

app = Flask(__name__)
model = YOLO('yolov8n.pt')  # Load once at startup


@app.route('/detect', methods=['POST'])
def detect():
    """
    POST /detect
    Body: multipart/form-data with 'image' file
    """
    if 'image' not in request.files:
        return jsonify({'error': 'No image provided'}), 400
    # Read image
    file = request.files['image']
    try:
        image = Image.open(io.BytesIO(file.read())).convert('RGB')
    except Exception:
        return jsonify({'error': 'Invalid image file'}), 400
    # Detect (pass the PIL image directly: Ultralytics treats numpy arrays as BGR,
    # so np.array(image) would swap the red and blue channels)
    results = model(image, verbose=False)
    result = results[0]
    # Extract detections
    detections = []
    for box in result.boxes:
        detections.append({
            'class': result.names[int(box.cls[0])],
            'confidence': float(box.conf[0]),
            'bbox': box.xyxy[0].cpu().numpy().tolist()
        })
    # Annotate image (BGR numpy array)
    annotated = result.plot()
    # Encode image to base64
    _, buffer = cv2.imencode('.jpg', annotated)
    img_base64 = base64.b64encode(buffer.tobytes()).decode('utf-8')
    return jsonify({
        'detections': detections,
        'count': len(detections),
        'annotated_image': img_base64,
        'inference_time_ms': result.speed['inference']
    })


@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy', 'model': 'yolov8n'})


if __name__ == '__main__':
    # Keep debug off when listening on 0.0.0.0: the debugger allows code execution.
    # For real production traffic, run behind a WSGI server such as gunicorn.
    app.run(host='0.0.0.0', port=5000, debug=False)
# Test API:
# curl -X POST -F "image=@sample_image.jpg" http://localhost:5000/detect
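The `/detect` response carries the annotated image as a base64 string inside the JSON body. A client-side sketch of decoding it, using only the standard library; `save_annotated_image` is a hypothetical helper, and the HTTP request itself (with curl, `requests`, or similar) is left out.

```python
import base64
import json
import os
import tempfile


def save_annotated_image(response_text, output_path):
    """Decode the base64 'annotated_image' field of a /detect response and
    write it to output_path. Returns the reported detection count."""
    payload = json.loads(response_text)
    image_bytes = base64.b64decode(payload["annotated_image"])
    with open(output_path, "wb") as f:
        f.write(image_bytes)
    return payload["count"]


if __name__ == "__main__":
    # Stand-in response body; a real one would come from POST /detect
    fake_body = json.dumps({
        "count": 0,
        "detections": [],
        "annotated_image": base64.b64encode(b"not-a-real-jpeg").decode("utf-8"),
    })
    out_path = os.path.join(tempfile.gettempdir(), "annotated_demo.jpg")
    print("Detections:", save_annotated_image(fake_body, out_path))
```

Base64 inflates the payload by about a third, so for large images it can be preferable to return the detections alone and fetch the annotated image separately.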
Streamlit Web App (Interactive)
# Save as streamlit_app.py and run: streamlit run streamlit_app.py
# Requires: pip install streamlit
import streamlit as st
from ultralytics import YOLO
from PIL import Image
import cv2

st.title("YOLOv8 Object Detection")
st.write("Upload an image to detect objects")


@st.cache_resource
def load_model(weights):
    """Load each model once and reuse it across reruns."""
    return YOLO(weights)


# Model selection
model_choice = st.selectbox(
    "Select Model",
    ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']
)
# Confidence threshold
conf_threshold = st.slider("Confidence Threshold", 0.0, 1.0, 0.5, 0.05)
# Upload image
uploaded_file = st.file_uploader("Choose an image", type=['jpg', 'jpeg', 'png'])
if uploaded_file:
    # Load image (convert to RGB so PNGs with an alpha channel also work)
    image = Image.open(uploaded_file).convert('RGB')
    # use_container_width replaces the deprecated use_column_width;
    # older Streamlit releases may need use_column_width instead
    st.image(image, caption='Uploaded Image', use_container_width=True)
    # Detect button
    if st.button('Detect Objects'):
        with st.spinner('Detecting...'):
            # Load model (cached)
            model = load_model(model_choice)
            # Run detection on the PIL image (numpy arrays would be read as BGR)
            results = model(image, conf=conf_threshold)
            result = results[0]
            # Display results (plot() returns BGR, so convert for display)
            annotated = result.plot()
            st.image(cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB),
                     caption='Detection Results',
                     use_container_width=True)
            # Show detections
            st.subheader(f"Detected {len(result.boxes)} objects")
            for i, box in enumerate(result.boxes, 1):
                class_name = result.names[int(box.cls[0])]
                confidence = float(box.conf[0])
                st.write(f"{i}. **{class_name}**: {confidence:.2%}")
Project Summary
Phenomenal Achievement!
You've worked through the full YOLOv8 object detection workflow, one of the most widely used approaches to real-time computer vision!
Key Accomplishments
- Deployed pre-trained YOLOv8: Detecting 80 COCO classes in real-time
- Custom dataset training: Annotated and trained on your own data
- Achieved 30-80 FPS: Real-time detection on modern hardware
- Comprehensive evaluation: mAP@50, precision, recall, confusion matrix
- Model optimization: Exported to ONNX, TensorRT for deployment
- Production API: Flask REST API + Streamlit web app
Next Level Enhancements
- Instance Segmentation: Use YOLOv8-seg for pixel-level masks
- Pose Estimation: Detect human keypoints with YOLOv8-pose
- Object Tracking: Add ByteTrack or DeepSORT for multi-object tracking
- Edge Deployment: Deploy on Raspberry Pi, Jetson Nano, or smartphones
- Cloud Deployment: AWS Lambda, Google Cloud Functions, Azure
- Active Learning: Continuously improve with user feedback
Interview Talking Points (replace the numbers with your own measured results):
- "Trained a YOLOv8 object detector achieving 60+ mAP@50 on a custom dataset"
- "Optimized the model for real-time inference at 50+ FPS using TensorRT"
- "Deployed a production REST API handling 100+ requests/sec"
- "Built an end-to-end pipeline: data annotation → training → deployment"
- "Applied techniques used in real-world security and autonomous-vehicle systems"
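Further Learning
- Papers: "You Only Look Once: Unified, Real-Time Object Detection" (Redmon et al., 2016, the original YOLO paper); "YOLOv3: An Incremental Improvement" (Redmon and Farhadi, 2018). YOLOv8 has no official paper, so refer to the Ultralytics documentation for its architecture.
- Resources: Ultralytics documentation, Roboflow tutorials
- Datasets: COCO, Open Images, Pascal VOC, custom via Roboflow
- Community: Ultralytics GitHub, YOLO Discord, r/computervision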