Batch Object Detection

This guide demonstrates how to set up and run efficient batch object detection workloads using Machine.dev GPU runners. Learn how to process large volumes of images or video frames with state-of-the-art object detection models.

Use Case Overview

Batch object detection allows you to:

  • Process large datasets of images in parallel
  • Extract and analyze objects, people, vehicles, or other entities
  • Generate metadata and annotations for computer vision datasets
  • Create analytics from visual content

Prerequisites

  • GitHub repository with your object detection code
  • Machine.dev account connected to your GitHub repository
  • Input images or video frames to process
  • Pre-trained object detection model or custom trained model

Workflow Example

This GitHub Actions workflow processes a batch of images using a pre-trained object detection model:

name: Batch Object Detection

on:
  workflow_dispatch:
    inputs:
      model:
        description: 'Object detection model to use'
        required: true
        default: 'yolov8x'
        type: choice
        options:
          - yolov8n
          - yolov8s
          - yolov8m
          - yolov8l
          - yolov8x
          - faster_rcnn
          - retinanet
      confidence:
        description: 'Detection confidence threshold'
        required: true
        default: '0.25'
        type: string

jobs:
  object-detection:
    name: Run Object Detection
    runs-on:
      - machine
      - gpu=l4        # Good for CV tasks
      - tenancy=spot  # Use spot instances for batch jobs
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -U pip
          pip install ultralytics opencv-python-headless torch torchvision
          pip install -r detector/requirements.txt

      - name: Download dataset
        run: |
          python detector/download_dataset.py \
            --dataset="sample_images" \
            --output-dir="data"

      - name: Run object detection
        run: |
          python detector/batch_detect.py \
            --model="${{ github.event.inputs.model }}" \
            --input-dir="data/images" \
            --output-dir="results" \
            --confidence=${{ github.event.inputs.confidence }} \
            --batch-size=32

      - name: Generate report
        run: |
          python detector/generate_report.py \
            --results-dir="results" \
            --output-file="detection_report.html"

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: detection-results
          path: |
            results/
            detection_report.html

Example Implementation Details

Batch Detection Script

Here’s a sample implementation of the batch detection script (detector/batch_detect.py):

import os
import argparse
import json
import cv2
from pathlib import Path
from ultralytics import YOLO


def load_model(model_name):
    """Load the detection model."""
    if model_name.startswith('yolov8'):
        model = YOLO(f"{model_name}.pt")
    else:
        # Placeholder - loading of other model architectures
        # (e.g. faster_rcnn, retinanet) would be implemented here
        raise NotImplementedError(f"Model {model_name} is not supported yet")
    return model


def process_images(args):
    """Process all images in the input directory with object detection."""
    # Create output directory
    os.makedirs(args.output_dir, exist_ok=True)

    # Load model
    model = load_model(args.model)

    # Get all image files
    image_files = [f for f in Path(args.input_dir).glob('*')
                   if f.suffix.lower() in ['.jpg', '.jpeg', '.png', '.bmp']]
    if not image_files:
        print(f"No images found in {args.input_dir}")
        return
    print(f"Found {len(image_files)} images to process")

    results_data = []

    # Process images in batches
    for i in range(0, len(image_files), args.batch_size):
        batch_files = image_files[i:i + args.batch_size]

        # Load batch of images
        batch_images = [cv2.imread(str(f)) for f in batch_files]

        # Run detection on batch
        results = model(batch_images, conf=float(args.confidence))

        # Process results for each image
        for j, result in enumerate(results):
            image_name = batch_files[j].name

            # Get detections
            boxes = result.boxes.cpu().numpy()

            # Save image with detections drawn on it
            annotated_img = result.plot()
            output_img_path = os.path.join(args.output_dir, f"detected_{image_name}")
            cv2.imwrite(output_img_path, annotated_img)

            # Extract detection data
            detections = []
            for box in boxes:
                x1, y1, x2, y2 = box.xyxy[0].astype(int)
                conf = float(box.conf[0])
                cls = int(box.cls[0])
                cls_name = result.names[cls]
                detections.append({
                    "bbox": [int(x1), int(y1), int(x2), int(y2)],
                    "confidence": conf,
                    "class": cls_name
                })

            # Store results
            results_data.append({
                "image": image_name,
                "detections": detections,
                "detection_count": len(detections),
                "output_image": f"detected_{image_name}"
            })
            print(f"Processed {image_name}: found {len(detections)} objects")

    # Save detection results as JSON
    with open(os.path.join(args.output_dir, "detections.json"), "w") as f:
        json.dump(results_data, f, indent=2)
    print(f"Completed processing {len(image_files)} images with {args.model}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Batch object detection")
    parser.add_argument("--model", type=str, required=True, help="Detection model to use")
    parser.add_argument("--input-dir", type=str, required=True, help="Directory with input images")
    parser.add_argument("--output-dir", type=str, default="results", help="Output directory")
    parser.add_argument("--confidence", type=float, default=0.25, help="Detection confidence threshold")
    parser.add_argument("--batch-size", type=int, default=16, help="Batch size for processing")
    args = parser.parse_args()
    process_images(args)

Report Generation Script

Here’s a sample implementation of the report generation script (detector/generate_report.py):

import os
import json
import argparse
from collections import Counter


def generate_html_report(results_data, output_file):
    """Generate an HTML report from detection results."""
    # Extract summary statistics (guard against an empty results file)
    total_images = len(results_data)
    total_detections = sum(r["detection_count"] for r in results_data)
    avg_detections = total_detections / total_images if total_images else 0

    # Count objects by class
    all_classes = []
    for result in results_data:
        for detection in result["detections"]:
            all_classes.append(detection["class"])
    class_counts = Counter(all_classes)
    top_classes = class_counts.most_common(10)

    # Generate HTML
    html = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Object Detection Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 40px; }}
        h1, h2 {{ color: #333; }}
        .summary {{ background-color: #f5f5f5; padding: 15px; margin: 20px 0; border-radius: 5px; }}
        .detection-grid {{ display: grid; grid-template-columns: repeat(auto-fill, minmax(300px, 1fr)); gap: 20px; }}
        .detection-item {{ border: 1px solid #ddd; border-radius: 5px; padding: 10px; }}
        .detection-image {{ width: 100%; height: auto; border-radius: 3px; }}
        table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
        th, td {{ padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }}
        th {{ background-color: #f2f2f2; }}
    </style>
</head>
<body>
    <h1>Object Detection Report</h1>
    <div class="summary">
        <h2>Summary</h2>
        <p>Total images processed: {total_images}</p>
        <p>Total objects detected: {total_detections}</p>
        <p>Average objects per image: {avg_detections:.2f}</p>
    </div>
    <h2>Top Detected Classes</h2>
    <table>
        <tr>
            <th>Class</th>
            <th>Count</th>
            <th>Percentage</th>
        </tr>
"""
    for cls, count in top_classes:
        percentage = count / total_detections * 100
        html += f"""
        <tr>
            <td>{cls}</td>
            <td>{count}</td>
            <td>{percentage:.1f}%</td>
        </tr>
"""
    html += """
    </table>
    <h2>Sample Detections</h2>
    <div class="detection-grid">
"""
    # Add a sample of detection images (first 20)
    for result in results_data[:20]:
        image_name = result["image"]
        output_image = result["output_image"]
        detection_count = result["detection_count"]
        html += f"""
        <div class="detection-item">
            <img class="detection-image" src="results/{output_image}" alt="{image_name}">
            <p>{image_name}: {detection_count} objects detected</p>
        </div>
"""
    html += """
    </div>
</body>
</html>
"""
    with open(output_file, "w") as f:
        f.write(html)
    print(f"Report generated and saved to {output_file}")


def generate_report(args):
    """Generate detection report."""
    # Load results
    with open(os.path.join(args.results_dir, "detections.json"), "r") as f:
        results_data = json.load(f)

    # Generate HTML report
    generate_html_report(results_data, args.output_file)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate object detection report")
    parser.add_argument("--results-dir", type=str, required=True, help="Directory with detection results")
    parser.add_argument("--output-file", type=str, default="detection_report.html", help="Output HTML file")
    args = parser.parse_args()
    generate_report(args)

Hardware Recommendations

For batch object detection workloads with Machine.dev:

Workload Size                          Recommended GPU   Batch Size         Processing Speed
Small batches (< 1,000 images)         T4 (16GB)         16-32              ~5-10 images/second
Medium batches (1,000-10,000 images)   L4 (24GB)         32-64              ~10-20 images/second
Large batches (> 10,000 images)        A10G (24GB)       32-64              ~15-25 images/second
Video processing                       L40S (48GB)       Multiple streams   ~30-60 frames/second
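The table above can be encoded as a simple lookup when you generate workflow inputs programmatically. This is an illustrative sketch, not part of the guide's scripts: `recommend_gpu` is a name introduced here, and the batch sizes take the upper end of each recommended range.

```python
def recommend_gpu(num_images: int, video: bool = False) -> dict:
    """Map a workload size to the GPU/batch-size pairing from the table above.

    Returned 'gpu' values match Machine.dev runner labels (e.g. gpu=l4);
    this helper and its thresholds mirror the table, nothing more.
    """
    if video:
        # Video processing: L40S, multiple parallel streams rather than a batch size
        return {"gpu": "l40s", "batch_size": None}
    if num_images < 1_000:
        return {"gpu": "t4", "batch_size": 32}
    if num_images <= 10_000:
        return {"gpu": "l4", "batch_size": 64}
    return {"gpu": "a10g", "batch_size": 64}
```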

Scaling Strategies

Parallel Processing

For large datasets, you can split processing across multiple jobs:

jobs:
  split-dataset:
    runs-on: ubuntu-latest
    steps:
      - name: Split dataset
        id: split
        run: |
          python split_dataset.py --chunks=5
    outputs:
      chunks: ${{ steps.split.outputs.chunks }}

  process-chunks:
    needs: split-dataset
    strategy:
      matrix:
        chunk: ${{ fromJson(needs.split-dataset.outputs.chunks) }}
    runs-on:
      - machine
      - gpu=l4
    steps:
      - name: Process chunk
        run: |
          python detector/batch_detect.py --chunk=${{ matrix.chunk }}
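The `split_dataset.py` helper referenced in the workflow isn't shown in this guide. A minimal sketch might look like the following; the chunk-manifest file names are assumptions, and the `GITHUB_OUTPUT` write is what makes `steps.split.outputs.chunks` available to the `fromJson()` matrix.

```python
import argparse
import json
import os
from pathlib import Path


def make_chunks(paths, n_chunks):
    """Split a list of paths into n_chunks roughly equal slices (round-robin)."""
    return [list(map(str, paths[i::n_chunks])) for i in range(n_chunks)]


def main():
    parser = argparse.ArgumentParser(description="Split dataset for matrix jobs")
    parser.add_argument("--input-dir", default="data/images")
    parser.add_argument("--chunks", type=int, default=5)
    args = parser.parse_args()

    paths = sorted(Path(args.input_dir).glob("*"))
    chunks = make_chunks(paths, args.chunks)

    # Write one manifest per chunk; each matrix job reads its own by index
    for idx, chunk in enumerate(chunks):
        Path(f"chunk_{idx}.json").write_text(json.dumps(chunk))

    # Expose the chunk indices as a JSON list for the fromJson() matrix
    output = os.environ.get("GITHUB_OUTPUT")
    if output:
        with open(output, "a") as f:
            f.write(f"chunks={json.dumps(list(range(len(chunks))))}\n")


if __name__ == "__main__":
    main()
```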

Video Processing

For video files, you can extract frames and process them in parallel:

steps:
  - name: Extract video frames
    run: |
      python detector/extract_frames.py \
        --video="input.mp4" \
        --output-dir="frames" \
        --fps=1  # Extract one frame per second

  - name: Process frames
    run: |
      python detector/batch_detect.py \
        --input-dir="frames" \
        --output-dir="detected_frames"

  - name: Create output video
    run: |
      python detector/create_video.py \
        --input-dir="detected_frames" \
        --output-video="output.mp4" \
        --fps=1
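The `detector/extract_frames.py` script referenced above isn't shown either. A minimal sketch, assuming OpenCV (`cv2`) for decoding; the function names are illustrative:

```python
from pathlib import Path


def sample_step(video_fps: float, target_fps: float) -> int:
    """Number of source frames to advance between saved frames."""
    return max(1, round(video_fps / target_fps))


def extract_frames(video_path: str, output_dir: str, fps: float = 1.0) -> int:
    """Save every Nth frame of the video as a JPEG; returns frames written."""
    import cv2  # imported lazily so sample_step stays dependency-free

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    # Fall back to 30 fps when the container reports no frame rate
    step = sample_step(cap.get(cv2.CAP_PROP_FPS) or 30.0, fps)

    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(str(Path(output_dir) / f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

A workflow step would then call `extract_frames("input.mp4", "frames", fps=1)`, matching the `--fps=1` flag above.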

Cost Optimization Strategies

To optimize costs for batch object detection:

  1. Use spot instances for non-time-critical workloads
  2. Choose the right model size for your accuracy requirements
  3. Optimize batch size to maximize GPU utilization
  4. Use incremental processing to handle large datasets in stages
  5. Implement region selection to use the most cost-effective regions
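Incremental processing (item 4) pairs naturally with spot instances (item 1): if a spot runner is reclaimed mid-batch, a restarted job should skip work that is already done. One hedged sketch, writing one small JSON result per image so completed work is cheap to detect; the helper names are introduced here, not taken from the guide's scripts:

```python
import json
from pathlib import Path


def pending_images(input_dir: str, output_dir: str,
                   exts=(".jpg", ".jpeg", ".png", ".bmp")) -> list:
    """Return images that do not yet have a per-image result file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    todo = []
    for img in sorted(Path(input_dir).glob("*")):
        if img.suffix.lower() not in exts:
            continue
        # A missing <stem>.json means this image still needs processing
        if not (out / f"{img.stem}.json").exists():
            todo.append(img)
    return todo


def mark_done(output_dir: str, image_name: str, detections: list) -> None:
    """Record an image's detections so a restarted job can skip it."""
    stem = Path(image_name).stem
    Path(output_dir, f"{stem}.json").write_text(json.dumps(detections))
```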

Advanced Techniques

Model Quantization

Reduce memory usage and improve inference speed:

# Convert to FP16 for faster inference (PyTorch models on GPU)
model.half()

# Or use ONNX dynamic quantization
from onnxruntime.quantization import quantize_dynamic

quantize_dynamic("model.onnx", "model_quantized.onnx")

Custom Post-Processing

Extract specific insights from detection results:

def analyze_detections(detections, classes_of_interest=("person", "car")):
    """Analyze detection results for specific insights."""
    insights = {cls: 0 for cls in classes_of_interest}
    for detection in detections:
        cls = detection["class"]
        if cls in classes_of_interest:
            insights[cls] += 1
    return insights
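The same per-image analysis can be rolled up across the `detections.json` file that `batch_detect.py` writes. A hedged sketch; `aggregate_insights` is a name introduced here for illustration:

```python
import json
from collections import Counter


def aggregate_insights(results_path: str,
                       classes_of_interest=("person", "car")) -> dict:
    """Sum counts of selected classes across every image in detections.json."""
    with open(results_path) as f:
        results = json.load(f)

    totals = Counter()
    for image_result in results:
        for det in image_result["detections"]:
            if det["class"] in classes_of_interest:
                totals[det["class"]] += 1

    # Counter returns 0 for classes never seen, so every key is present
    return {cls: totals[cls] for cls in classes_of_interest}
```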

Best Practices

  1. Pre-process images to standardize size and quality
  2. Choose confidence thresholds appropriate for your use case
  3. Implement result filtering to focus on relevant object classes
  4. Use efficient batch sizes to maximize GPU utilization
  5. Implement proper error handling for robust processing
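For item 1, a common way to standardize input size without distorting aspect ratio is letterboxing: scale the image to fit a square canvas and pad the rest. A minimal sketch in pure NumPy (nearest-neighbour resize, so it runs without OpenCV); `letterbox` is an illustrative name, and 114 is a conventional grey pad value, not a requirement:

```python
import numpy as np


def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Resize to a size x size canvas, preserving aspect ratio via padding."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))

    # Nearest-neighbour resize via index arrays (no cv2 dependency)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]

    # Centre the resized image on a uniformly padded canvas
    canvas = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```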

Next Steps