# Batch Object Detection
This guide demonstrates how to set up and run efficient batch object detection workloads using Machine.dev GPU runners. Learn how to process large volumes of images or video frames with state-of-the-art object detection models.
## Use Case Overview
Batch object detection allows you to:
- Process large datasets of images in parallel
- Extract and analyze objects, people, vehicles, or other entities
- Generate metadata and annotations for computer vision datasets
- Create analytics from visual content
## Prerequisites
- GitHub repository with your object detection code
- Machine.dev account connected to your GitHub repository
- Input images or video frames to process
- Pre-trained object detection model or custom trained model
## Workflow Example
This GitHub Actions workflow processes a batch of images using a pre-trained object detection model:
```yaml
name: Batch Object Detection

on:
  workflow_dispatch:
    inputs:
      model:
        description: 'Object detection model to use'
        required: true
        default: 'yolov8x'
        type: choice
        options:
          - yolov8n
          - yolov8s
          - yolov8m
          - yolov8l
          - yolov8x
          - faster_rcnn
          - retinanet
      confidence:
        description: 'Detection confidence threshold'
        required: true
        default: '0.25'
        type: string

jobs:
  object-detection:
    name: Run Object Detection
    runs-on:
      - machine
      - gpu=l4        # Good for CV tasks
      - tenancy=spot  # Use spot instances for batch jobs
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -U pip
          pip install ultralytics opencv-python-headless torch torchvision
          pip install -r detector/requirements.txt

      - name: Download dataset
        run: |
          python detector/download_dataset.py \
            --dataset="sample_images" \
            --output-dir="data"

      - name: Run object detection
        run: |
          python detector/batch_detect.py \
            --model="${{ github.event.inputs.model }}" \
            --input-dir="data/images" \
            --output-dir="results" \
            --confidence=${{ github.event.inputs.confidence }} \
            --batch-size=32

      - name: Generate report
        run: |
          python detector/generate_report.py \
            --results-dir="results" \
            --output-file="detection_report.html"

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: detection-results
          path: |
            results/
            detection_report.html
```

## Example Implementation Details
### Batch Detection Script
Here's a sample implementation of the batch detection script (`detector/batch_detect.py`):
```python
import os
import argparse
import json
from pathlib import Path

import cv2
from ultralytics import YOLO


def load_model(model_name):
    """Load the detection model."""
    if model_name.startswith('yolov8'):
        model = YOLO(f"{model_name}.pt")
    else:
        # For other model architectures.
        # This is just a placeholder - you would implement
        # loading of other model types here.
        raise NotImplementedError(f"Model {model_name} is not supported yet")

    return model


def process_images(args):
    """Process all images in the input directory with object detection."""
    # Create output directory
    os.makedirs(args.output_dir, exist_ok=True)

    # Load model
    model = load_model(args.model)

    # Get all image files
    image_files = [f for f in Path(args.input_dir).glob('*')
                   if f.suffix.lower() in ['.jpg', '.jpeg', '.png', '.bmp']]

    if not image_files:
        print(f"No images found in {args.input_dir}")
        return

    print(f"Found {len(image_files)} images to process")

    results_data = []

    # Process in batches
    for i in range(0, len(image_files), args.batch_size):
        batch_files = image_files[i:i + args.batch_size]

        # Load batch of images
        batch_images = [cv2.imread(str(f)) for f in batch_files]

        # Run detection on batch
        results = model(batch_images, conf=float(args.confidence))

        # Process results for each image
        for j, result in enumerate(results):
            image_file = batch_files[j]
            image_name = image_file.name

            # Get detections
            boxes = result.boxes.cpu().numpy()

            # Save image with detections drawn on it
            annotated_img = result.plot()
            output_img_path = os.path.join(args.output_dir, f"detected_{image_name}")
            cv2.imwrite(output_img_path, annotated_img)

            # Extract detection data
            detections = []
            for box in boxes:
                x1, y1, x2, y2 = box.xyxy[0].astype(int)
                conf = float(box.conf[0])
                cls = int(box.cls[0])
                cls_name = result.names[cls]

                detections.append({
                    "bbox": [int(x1), int(y1), int(x2), int(y2)],
                    "confidence": conf,
                    "class": cls_name,
                })

            # Store results
            results_data.append({
                "image": image_name,
                "detections": detections,
                "detection_count": len(detections),
                "output_image": f"detected_{image_name}",
            })

            print(f"Processed {image_name}: found {len(detections)} objects")

    # Save detection results as JSON
    with open(os.path.join(args.output_dir, "detections.json"), "w") as f:
        json.dump(results_data, f, indent=2)

    print(f"Completed processing {len(image_files)} images with {args.model}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Batch object detection")
    parser.add_argument("--model", type=str, required=True, help="Detection model to use")
    parser.add_argument("--input-dir", type=str, required=True, help="Directory with input images")
    parser.add_argument("--output-dir", type=str, default="results", help="Output directory")
    parser.add_argument("--confidence", type=float, default=0.25, help="Detection confidence threshold")
    parser.add_argument("--batch-size", type=int, default=16, help="Batch size for processing")
    args = parser.parse_args()

    process_images(args)
```

### Report Generation Script
Here's a sample implementation of the report generation script (`detector/generate_report.py`):
```python
import os
import json
import argparse
from collections import Counter


def generate_html_report(results_data, output_file):
    """Generate an HTML report from detection results."""
    # Extract summary statistics
    total_images = len(results_data)
    total_detections = sum(r["detection_count"] for r in results_data)

    # Count objects by class
    all_classes = []
    for result in results_data:
        for detection in result["detections"]:
            all_classes.append(detection["class"])

    class_counts = Counter(all_classes)
    top_classes = class_counts.most_common(10)

    # Generate HTML
    html = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>Object Detection Report</title>
        <style>
            body {{ font-family: Arial, sans-serif; margin: 40px; }}
            h1, h2 {{ color: #333; }}
            .summary {{ background-color: #f5f5f5; padding: 15px; margin: 20px 0; border-radius: 5px; }}
            .detection-grid {{ display: grid; grid-template-columns: repeat(auto-fill, minmax(300px, 1fr)); gap: 20px; }}
            .detection-item {{ border: 1px solid #ddd; border-radius: 5px; padding: 10px; }}
            .detection-image {{ width: 100%; height: auto; border-radius: 3px; }}
            table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
            th, td {{ padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }}
            th {{ background-color: #f2f2f2; }}
        </style>
    </head>
    <body>
        <h1>Object Detection Report</h1>

        <div class="summary">
            <h2>Summary</h2>
            <p>Total images processed: {total_images}</p>
            <p>Total objects detected: {total_detections}</p>
            <p>Average objects per image: {total_detections / total_images:.2f}</p>
        </div>

        <h2>Top Detected Classes</h2>
        <table>
            <tr>
                <th>Class</th>
                <th>Count</th>
                <th>Percentage</th>
            </tr>
    """

    for cls, count in top_classes:
        percentage = count / total_detections * 100
        html += f"""
            <tr>
                <td>{cls}</td>
                <td>{count}</td>
                <td>{percentage:.1f}%</td>
            </tr>
        """

    html += """
        </table>

        <h2>Sample Detections</h2>
        <div class="detection-grid">
    """

    # Add a sample of detection images (first 20)
    for result in results_data[:20]:
        image_name = result["image"]
        output_image = result["output_image"]
        detection_count = result["detection_count"]

        html += f"""
            <div class="detection-item">
                <img class="detection-image" src="results/{output_image}" alt="{image_name}">
                <p>{image_name}: {detection_count} objects detected</p>
            </div>
        """

    html += """
        </div>
    </body>
    </html>
    """

    with open(output_file, "w") as f:
        f.write(html)

    print(f"Report generated and saved to {output_file}")


def generate_report(args):
    """Generate detection report."""
    # Load results
    with open(os.path.join(args.results_dir, "detections.json"), "r") as f:
        results_data = json.load(f)

    # Generate HTML report
    generate_html_report(results_data, args.output_file)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate object detection report")
    parser.add_argument("--results-dir", type=str, required=True, help="Directory with detection results")
    parser.add_argument("--output-file", type=str, default="detection_report.html", help="Output HTML file")
    args = parser.parse_args()

    generate_report(args)
```

## Hardware Recommendations
For batch object detection workloads with Machine.dev:
| Workload Size | Recommended GPU | Batch Size | Processing Speed |
|---|---|---|---|
| Small batches (< 1,000 images) | T4 (16GB) | 16-32 | ~5-10 images/second |
| Medium batches (1,000-10,000 images) | L4 (24GB) | 32-64 | ~10-20 images/second |
| Large batches (> 10,000 images) | A10G (24GB) | 32-64 | ~15-25 images/second |
| Video processing | L40S (48GB) | Multiple streams | ~30-60 frames/second |
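As a rough planning aid, the throughput figures above can be turned into a wall-clock estimate for a batch. This is a minimal sketch; `estimate_runtime_hours` is an illustrative helper, and the throughput numbers are the approximate ranges from the table, not measurements:

```python
def estimate_runtime_hours(num_images: int, images_per_second: float) -> float:
    """Estimate wall-clock hours to process a batch at a given throughput."""
    return num_images / images_per_second / 3600

# Example: 10,000 images on an L4 at roughly 15 images/second
hours = estimate_runtime_hours(10_000, 15)
print(f"Estimated runtime: {hours:.2f} hours")
```

Estimates like this are also useful for sizing spot-instance budgets before launching a large job.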
## Scaling Strategies
### Parallel Processing
For large datasets, you can split processing across multiple jobs:
```yaml
jobs:
  split-dataset:
    runs-on: ubuntu-latest
    steps:
      - name: Split dataset
        id: split
        run: |
          python split_dataset.py --chunks=5
    outputs:
      chunks: ${{ steps.split.outputs.chunks }}

  process-chunks:
    needs: split-dataset
    strategy:
      matrix:
        chunk: ${{ fromJson(needs.split-dataset.outputs.chunks) }}
    runs-on:
      - machine
      - gpu=l4
    steps:
      - name: Process chunk
        run: |
          python detector/batch_detect.py --chunk=${{ matrix.chunk }}
```

### Video Processing
For video files, you can extract frames and process them in parallel:
```yaml
steps:
  - name: Extract video frames
    run: |
      python detector/extract_frames.py \
        --video="input.mp4" \
        --output-dir="frames" \
        --fps=1  # Extract one frame per second

  - name: Process frames
    run: |
      python detector/batch_detect.py \
        --input-dir="frames" \
        --output-dir="detected_frames"

  - name: Create output video
    run: |
      python detector/create_video.py \
        --input-dir="detected_frames" \
        --output-video="output.mp4" \
        --fps=1
```

## Cost Optimization Strategies
To optimize costs for batch object detection:
- Use spot instances for non-time-critical workloads
- Choose the right model size for your accuracy requirements
- Optimize batch size to maximize GPU utilization
- Use incremental processing to handle large datasets in stages
- Implement region selection to use the most cost-effective regions
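The incremental-processing idea can be sketched as a simple chunking helper: split the dataset into fixed-size stages and checkpoint after each, so an interrupted spot instance loses at most one stage of work. `chunk_paths` and the stage size are illustrative, not part of the workflow above:

```python
from pathlib import Path


def chunk_paths(paths, chunk_size):
    """Yield successive fixed-size chunks of a list of image paths."""
    for i in range(0, len(paths), chunk_size):
        yield paths[i:i + chunk_size]


# Example: process a 2,500-image dataset in stages of 1,000,
# saving results after each stage completes.
paths = [Path(f"data/images/img_{i}.jpg") for i in range(2500)]
stages = list(chunk_paths(paths, 1000))
print(len(stages))  # 3 stages: 1000 + 1000 + 500
```

Each stage can then be submitted as its own workflow run, or tracked in a manifest file so completed stages are skipped on restart.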
## Advanced Techniques
### Model Quantization
Reduce memory usage and improve inference speed:
```python
# Convert to FP16 for faster inference (PyTorch models)
model.half()

# Or use ONNX Runtime dynamic quantization
from onnxruntime.quantization import quantize_dynamic

quantize_dynamic("model.onnx", "model_quantized.onnx")
```

### Custom Post-Processing
Extract specific insights from detection results:
```python
def analyze_detections(detections, classes_of_interest=("person", "car")):
    """Analyze detection results for specific insights."""
    insights = {cls: 0 for cls in classes_of_interest}

    for detection in detections:
        cls = detection["class"]
        if cls in classes_of_interest:
            insights[cls] += 1

    return insights
```

## Best Practices
- Pre-process images to standardize size and quality
- Choose confidence thresholds appropriate for your use case
- Implement result filtering to focus on relevant object classes
- Use efficient batch sizes to maximize GPU utilization
- Implement proper error handling for robust processing
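The first bullet (standardizing image size) is commonly done with letterbox resizing, which scales the image to fit a square canvas without distorting its aspect ratio. The sketch below uses pure NumPy index mapping for the resize so it is self-contained; in practice you would use `cv2.resize`. The `letterbox` helper name is hypothetical, and 640x640 matches the default YOLOv8 input size:

```python
import numpy as np


def letterbox(img: np.ndarray, size: int = 640) -> np.ndarray:
    """Scale an image to fit a square canvas, padding the rest with black."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))

    # Nearest-neighbour resize via index mapping (stand-in for cv2.resize)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]

    # Centre the resized image on a black canvas
    canvas = np.zeros((size, size, img.shape[2]), dtype=img.dtype)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
out = letterbox(img)
print(out.shape)  # (640, 640, 3)
```

Standardizing inputs this way keeps batch tensors uniform, which is what makes the batched inference in `batch_detect.py` efficient.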
## Next Steps
- Learn about hyperparameter tuning to optimize your object detection models
- Explore fine-tuning techniques for custom object classes
- Check out model inference best practices for production deployments