Parallel Hyperparameter Tuning

The Parallel Hyperparameter Tuning workflow lets you systematically explore combinations of key training parameters to identify the optimal configuration for your machine learning models. It uses Machine GPU runners to run multiple training jobs concurrently, significantly reducing the time needed to find the best-performing configuration.

Prerequisites

You will need to have completed the Quick Start guide.

Use Case Overview

Why might you want to use parallel hyperparameter tuning?

  • Find optimal model configurations more efficiently by testing multiple parameter sets simultaneously
  • Reduce the total time needed for hyperparameter search
  • Systematically compare model performance across different configurations
  • Automate the process of identifying the best-performing models

How It Works

The Parallel Hyperparameter Tuning workflow uses GitHub Actions’ matrix strategy to run multiple training jobs concurrently. Each job trains a ResNet model on the CIFAR-10 dataset with a different combination of hyperparameters, and the workflow can be triggered on demand.

The tuning process:

  1. Defines a matrix of hyperparameter combinations to explore (the sketch after this list shows the grid those values expand to)
  2. Launches multiple GPU-powered training jobs concurrently, one for each combination
  3. Saves performance metrics from each training run as artifacts
  4. Aggregates and compares results across all runs
  5. Generates a comprehensive comparison report
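GitHub Actions expands the matrix into every combination of the listed values. Conceptually, this is the same grid you would get from itertools.product; the snippet below (purely illustrative, not part of the repository) prints the four jobs the example matrix produces:

from itertools import product

# The workflow's matrix values; GitHub Actions turns every pairing into its own job.
learning_rates = [0.001, 0.0005]
batch_sizes = [32, 64]

for lr, bs in product(learning_rates, batch_sizes):
    print(f"job: learning_rate={lr}, batch_size={bs}")
# Four combinations, hence four concurrent training jobs.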

Workflow Implementation

The Parallel Hyperparameter Tuning workflow is implemented in GitHub Actions and runs multiple jobs in parallel. Here’s the workflow definition:

name: ResNet Hyperparameter Tuning

on:
  workflow_dispatch:

jobs:
  hyperparameter_tuning:
    name: Hyperparameter Tuning
    runs-on:
      - machine
      - gpu=T4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        learning_rate: [0.001, 0.0005]
        batch_size: [32, 64]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install uv
        uses: astral-sh/setup-uv@v5
      - name: Install dependencies
        run: |
          uv venv .venv --python=3.10
          source .venv/bin/activate
          uv pip install -r requirements.txt
          deactivate
      - name: Train and Evaluate ResNet
        env:
          LEARNING_RATE: ${{ matrix.learning_rate }}
          BATCH_SIZE: ${{ matrix.batch_size }}
        run: |
          source .venv/bin/activate
          python train.py
          deactivate
      - name: Upload metrics artifact
        uses: actions/upload-artifact@v4
        with:
          name: metrics-${{ matrix.learning_rate }}-${{ matrix.batch_size }}
          path: metrics_*.json

  compare_tuning:
    needs: hyperparameter_tuning
    name: Compare Tuning Performance
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install uv
        uses: astral-sh/setup-uv@v5
      - name: Install dependencies
        run: |
          uv venv .venv --python=3.10
          source .venv/bin/activate
          uv pip install -r requirements.txt
          deactivate
      - name: Download all metrics
        uses: actions/download-artifact@v4
        with:
          path: metrics
      - name: Compare Metrics
        run: |
          source .venv/bin/activate
          python compare_metrics.py
          deactivate
      - name: Upload comparison results
        uses: actions/upload-artifact@v4
        with:
          name: comparison-results
          path: model_comparison.csv
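The “Train and Evaluate ResNet” step relies on train.py reading its hyperparameters from the LEARNING_RATE and BATCH_SIZE environment variables and writing a metrics_*.json file for the upload step to pick up. The actual script lives in the template repository; the following is only a minimal sketch of that contract, with placeholder metric names:

import json
import os

# Hyperparameters arrive from the matrix via the step's env block.
learning_rate = float(os.environ.get("LEARNING_RATE", "0.001"))
batch_size = int(os.environ.get("BATCH_SIZE", "32"))

# ... build the ResNet, train it on CIFAR-10, and evaluate it here ...
val_accuracy = 0.0  # placeholder: replace with the real evaluation result

# Write results to a file matching the workflow's metrics_*.json upload pattern.
metrics = {
    "learning_rate": learning_rate,
    "batch_size": batch_size,
    "val_accuracy": val_accuracy,
}
with open(f"metrics_lr{learning_rate}_bs{batch_size}.json", "w") as f:
    json.dump(metrics, f, indent=2)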

Key Features

The power of this implementation comes from several key features:

  1. Matrix Strategy: The workflow defines a matrix of hyperparameters, automatically creating separate jobs for each combination. In this example, we’re exploring two learning rates (0.001, 0.0005) and two batch sizes (32, 64), resulting in 4 concurrent training jobs.

  2. Parallel Execution: Each hyperparameter combination runs as a separate job on its own GPU runner, allowing multiple experiments to run simultaneously rather than sequentially.

  3. Metrics Collection: Each training job produces performance metrics that are saved as artifacts with names that indicate the hyperparameter values used.

  4. Automated Comparison: After all training jobs complete, a separate job downloads all metrics and generates a comparison report, making it easy to identify the best configuration (a sketch of such a comparison script follows this list).
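In outline, the comparison step is straightforward: download-artifact places every metrics-* artifact under metrics/, and compare_metrics.py folds the JSON files into a single model_comparison.csv. The real script is in the template repository; this is a minimal sketch that assumes each file is a flat JSON object containing a val_accuracy field:

import csv
import json
from pathlib import Path

# Each artifact is downloaded into its own subdirectory under metrics/,
# so search recursively for the metrics files produced by the training jobs.
rows = [json.loads(p.read_text()) for p in Path("metrics").rglob("*.json")]

# Rank runs by the assumed accuracy field, best first.
rows.sort(key=lambda r: r.get("val_accuracy", 0.0), reverse=True)

# Flatten everything into one CSV for a side-by-side comparison.
fieldnames = sorted({key for row in rows for key in row})
with open("model_comparison.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote model_comparison.csv with {len(rows)} runs")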

Using Machine GPU Runners

This hyperparameter tuning process leverages Machine GPU runners to provide the necessary computing power for efficient model training. The workflow is configured to use:

  • T4 GPU: An entry-level ML GPU with 16GB VRAM, well-suited for training moderate-sized models
  • Configurable resources: CPU, RAM, and architecture specifications optimized for each training job

The parallel nature of this approach means that you can complete a hyperparameter search in a fraction of the time it would take to run sequentially, even when using the same hardware resources per job.

Best Practices

  • Choose parameters wisely: Select hyperparameters that have the most impact on model performance
  • Start with a broad search: Begin with a wide range of values, then refine with narrower ranges around promising values
  • Consider resource allocation: Adjust CPU/RAM requirements based on your specific model and dataset needs
  • Set appropriate timeouts: Ensure your workflow timeout is sufficient for all jobs to complete
  • Use fail-fast: false so that every combination is evaluated even if some jobs fail, giving you a complete picture

Getting Started

To run the Parallel Hyperparameter Tuning workflow:

  1. Use the MachineHQ/parallel-hyperparameter-tuning repository as a template
  2. Navigate to the Actions tab in your repository
  3. Select the “ResNet Hyperparameter Tuning” workflow
  4. Click “Run workflow” to start the tuning process
  5. Wait for all jobs to complete
  6. Download the comparison-results artifact to identify the best hyperparameter configuration

Customizing the Workflow

You can easily adapt this workflow for your own models and hyperparameters:

  1. Modify the matrix in the workflow file to include your specific hyperparameters (see the sketch after this list)
  2. Update the training script (train.py) to work with your model and dataset
  3. Adjust the metrics collection to capture the performance indicators most relevant to your task
  4. Customize the comparison script (compare_metrics.py) to generate insights tailored to your needs
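As a hypothetical illustration of steps 1 and 2 together, suppose you add weight_decay: [0.0, 0.0001] to the workflow matrix and expose it to the training step as a WEIGHT_DECAY environment variable (both of these are assumptions, not part of the template). The training script then only needs to read and record the new value:

import os

# Hypothetical new hyperparameter, assumed to be added to the workflow matrix
# and exported in the training step's env block as WEIGHT_DECAY.
weight_decay = float(os.environ.get("WEIGHT_DECAY", "0.0"))

# ... pass weight_decay to your optimizer during training ...

# Record it alongside the existing values so it appears as a column in
# model_comparison.csv and the configurations stay distinguishable.
metrics_update = {"weight_decay": weight_decay}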

Next Steps