GitHub Actions Syntax
This reference guide provides comprehensive information on the GitHub Actions syntax for integrating Machine.dev GPU runners into your workflows.
Basic Workflow Structure
A GitHub Actions workflow is defined in YAML format and must include the following elements to use Machine.dev:
```yaml
name: My GPU Workflow

on:
  # Trigger events (push, pull_request, workflow_dispatch, etc.)
  workflow_dispatch:

jobs:
  gpu-job:
    name: GPU-Accelerated Job
    runs-on:
      - machine        # Required to use Machine.dev
      - gpu=a10g       # Specify GPU type
      - tenancy=spot   # Specify tenancy type (spot or on_demand)
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      # Add more steps...
```
Machine.dev Runner Labels
Required Label
```yaml
runs-on:
  - machine  # Required for all Machine.dev runners
```
Runner Type Labels
CPU Runners
For high-performance CPU runners (must specify the number of vCPUs):
```yaml
runs-on:
  - machine
  - cpu=16  # Required: specify number of vCPUs (2, 4, 8, 16, 32, 48, or 64)
```
CPU Runner Specifications:
- Available configurations: 2, 4, 8, 16, 32, 48, or 64 vCPUs
- RAM scales with vCPUs (e.g., 16 vCPUs = 32GB RAM, 64 vCPUs = 128GB RAM)
- X64 or ARM64 architecture options
- Ideal for builds, testing, and data processing
GPU Type Labels
Specify the type of GPU you want to use:
```yaml
runs-on:
  - machine
  - gpu=<gpu-type>
```
Available GPU types:
| Label | GPU Model | VRAM | CUDA Cores | Architecture | Use Cases |
|---|---|---|---|---|---|
| gpu=t4g | NVIDIA T4 (Graviton) | 16 GB | 2,560 | ARM64 | Entry-level ML, inference |
| gpu=t4 | NVIDIA T4 | 16 GB | 2,560 | X64 | Training, inference, computer vision |
| gpu=l4 | NVIDIA L4 | 24 GB | 7,680 | X64 | ML/DL training, inference, vision AI |
| gpu=a10g | NVIDIA A10G | 24 GB | 9,216 | X64 | Model training, rendering, simulation |
| gpu=l40s | NVIDIA L40S | 48 GB | 18,176 | X64 | Generative AI, large models, computer vision |
AWS AI Accelerators:
| Label | Accelerator | Memory | Architecture | Use Cases |
|---|---|---|---|---|
| gpu=trainium | AWS Trainium | 32 GB | X64 | High-performance training |
| gpu=inferentia2 | AWS Inferentia2 | 32 GB | X64 | Optimized inference |
Note: When using gpu=t4g, architecture is automatically set to arm64 regardless of any architecture label.
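For instance, the following fragment provisions an arm64 runner even though no architecture label is present:

```yaml
runs-on:
  - machine
  - gpu=t4g  # Graviton-based T4; runner architecture becomes arm64 automatically
```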
Tenancy Type Labels
Specify whether you want on-demand or spot instances:
```yaml
runs-on:
  - machine
  - gpu=a10g
  - tenancy=<tenancy-type>
```
Available tenancy types:
| Label | Description | Cost | Stability |
|---|---|---|---|
| tenancy=on_demand | On-demand instances with guaranteed availability | Higher cost | Highest stability |
| tenancy=spot | Spot instances that may be preempted | Up to 85% lower cost | May be interrupted |
Region Labels
Optionally specify one or more AWS regions where your runner should be provisioned. Use a comma-separated list of full region codes:
```yaml
runs-on:
  - machine
  - gpu=a10g
  - regions=us-east-1,us-west-2
```
Available regions:
| Region Code | Location |
|---|---|
| us-east-1 | US East (N. Virginia) |
| us-east-2 | US East (Ohio) |
| us-west-2 | US West (Oregon) |
| ca-central-1 | Canada (Central) |
| eu-south-2 | Europe (Spain) |
| ap-southeast-2 | Asia Pacific (Sydney) |
If no region is specified, Machine searches globally across all enabled regions to find the most cost-effective option.
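Omitting the label entirely is therefore a valid choice when cost matters more than locality; a minimal fragment:

```yaml
runs-on:
  - machine
  - gpu=a10g
  # No regions label: Machine searches all enabled regions for the cheapest option
```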
Storage Labels
Configure the EBS gp3 root volume for your runner:
```yaml
runs-on:
  - machine
  - gpu=a10g
  - disk_size=500         # Volume size in GB (default: 100, max: 16384)
  - disk_iops=10000       # IOPS (default: 6000, range: 6000-16000)
  - disk_throughput=750   # Throughput in MB/s (default: 250, range: 250-1000)
```
| Label | Description | Default | Range |
|---|---|---|---|
| disk_size=&lt;GB&gt; | Root volume size | 100 GB | 1-16,384 GB |
| disk_iops=&lt;IOPS&gt; | Provisioned IOPS | 6,000 | 6,000-16,000 |
| disk_throughput=&lt;MB/s&gt; | Provisioned throughput | 250 MB/s | 250-1,000 MB/s |
Note: Increasing IOPS and throughput above the defaults incurs additional EBS charges. See Pricing for details.
Metrics Labels
Control CloudWatch metrics collection for your runner:
```yaml
runs-on:
  - machine
  - gpu=a10g
  - metrics=true         # Enable/disable metrics (default: true)
  - metrics_interval=10  # Collection interval in seconds (default: 60, range: 1-60)
```
| Label | Description | Default | Valid Values |
|---|---|---|---|
| metrics=&lt;bool&gt; | Enable metrics collection | true | true, false |
| metrics_interval=&lt;seconds&gt; | Collection interval | 60 | 1-60 |
When enabled, metrics are collected for CPU, memory, disk, network, and GPU utilization. Results appear as sparkline charts on the Machine dashboard after job completion.
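If you don't need the dashboard charts, collection can be switched off with the same label; a minimal fragment:

```yaml
runs-on:
  - machine
  - gpu=a10g
  - metrics=false  # No metrics collected; nothing appears on the dashboard
```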
Complete Examples
CPU Runner Example
```yaml
name: CPU Build Example

on:
  push:
    branches: [main]

jobs:
  build:
    name: Build Application
    runs-on:
      - machine
      - cpu=16  # Required: specify number of vCPUs
      - tenancy=spot
      - regions=us-east-1,us-west-2
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Build project
        run: make -j$(nproc) all
```
GPU Runner Example
```yaml
name: GPU Training Example

on:
  workflow_dispatch:
    inputs:
      model_type:
        description: 'Type of model to train'
        required: true
        default: 'small'
        type: choice
        options:
          - small
          - medium
          - large

jobs:
  train-model:
    name: Train Machine Learning Model
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
      - regions=us-east-1
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Train model
        run: |
          python train.py --model-size=${{ github.event.inputs.model_type }}

      - name: Upload model artifacts
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: model/
```
Matrix Strategy for Multiple GPUs
You can use a matrix strategy to run your workflow on multiple GPU types:
```yaml
jobs:
  gpu-matrix:
    strategy:
      matrix:
        gpu: [t4, l4, a10g]
      fail-fast: false
    runs-on:
      - machine
      - gpu=${{ matrix.gpu }}
    steps:
      - name: Run benchmark on ${{ matrix.gpu }}
        run: |
          python benchmark.py --gpu-type=${{ matrix.gpu }}
```
Job Dependencies
You can coordinate between CPU and GPU jobs:
```yaml
jobs:
  build:
    runs-on:
      - machine
      - cpu=8  # Build with 8 vCPUs
    steps:
      - name: Build application
        id: build
        run: make build
    outputs:
      artifact_path: ${{ steps.build.outputs.path }}

  train-model:
    needs: build
    runs-on:
      - machine
      - gpu=a10g  # Train with GPU
    steps:
      - name: Train on prepared data
        run: |
          python train.py --artifact=${{ needs.build.outputs.artifact_path }}
```
Environment Variables
Machine.dev provides several environment variables that can be accessed in your workflow:
| Variable | Description | Example Value |
|---|---|---|
| MACHINE_GPU_TYPE | Type of GPU | a10g |
| MACHINE_GPU_COUNT | Number of GPUs | 1 |
| MACHINE_TENANCY | Tenancy type | spot |
| MACHINE_REGION | Region code | us-east-1 |
| CUDA_VISIBLE_DEVICES | CUDA devices available | 0 |
Example usage:
```yaml
steps:
  - name: Check environment
    run: |
      echo "GPU Type: $MACHINE_GPU_TYPE"
      echo "GPU Count: $MACHINE_GPU_COUNT"
      echo "Tenancy: $MACHINE_TENANCY"
      echo "Region: $MACHINE_REGION"
```
Error Handling for Spot Instances
When using spot instances, you should implement error handling to manage potential interruptions:
```yaml
jobs:
  spot-job:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      # Save checkpoints frequently
      - name: Train with checkpoints
        run: |
          python train.py --checkpoint-freq=5

      # Upload partial results
      - name: Upload checkpoints
        if: always()  # Run even if job is cancelled
        uses: actions/upload-artifact@v3
        with:
          name: model-checkpoints
          path: checkpoints/
```
Best Practices
Set Appropriate Timeouts
- Set appropriate timeouts for your jobs:
```yaml
jobs:
  gpu-job:
    timeout-minutes: 60  # Limit to 1 hour
```
Use Conditional Execution
- Use conditional execution to skip GPU tasks when appropriate:
```yaml
steps:
  - name: GPU-intensive task
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    run: python heavy_task.py
```
Optimize Workflow Triggers
- Optimize workflow triggers to avoid unnecessary runs:
```yaml
on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
    branches:
      - main
```
Use Workflow Concurrency
- Use workflow concurrency to avoid redundant runs:
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Runner not available | Try a different GPU type or region |
| Out of memory error | Use a GPU with more VRAM or optimize batch size |
| Slow performance | Check data loading bottlenecks or use a faster GPU |
| Spot instance terminated | Use dedicated instances for critical workloads |
| Long provisioning time | Pre-warm runners with scheduled workflows |
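The last remedy above can be sketched as a scheduled workflow. The workflow name, cron expression, and GPU choice here are placeholders, and whether a later job actually reuses the warmed capacity depends on Machine.dev's runner pooling behavior:

```yaml
name: Pre-warm GPU Runner

on:
  schedule:
    - cron: '45 8 * * 1-5'  # Hypothetical: shortly before working hours, weekdays (UTC)

jobs:
  warm:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      - name: No-op to provision the runner
        run: nvidia-smi  # Any trivial command; the job exists only to spin up capacity
```

Treat this as a starting point and verify the pre-warming behavior against your own provisioning times.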