
GitHub Actions Syntax

This reference guide provides comprehensive information on the GitHub Actions syntax for integrating Machine.dev GPU runners into your workflows.

Basic Workflow Structure

A GitHub Actions workflow is defined in YAML format and must include the following elements to use Machine.dev:

name: My GPU Workflow
on:
  # Trigger events (push, pull_request, workflow_dispatch, etc.)
  workflow_dispatch:
jobs:
  gpu-job:
    name: GPU-Accelerated Job
    runs-on:
      - machine        # Required to use Machine.dev
      - gpu=a10g       # Specify GPU type
      - tenancy=spot   # Specify tenancy type (spot or on_demand)
    steps:
      # Job steps
      - name: Checkout code
        uses: actions/checkout@v3
      # Add more steps...

Machine.dev Runner Labels

Required Label

runs-on:
  - machine   # Required for all Machine.dev runners

Runner Type Labels

CPU Runners

For high-performance CPU runners, you must specify the number of vCPUs:

runs-on:
  - machine
  - cpu=16   # Required: specify number of vCPUs (2, 4, 8, 16, 32, 48, or 64)

CPU Runner Specifications:

  • Available configurations: 2, 4, 8, 16, 32, 48, or 64 vCPUs
  • RAM scales with vCPUs (e.g., 16 vCPUs = 32GB RAM, 64 vCPUs = 128GB RAM)
  • X64 or ARM64 architecture options (see the sketch after this list)
  • Ideal for builds, testing, and data processing
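
This guide does not show the architecture label itself; a minimal sketch, assuming it is spelled arch=arm64 (a hypothetical spelling), would be:

runs-on:
  - machine
  - cpu=8
  - arch=arm64   # Hypothetical label spelling; confirm the exact name in the Machine.dev docs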

GPU Type Labels

Specify the type of GPU you want to use:

runs-on:
  - machine
  - gpu=<gpu-type>

Available GPU types:

Label     GPU Model             VRAM   CUDA Cores  Architecture  Use Cases
gpu=t4g   NVIDIA T4 (Graviton)  16 GB  2,560       ARM64         Entry-level ML, inference
gpu=t4    NVIDIA T4             16 GB  2,560       X64           Training, inference, computer vision
gpu=l4    NVIDIA L4             24 GB  7,680       X64           ML/DL training, inference, vision AI
gpu=a10g  NVIDIA A10G           24 GB  9,216       X64           Model training, rendering, simulation
gpu=l40s  NVIDIA L40S           48 GB  18,176      X64           Generative AI, large models, computer vision

AWS AI Accelerators:

Label            Accelerator      Memory  Architecture  Use Cases
gpu=trainium     AWS Trainium     32 GB   X64           High-performance training
gpu=inferentia2  AWS Inferentia2  32 GB   X64           Optimized inference

Note: When using gpu=t4g, architecture is automatically set to arm64 regardless of any architecture label.
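
Because gpu=t4g forces ARM64, make sure any binaries or container images used by the job are ARM64-compatible. A minimal sketch that provisions the Graviton-hosted T4 and verifies the architecture:

jobs:
  arm-gpu-job:
    runs-on:
      - machine
      - gpu=t4g   # Architecture is set to arm64 automatically
    steps:
      - name: Confirm architecture
        run: uname -m   # Prints aarch64 on an ARM64 runner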

Tenancy Type Labels

Specify whether you want on-demand or spot instances:

runs-on:
  - machine
  - gpu=a10g
  - tenancy=<tenancy-type>

Available tenancy types:

Label              Description                                       Cost                  Stability
tenancy=on_demand  On-demand instances with guaranteed availability  Higher cost           Highest stability
tenancy=spot       Spot instances that may be preempted              Up to 85% lower cost  May be interrupted
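
One way to balance the two, sketched below with hypothetical job names, is to try spot first and fall back to on-demand only if the spot job fails (for example, after a preemption). Note that the workflow run is still marked failed when the spot job fails, even if the fallback succeeds:

jobs:
  train-spot:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      - run: python train.py
  train-fallback:
    needs: train-spot
    if: failure()   # Runs only when the spot attempt did not succeed
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=on_demand
    steps:
      - run: python train.py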

Region Labels

Optionally specify one or more AWS regions where your runner should be provisioned. Use a comma-separated list of full region codes:

runs-on:
  - machine
  - gpu=a10g
  - regions=us-east-1,us-west-2

Available regions:

Region Code     Location
us-east-1       US East (N. Virginia)
us-east-2       US East (Ohio)
us-west-2       US West (Oregon)
ca-central-1    Canada (Central)
eu-south-2      Europe (Spain)
ap-southeast-2  Asia Pacific (Sydney)

If no region is specified, Machine searches globally across all enabled regions to find the most cost-effective option.
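
For example, a workload with data-residency requirements can be pinned to a single region, while omitting the label entirely opts into the global search:

runs-on:
  - machine
  - gpu=l4
  - regions=eu-south-2   # Pin to Europe (Spain) only; omit this label to search all regions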

Storage Labels

Configure the EBS gp3 root volume for your runner:

runs-on:
  - machine
  - gpu=a10g
  - disk_size=500         # Volume size in GB (default: 100, max: 16384)
  - disk_iops=10000       # IOPS (default: 6000, range: 6000-16000)
  - disk_throughput=750   # Throughput in MB/s (default: 250, range: 250-1000)

Label                   Description             Default   Range
disk_size=<GB>          Root volume size        100 GB    1-16,384 GB
disk_iops=<IOPS>        Provisioned IOPS        6,000     6,000-16,000
disk_throughput=<MB/s>  Provisioned throughput  250 MB/s  250-1,000 MB/s

Note: Increasing IOPS and throughput above the defaults incurs additional EBS charges. See Pricing for details.

Metrics Labels

Control CloudWatch metrics collection for your runner:

runs-on:
  - machine
  - gpu=a10g
  - metrics=true          # Enable/disable metrics (default: true)
  - metrics_interval=10   # Collection interval in seconds (default: 60, range: 1-60)

Label                       Description                Default  Valid Values
metrics=<bool>              Enable metrics collection  true     true, false
metrics_interval=<seconds>  Collection interval        60       1-60

When enabled, metrics are collected for CPU, memory, disk, network, and GPU utilization. Results appear as sparkline charts on the Machine dashboard after job completion.

Complete Examples

CPU Runner Example

name: CPU Build Example
on:
  push:
    branches: [main]
jobs:
  build:
    name: Build Application
    runs-on:
      - machine
      - cpu=16   # Required: specify number of vCPUs
      - tenancy=spot
      - regions=us-east-1,us-west-2
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Build project
        run: make -j$(nproc) all

GPU Runner Example

name: GPU Training Example
on:
  workflow_dispatch:
    inputs:
      model_type:
        description: 'Type of model to train'
        required: true
        default: 'small'
        type: choice
        options:
          - small
          - medium
          - large
jobs:
  train-model:
    name: Train Machine Learning Model
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
      - regions=us-east-1
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Train model
        run: |
          python train.py --model-size=${{ github.event.inputs.model_type }}
      - name: Upload model artifacts
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: model/

Matrix Strategy for Multiple GPUs

You can use a matrix strategy to run your workflow on multiple GPU types:

jobs:
  gpu-matrix:
    strategy:
      matrix:
        gpu: [t4, l4, a10g]
      fail-fast: false
    runs-on:
      - machine
      - gpu=${{ matrix.gpu }}
    steps:
      - name: Run benchmark on ${{ matrix.gpu }}
        run: |
          python benchmark.py --gpu-type=${{ matrix.gpu }}

Job Dependencies

You can coordinate between CPU and GPU jobs:

jobs:
  build:
    runs-on:
      - machine
      - cpu=8   # Build with 8 vCPUs
    steps:
      - name: Build application
        id: build   # Needed so the step's outputs can be referenced below
        run: |
          make build
          echo "path=build/" >> "$GITHUB_OUTPUT"   # Illustrative output path
    outputs:
      artifact_path: ${{ steps.build.outputs.path }}
  train-model:
    needs: build
    runs-on:
      - machine
      - gpu=a10g   # Train with GPU
    steps:
      - name: Train on prepared data
        run: |
          python train.py --artifact=${{ needs.build.outputs.artifact_path }}
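
Keep in mind that job outputs are plain strings, and each job runs on a separate runner, so actual build files must be passed between jobs as artifacts. A sketch of that variant (the artifact name and paths are illustrative):

jobs:
  build:
    runs-on:
      - machine
      - cpu=8
    steps:
      - name: Build application
        run: make build
      - name: Share build output
        uses: actions/upload-artifact@v3
        with:
          name: build-output   # Illustrative artifact name
          path: build/         # Illustrative output directory
  train-model:
    needs: build
    runs-on:
      - machine
      - gpu=a10g
    steps:
      - name: Fetch build output
        uses: actions/download-artifact@v3
        with:
          name: build-output
          path: build/
      - name: Train on prepared data
        run: python train.py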

Environment Variables

Machine.dev provides several environment variables that can be accessed in your workflow:

Variable              Description             Example Value
MACHINE_GPU_TYPE      Type of GPU             a10g
MACHINE_GPU_COUNT     Number of GPUs          1
MACHINE_TENANCY       Tenancy type            spot
MACHINE_REGION        Region code             us-east-1
CUDA_VISIBLE_DEVICES  CUDA devices available  0

Example usage:

steps:
  - name: Check environment
    run: |
      echo "GPU Type: $MACHINE_GPU_TYPE"
      echo "GPU Count: $MACHINE_GPU_COUNT"
      echo "Tenancy: $MACHINE_TENANCY"
      echo "Region: $MACHINE_REGION"

Error Handling for Spot Instances

When using spot instances, you should implement error handling to manage potential interruptions:

jobs:
  spot-job:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      # Save checkpoints frequently
      - name: Train with checkpoints
        run: |
          python train.py --checkpoint-freq=5
      # Upload partial results
      - name: Upload checkpoints
        if: always()   # Run even if the job is cancelled
        uses: actions/upload-artifact@v3
        with:
          name: model-checkpoints
          path: checkpoints/
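
To make an interrupted run resumable, a retry can restore the uploaded checkpoints before training. The sketch below assumes train.py accepts a --resume-from flag (hypothetical) and that the checkpoint artifact is visible to the retry attempt, which depends on how GitHub scopes artifacts across re-runs:

steps:
  - name: Restore previous checkpoints, if any
    uses: actions/download-artifact@v3
    with:
      name: model-checkpoints
      path: checkpoints/
    continue-on-error: true   # The first attempt has no checkpoints to restore
  - name: Train, resuming from checkpoints when present
    run: python train.py --checkpoint-freq=5 --resume-from=checkpoints/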

Best Practices

Set Appropriate Timeouts

Set appropriate timeouts for your jobs:

jobs:
  gpu-job:
    timeout-minutes: 60   # Limit to 1 hour

Use Conditional Execution

Use conditional execution to skip GPU tasks when appropriate:

steps:
  - name: GPU-intensive task
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    run: python heavy_task.py

Optimize Workflow Triggers

Optimize workflow triggers to avoid unnecessary runs:

on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
    branches:
      - main

Use Workflow Concurrency

Use workflow concurrency to avoid redundant runs:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

Common Issues and Solutions

Issue                     Solution
Runner not available      Try a different GPU type or region
Out of memory error       Use a GPU with more VRAM or optimize batch size
Slow performance          Check data loading bottlenecks or use a faster GPU
Spot instance terminated  Use on-demand instances (tenancy=on_demand) for critical workloads
Long provisioning time    Pre-warm runners with scheduled workflows (see the sketch below)
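
For the last row, pre-warming can be sketched as a scheduled workflow that briefly requests the same runner shape your real jobs use. Whether this actually shortens later provisioning depends on Machine.dev's provisioning behavior, so treat it as a pattern to verify:

name: Pre-warm GPU runner
on:
  schedule:
    - cron: '50 8 * * 1-5'   # Shortly before working hours (UTC), weekdays
jobs:
  warm:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    timeout-minutes: 5   # Keep the warm-up job short and cheap
    steps:
      - name: No-op to trigger runner provisioning
        run: echo "runner warm"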