GitHub Actions Syntax

This reference guide provides comprehensive information on the GitHub Actions syntax for integrating Machine.dev GPU runners into your workflows.

Basic Workflow Structure

A GitHub Actions workflow is defined in YAML format and must include the following elements to use Machine.dev:

name: My GPU Workflow
on:
  # Trigger events (push, pull_request, workflow_dispatch, etc.)
  workflow_dispatch:
jobs:
  gpu-job:
    name: GPU-Accelerated Job
    runs-on:
      - machine      # Required to use Machine.dev
      - gpu=a10g     # Specify GPU type
      - tenancy=spot # Specify tenancy type (spot or on_demand)
    steps:
      # Job steps
      - name: Checkout code
        uses: actions/checkout@v3
      # Add more steps...

Machine.dev Runner Labels

Required Label

runs-on:
  - machine # Required for all Machine.dev runners

Runner Type Labels

CPU Runners

For high-performance CPU runners, you must specify the number of vCPUs:

runs-on:
  - machine
  - cpu=16 # Required: specify number of vCPUs (2, 4, 8, 16, 32, or 64)

CPU Runner Specifications:

  • Available configurations: 2, 4, 8, 16, 32, or 64 vCPUs
  • RAM scales with vCPUs (e.g., 16 vCPUs = 32GB RAM, 64 vCPUs = 128GB RAM)
  • X64 or ARM64 architecture options
  • Ideal for builds, testing, and data processing
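
To confirm what was actually provisioned, you can add a quick inspection step. This is a minimal sketch using standard Linux tools (nproc and free), assuming a Linux-based runner image:

steps:
  - name: Verify CPU and memory
    run: |
      # Print the vCPU count and total RAM the runner was provisioned with
      echo "vCPUs: $(nproc)"
      free -h | awk '/^Mem:/ {print "RAM: " $2}'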

GPU Type Labels

Specify the type of GPU you want to use:

runs-on:
  - machine
  - gpu=<gpu-type>

Available GPU types:

| Label | GPU Model | VRAM | CUDA Cores | Use Cases |
|-------|-----------|------|------------|-----------|
| gpu=t4 | NVIDIA T4 | 16 GB | 2,560 | Training, inference, computer vision |
| gpu=l4 | NVIDIA L4 | 24 GB | 7,424 | ML/DL training, inference, vision AI |
| gpu=a10g | NVIDIA A10G | 24 GB | 9,216 | Model training, rendering, simulation |
| gpu=l40s | NVIDIA L40S | 48 GB | 18,176 | Generative AI, large models, computer vision |
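
To confirm which GPU a job actually received, you can query it with nvidia-smi. A minimal sketch, assuming the NVIDIA driver is preinstalled on the runner image:

steps:
  - name: Verify GPU
    run: |
      # Prints the GPU model and total VRAM, e.g. "NVIDIA A10G, 23028 MiB"
      nvidia-smi --query-gpu=name,memory.total --format=csv,noheader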

Tenancy Type Labels

Specify whether you want on-demand or spot instances:

runs-on:
  - machine
  - gpu=a10g
  - tenancy=<tenancy-type>

Available tenancy types:

| Label | Description | Cost | Stability |
|-------|-------------|------|-----------|
| tenancy=on_demand | On-demand instances with guaranteed availability | Higher cost | Highest stability |
| tenancy=spot | Spot instances that may be preempted | Up to 85% lower cost | May be interrupted |

Region Labels

Optionally specify the region where your runner should be provisioned:

runs-on:
  - machine
  - gpu=a10g
  - region=<region-code>

Available regions:

| Label | Location | Latency (US) | Available GPUs |
|-------|----------|--------------|----------------|
| region=us-east | US East (Virginia) | Low | All types |
| region=us-west | US West (Oregon) | Medium | All types |
| region=eu-west | EU West (Ireland) | High | T4, L4, A10G |
| region=ap-southeast | Asia Pacific (Singapore) | High | T4, L4 |

Complete Examples

CPU Runner Example

name: CPU Build Example
on:
  push:
    branches: [main]
jobs:
  build:
    name: Build Application
    runs-on:
      - machine
      - cpu=16 # Required: specify number of vCPUs
      - tenancy=spot
      - region=us-east
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Build project
        run: make -j$(nproc) all

GPU Runner Example

name: GPU Training Example
on:
  workflow_dispatch:
    inputs:
      model_type:
        description: 'Type of model to train'
        required: true
        default: 'small'
        type: choice
        options:
          - small
          - medium
          - large
jobs:
  train-model:
    name: Train Machine Learning Model
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
      - region=us-east
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Train model
        run: |
          python train.py --model-size=${{ github.event.inputs.model_type }}
      - name: Upload model artifacts
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: model/

Matrix Strategy for Multiple GPUs

You can use a matrix strategy to run your workflow on multiple GPU types:

jobs:
  gpu-matrix:
    strategy:
      matrix:
        gpu: [t4, l4, a10g]
      fail-fast: false
    runs-on:
      - machine
      - gpu=${{ matrix.gpu }}
    steps:
      - name: Run benchmark on ${{ matrix.gpu }}
        run: |
          python benchmark.py --gpu-type=${{ matrix.gpu }}

Job Dependencies

You can coordinate between CPU and GPU jobs:

jobs:
  build:
    runs-on:
      - machine
      - cpu=8 # Build with 8 vCPUs
    outputs:
      artifact_path: ${{ steps.build.outputs.path }}
    steps:
      - name: Build application
        id: build # Matches the steps.build reference in outputs above
        run: |
          make build
          echo "path=build/output" >> "$GITHUB_OUTPUT" # Example output path
  train-model:
    needs: build
    runs-on:
      - machine
      - gpu=a10g # Train with GPU
    steps:
      - name: Train on prepared data
        run: |
          python train.py --artifact=${{ needs.build.outputs.artifact_path }}

Environment Variables

Machine.dev provides several environment variables that can be accessed in your workflow:

| Variable | Description | Example Value |
|----------|-------------|---------------|
| MACHINE_GPU_TYPE | Type of GPU | a10g |
| MACHINE_GPU_COUNT | Number of GPUs | 1 |
| MACHINE_TENANCY | Tenancy type | spot |
| MACHINE_REGION | Region code | us-east |
| CUDA_VISIBLE_DEVICES | CUDA devices available | 0 |

Example usage:

steps:
  - name: Check environment
    run: |
      echo "GPU Type: $MACHINE_GPU_TYPE"
      echo "GPU Count: $MACHINE_GPU_COUNT"
      echo "Tenancy: $MACHINE_TENANCY"
      echo "Region: $MACHINE_REGION"

Error Handling for Spot Instances

When using spot instances, you should implement error handling to manage potential interruptions:

jobs:
  spot-job:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      # Save checkpoints frequently
      - name: Train with checkpoints
        run: |
          python train.py --checkpoint-freq=5
      # Upload partial results
      - name: Upload checkpoints
        if: always() # Run even if the job is cancelled
        uses: actions/upload-artifact@v3
        with:
          name: model-checkpoints
          path: checkpoints/
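
Because artifacts are scoped to a single workflow run, a fresh run after an interruption will not see them automatically. One way to carry checkpoints across runs is actions/cache; the sketch below assumes train.py can resume from whatever it finds in checkpoints/ (the --resume flag is hypothetical). Note that the cache is saved in a post step, so a hard preemption can still lose the newest files; the artifact upload above remains the safety net.

steps:
  - name: Restore checkpoints from a previous run
    uses: actions/cache@v3
    with:
      path: checkpoints/
      key: checkpoints-${{ github.ref }}-${{ github.run_id }}
      restore-keys: |
        checkpoints-${{ github.ref }}-
  - name: Train, resuming if checkpoints exist
    run: |
      python train.py --checkpoint-freq=5 --resume  # --resume is hypothetical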

Best Practices

Set Appropriate Timeouts

Set appropriate timeouts for your jobs:

jobs:
  gpu-job:
    timeout-minutes: 60 # Limit to 1 hour

Use Conditional Execution

Use conditional execution to skip GPU tasks when appropriate:

steps:
  - name: GPU-intensive task
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    run: python heavy_task.py

Optimize Workflow Triggers

Optimize workflow triggers to avoid unnecessary runs:

on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
    branches:
      - main

Use Workflow Concurrency

Use workflow concurrency to avoid redundant runs:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| Runner not available | Try a different GPU type or region |
| Out of memory error | Use a GPU with more VRAM or optimize batch size |
| Slow performance | Check data loading bottlenecks or use a faster GPU |
| Spot instance terminated | Use on-demand instances for critical workloads |
| Long provisioning time | Pre-warm runners with scheduled workflows |
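
For the last row, a pre-warming workflow can be a scheduled job that requests the same labels as your real workload shortly before it typically runs. A minimal sketch; the schedule and labels here are illustrative:

name: Pre-warm GPU Runner
on:
  schedule:
    - cron: '45 8 * * 1-5' # Weekdays, shortly before working hours (UTC)
jobs:
  warm:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      - name: Keep runner warm
        run: echo "Runner provisioned"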