GitHub Actions Syntax

This reference describes the GitHub Actions syntax for integrating Machine.dev GPU runners into your workflows.

Basic Workflow Structure

A GitHub Actions workflow is defined in YAML format and must include the following elements to use Machine.dev:

```yaml
name: My GPU Workflow

on:
  # Trigger events (push, pull_request, workflow_dispatch, etc.)
  workflow_dispatch:

jobs:
  gpu-job:
    name: GPU-Accelerated Job
    runs-on:
      - machine       # Required to use Machine.dev
      - gpu=a10g      # Specify GPU type
      - tenancy=spot  # Specify tenancy type (spot or dedicated)
    steps:
      # Job steps
      - name: Checkout code
        uses: actions/checkout@v3
      # Add more steps...
```

Machine.dev Runner Labels

Required Label

```yaml
runs-on:
  - machine # Required for all Machine.dev runners
```

GPU Type Labels

Specify the type of GPU you want to use:

```yaml
runs-on:
  - machine
  - gpu=<gpu-type>
```

Available GPU types:

| Label | GPU Model | VRAM | CUDA Cores | Use Cases |
|---|---|---|---|---|
| `gpu=t4` | NVIDIA T4 | 16 GB | 2,560 | Training, inference, computer vision |
| `gpu=l4` | NVIDIA L4 | 24 GB | 7,424 | ML/DL training, inference, vision AI |
| `gpu=a10g` | NVIDIA A10G | 24 GB | 9,216 | Model training, rendering, simulation |
| `gpu=a100` | NVIDIA A100 | 40 GB | 6,912 | Large model training, HPC |
| `gpu=a100-80gb` | NVIDIA A100 | 80 GB | 6,912 | Very large models, distributed training |
| `gpu=l40s` | NVIDIA L40S | 48 GB | 18,176 | Generative AI, large models, computer vision |

Tenancy Type Labels

Specify whether you want dedicated or spot instances:

```yaml
runs-on:
  - machine
  - gpu=a10g
  - tenancy=<tenancy-type>
```

Available tenancy types:

| Label | Description | Cost | Stability |
|---|---|---|---|
| `tenancy=dedicated` | Dedicated instances with guaranteed availability | Higher cost | Highest stability |
| `tenancy=spot` | Spot instances that may be preempted | Up to 70% lower cost | May be interrupted |

Region Labels

Optionally specify the region where your runner should be provisioned:

```yaml
runs-on:
  - machine
  - gpu=a10g
  - region=<region-code>
```

Available regions:

| Label | Location | Latency (US) | Available GPUs |
|---|---|---|---|
| `region=us-east` | US East (Virginia) | Low | All types |
| `region=us-west` | US West (Oregon) | Medium | All types |
| `region=eu-west` | EU West (Ireland) | High | T4, L4, A10G |
| `region=ap-southeast` | Asia Pacific (Singapore) | High | T4, L4 |

Complete Example

Here’s a complete example that specifies all available options:

```yaml
name: Complete Machine.dev Example

on:
  workflow_dispatch:
    inputs:
      model_type:
        description: 'Type of model to train'
        required: true
        default: 'small'
        type: choice
        options:
          - small
          - medium
          - large

jobs:
  train-model:
    name: Train Machine Learning Model
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
      - region=us-east
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Train model
        run: |
          python train.py --model-size=${{ github.event.inputs.model_type }}
      - name: Upload model artifacts
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: model/
```

Matrix Strategy for Multiple GPUs

You can use a matrix strategy to run your workflow on multiple GPU types:

```yaml
jobs:
  gpu-matrix:
    strategy:
      matrix:
        gpu: [t4, l4, a10g]
      fail-fast: false
    runs-on:
      - machine
      - gpu=${{ matrix.gpu }}
    steps:
      - name: Run benchmark on ${{ matrix.gpu }}
        run: |
          python benchmark.py --gpu-type=${{ matrix.gpu }}
```

Job Dependencies

You can coordinate between CPU and GPU jobs:

```yaml
jobs:
  prepare-data:
    runs-on: ubuntu-latest
    steps:
      - name: Prepare dataset
        id: prepare # id referenced by the job-level output below
        run: python prepare_data.py # should write "path=<value>" to $GITHUB_OUTPUT
    outputs:
      dataset_path: ${{ steps.prepare.outputs.path }}
  train-model:
    needs: prepare-data
    runs-on:
      - machine
      - gpu=a10g
    steps:
      - name: Train on prepared data
        run: |
          python train.py --data=${{ needs.prepare-data.outputs.dataset_path }}
```

Environment Variables

Machine.dev provides several environment variables that can be accessed in your workflow:

| Variable | Description | Example Value |
|---|---|---|
| `MACHINE_GPU_TYPE` | Type of GPU | `a10g` |
| `MACHINE_GPU_COUNT` | Number of GPUs | `1` |
| `MACHINE_TENANCY` | Tenancy type | `spot` |
| `MACHINE_REGION` | Region code | `us-east` |
| `CUDA_VISIBLE_DEVICES` | CUDA devices available | `0` |

Example usage:

```yaml
steps:
  - name: Check environment
    run: |
      echo "GPU Type: $MACHINE_GPU_TYPE"
      echo "GPU Count: $MACHINE_GPU_COUNT"
      echo "Tenancy: $MACHINE_TENANCY"
      echo "Region: $MACHINE_REGION"
```
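Your scripts can also read these variables to adapt to whichever runner they land on. A minimal sketch in Python, assuming the VRAM figures from the GPU table above (the helper name, headroom value, and per-sample estimate are illustrative, not part of Machine.dev):

```python
import os

# VRAM per GPU type, from the table above (GB)
GPU_VRAM_GB = {
    "t4": 16, "l4": 24, "a10g": 24,
    "a100": 40, "a100-80gb": 80, "l40s": 48,
}

def pick_batch_size(gpu_type: str, per_sample_gb: float = 0.5) -> int:
    """Reserve ~4 GB for model/runtime overhead, spend the rest on the batch.
    The 0.5 GB-per-sample figure is a placeholder; measure your own workload."""
    vram = GPU_VRAM_GB.get(gpu_type, 16)  # fall back to the smallest GPU
    return max(1, int((vram - 4) / per_sample_gb))

if __name__ == "__main__":
    gpu = os.environ.get("MACHINE_GPU_TYPE", "t4")
    print(f"{gpu}: using batch size {pick_batch_size(gpu)}")
```

This lets the same matrix workflow run unmodified across GPU types without hard-coding a batch size per runner.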

Error Handling for Spot Instances

When using spot instances, you should implement error handling to manage potential interruptions:

```yaml
jobs:
  spot-job:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      # Save checkpoints frequently
      - name: Train with checkpoints
        run: |
          python train.py --checkpoint-freq=5
      # Upload partial results
      - name: Upload checkpoints
        if: always() # Run even if job is cancelled
        uses: actions/upload-artifact@v3
        with:
          name: model-checkpoints
          path: checkpoints/
```

Best Practices

Set Appropriate Timeouts

Set appropriate timeouts for your jobs:

```yaml
jobs:
  gpu-job:
    timeout-minutes: 60 # Limit to 1 hour
```

Use Conditional Execution

Use conditional execution to skip GPU tasks when appropriate:

```yaml
steps:
  - name: GPU-intensive task
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    run: python heavy_task.py
```

Optimize Workflow Triggers

Optimize workflow triggers to avoid unnecessary runs:

```yaml
on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
    branches:
      - main
```

Use Workflow Concurrency

Use workflow concurrency to avoid redundant runs:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```

Common Issues and Solutions

| Issue | Solution |
|---|---|
| Runner not available | Try a different GPU type or region |
| Out of memory error | Use a GPU with more VRAM or optimize batch size |
| Slow performance | Check data loading bottlenecks or use a faster GPU |
| Spot instance terminated | Use dedicated instances for critical workloads |
| Long provisioning time | Pre-warm runners with scheduled workflows |
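For the "Spot instance terminated" case, one middle ground between always paying for dedicated runners and accepting failed runs is a fallback job that only runs on dedicated tenancy when the spot attempt fails. A sketch using standard GitHub Actions `needs` semantics (job names and the training command are illustrative):

```yaml
jobs:
  train-spot:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      - run: python train.py

  train-dedicated:
    needs: train-spot
    # always() overrides the default success() condition so this job is
    # evaluated even after a failure; it then runs only if spot failed.
    if: ${{ always() && needs.train-spot.result == 'failure' }}
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=dedicated
    steps:
      - run: python train.py
```

Combined with the checkpointing pattern above, the dedicated retry can resume from uploaded checkpoints instead of restarting the whole run.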