# GitHub Actions Syntax
This reference guide provides comprehensive information on the GitHub Actions syntax for integrating Machine.dev GPU runners into your workflows.
## Basic Workflow Structure
A GitHub Actions workflow is defined in YAML format and must include the following elements to use Machine.dev:
```yaml
name: My GPU Workflow

on:
  # Trigger events (push, pull_request, workflow_dispatch, etc.)
  workflow_dispatch:

jobs:
  gpu-job:
    name: GPU-Accelerated Job
    runs-on:
      - machine        # Required to use Machine.dev
      - gpu=a10g       # Specify GPU type
      - tenancy=spot   # Specify tenancy type (spot or dedicated)
    steps:
      # Job steps
      - name: Checkout code
        uses: actions/checkout@v3
      # Add more steps...
```
## Machine.dev Runner Labels

### Required Label

```yaml
runs-on:
  - machine  # Required for all Machine.dev runners
```
### GPU Type Labels

Specify the type of GPU you want to use:

```yaml
runs-on:
  - machine
  - gpu=<gpu-type>
```
Available GPU types:
| Label | GPU Model | VRAM | CUDA Cores | Use Cases |
|---|---|---|---|---|
| gpu=t4 | NVIDIA T4 | 16 GB | 2,560 | Training, inference, computer vision |
| gpu=l4 | NVIDIA L4 | 24 GB | 7,424 | ML/DL training, inference, vision AI |
| gpu=a10g | NVIDIA A10G | 24 GB | 9,216 | Model training, rendering, simulation |
| gpu=a100 | NVIDIA A100 | 40 GB | 6,912 | Large model training, HPC |
| gpu=a100-80gb | NVIDIA A100 | 80 GB | 6,912 | Very large models, distributed training |
| gpu=l40s | NVIDIA L40S | 48 GB | 18,176 | Generative AI, large models, computer vision |
### Tenancy Type Labels

Specify whether you want dedicated or spot instances:

```yaml
runs-on:
  - machine
  - gpu=a10g
  - tenancy=<tenancy-type>
```
Available tenancy types:
| Label | Description | Cost | Stability |
|---|---|---|---|
| tenancy=dedicated | Dedicated instances with guaranteed availability | Higher cost | Highest stability |
| tenancy=spot | Spot instances that may be preempted | Up to 70% lower cost | May be interrupted |
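For deadline-critical work where a preemption would be costly, it may be worth pinning the job to dedicated tenancy. A minimal sketch (the job name and script are illustrative, not part of Machine.dev):

```yaml
jobs:
  release-training:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=dedicated  # Guaranteed availability; will not be preempted
    steps:
      - name: Run critical training job
        run: python train.py  # Illustrative script name
```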
### Region Labels

Optionally specify the region where your runner should be provisioned:

```yaml
runs-on:
  - machine
  - gpu=a10g
  - region=<region-code>
```
Available regions:
| Label | Location | Latency (US) | Available GPUs |
|---|---|---|---|
| region=us-east | US East (Virginia) | Low | All types |
| region=us-west | US West (Oregon) | Medium | All types |
| region=eu-west | EU West (Ireland) | High | T4, L4, A10G |
| region=ap-southeast | Asia Pacific (Singapore) | High | T4, L4 |
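When combining labels, request a GPU type that is actually available in the chosen region (per the table above). A concrete sketch (job and script names are illustrative):

```yaml
jobs:
  eu-inference:
    runs-on:
      - machine
      - gpu=l4          # One of the types available in eu-west
      - region=eu-west
    steps:
      - name: Run inference
        run: python infer.py  # Illustrative script name
```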
## Complete Example
Here’s a complete example that specifies all available options:
```yaml
name: Complete Machine.dev Example

on:
  workflow_dispatch:
    inputs:
      model_type:
        description: 'Type of model to train'
        required: true
        default: 'small'
        type: choice
        options:
          - small
          - medium
          - large

jobs:
  train-model:
    name: Train Machine Learning Model
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
      - region=us-east
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Train model
        run: |
          python train.py --model-size=${{ github.event.inputs.model_type }}

      - name: Upload model artifacts
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: model/
```
## Matrix Strategy for Multiple GPUs
You can use a matrix strategy to run your workflow across multiple GPU types:
```yaml
jobs:
  gpu-matrix:
    strategy:
      matrix:
        gpu: [t4, l4, a10g]
      fail-fast: false
    runs-on:
      - machine
      - gpu=${{ matrix.gpu }}
    steps:
      - name: Run benchmark on ${{ matrix.gpu }}
        run: |
          python benchmark.py --gpu-type=${{ matrix.gpu }}
```
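A matrix can also combine label dimensions. For example, a sketch that benchmarks each GPU type under both tenancy modes (all label values come from the tables above; the benchmark script is carried over from the previous example):

```yaml
jobs:
  gpu-tenancy-matrix:
    strategy:
      matrix:
        gpu: [t4, a10g]
        tenancy: [spot, dedicated]
      fail-fast: false    # Let the other combinations finish if one fails
    runs-on:
      - machine
      - gpu=${{ matrix.gpu }}
      - tenancy=${{ matrix.tenancy }}
    steps:
      - name: Benchmark ${{ matrix.gpu }} (${{ matrix.tenancy }})
        run: |
          python benchmark.py --gpu-type=${{ matrix.gpu }}
```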
## Job Dependencies
You can coordinate between CPU and GPU jobs:
```yaml
jobs:
  prepare-data:
    runs-on: ubuntu-latest
    outputs:
      dataset_path: ${{ steps.prepare.outputs.path }}
    steps:
      - name: Prepare dataset
        id: prepare  # id is required so the job-level outputs can reference this step
        # prepare_data.py is expected to write "path=<dataset-path>" to $GITHUB_OUTPUT
        run: python prepare_data.py

  train-model:
    needs: prepare-data
    runs-on:
      - machine
      - gpu=a10g
    steps:
      - name: Train on prepared data
        run: |
          python train.py --data=${{ needs.prepare-data.outputs.dataset_path }}
```
## Environment Variables
Machine.dev provides several environment variables that can be accessed in your workflow:
| Variable | Description | Example Value |
|---|---|---|
| MACHINE_GPU_TYPE | Type of GPU | a10g |
| MACHINE_GPU_COUNT | Number of GPUs | 1 |
| MACHINE_TENANCY | Tenancy type | spot |
| MACHINE_REGION | Region code | us-east |
| CUDA_VISIBLE_DEVICES | CUDA devices available | 0 |
Example usage:
```yaml
steps:
  - name: Check environment
    run: |
      echo "GPU Type: $MACHINE_GPU_TYPE"
      echo "GPU Count: $MACHINE_GPU_COUNT"
      echo "Tenancy: $MACHINE_TENANCY"
      echo "Region: $MACHINE_REGION"
```
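Beyond logging, these variables can parameterize steps directly. A sketch that picks a batch size based on the detected GPU type (the thresholds and the --batch-size flag are assumptions, not part of Machine.dev):

```yaml
steps:
  - name: Train with GPU-appropriate batch size
    run: |
      # Larger batch on GPUs with more VRAM; values here are illustrative
      if [ "$MACHINE_GPU_TYPE" = "a100" ] || [ "$MACHINE_GPU_TYPE" = "a100-80gb" ]; then
        BATCH_SIZE=128
      else
        BATCH_SIZE=32
      fi
      python train.py --batch-size=$BATCH_SIZE
```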
## Error Handling for Spot Instances
When using spot instances, you should implement error handling to manage potential interruptions:
```yaml
jobs:
  spot-job:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      # Save checkpoints frequently
      - name: Train with checkpoints
        run: |
          python train.py --checkpoint-freq=5

      # Upload partial results
      - name: Upload checkpoints
        if: always()  # Run even if the job is cancelled
        uses: actions/upload-artifact@v3
        with:
          name: model-checkpoints
          path: checkpoints/
```
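A downstream job in the same workflow run can then restore the uploaded checkpoints and resume training. A sketch, assuming train.py accepts a --resume option (note that actions/download-artifact@v3 only sees artifacts uploaded earlier in the same run):

```yaml
jobs:
  # ... spot-job as above ...
  resume-training:
    needs: spot-job
    if: always()  # Run even if the spot job was interrupted
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      - name: Restore checkpoints
        uses: actions/download-artifact@v3
        with:
          name: model-checkpoints
          path: checkpoints/

      - name: Resume training
        run: |
          python train.py --checkpoint-freq=5 --resume=checkpoints/  # --resume is an assumed flag
```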
## Best Practices

### Set Appropriate Timeouts

Set appropriate timeouts for your jobs:

```yaml
jobs:
  gpu-job:
    timeout-minutes: 60  # Limit to 1 hour
```
### Use Conditional Execution

Use conditional execution to skip GPU tasks when appropriate:

```yaml
steps:
  - name: GPU-intensive task
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    run: python heavy_task.py
```
### Optimize Workflow Triggers

Optimize workflow triggers to avoid unnecessary runs:

```yaml
on:
  push:
    paths:
      - 'model/**'
      - 'data/**'
    branches:
      - main
```
### Use Workflow Concurrency

Use workflow concurrency to avoid redundant runs:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```
## Common Issues and Solutions

| Issue | Solution |
|---|---|
| Runner not available | Try a different GPU type or region |
| Out of memory error | Use a GPU with more VRAM or optimize batch size |
| Slow performance | Check data loading bottlenecks or use a faster GPU |
| Spot instance terminated | Use dedicated instances for critical workloads |
| Long provisioning time | Pre-warm runners with scheduled workflows |
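For the last row, a scheduled workflow that briefly requests a runner ahead of your usual working hours can reduce cold-start provisioning time. A minimal pre-warm sketch (the schedule and labels are assumptions to adapt):

```yaml
name: Pre-warm GPU Runner

on:
  schedule:
    - cron: '30 8 * * 1-5'  # Weekdays at 08:30 UTC, before the team starts work

jobs:
  prewarm:
    runs-on:
      - machine
      - gpu=a10g
      - tenancy=spot
    steps:
      - name: Touch the runner
        run: echo "Runner provisioned and warm"
```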