Cost Optimization

This guide provides practical strategies to help you minimize costs while maximizing the effectiveness of your GPU-accelerated workflows on Machine. By implementing these optimization techniques, you can significantly reduce your spending without sacrificing performance.

Understanding Machine Pricing

Before diving into optimization strategies, it’s essential to understand how Machine pricing works:

  1. Credit-based system: You pay for GPU time using credits
  2. Usage-based billing: You only pay for the exact time your workflows run
  3. GPU-specific rates: Different GPU types have different credit consumption rates
  4. Resource-based pricing: Additional CPU cores and RAM affect pricing
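
Put together, the credit cost of a job is roughly the per-minute rate of the selected GPU (plus any extra CPU/RAM) multiplied by runtime, with a discount for spot capacity. The sketch below uses entirely made-up placeholder rates to show how these factors combine; it is not Machine's actual price list:

# Hypothetical illustration only: these rates are placeholders, not real Machine prices.
CREDITS_PER_MINUTE = {"t4": 1.0, "l4": 1.5, "a10g": 2.0, "l40s": 4.0}
SPOT_MULTIPLIER = 0.15   # spot can be up to ~85% cheaper than on-demand

def estimate_credits(gpu: str, runtime_minutes: float, spot: bool = False) -> float:
    """Rough estimate: GPU rate x runtime, discounted for spot (CPU/RAM surcharges omitted)."""
    rate = CREDITS_PER_MINUTE[gpu.lower()]
    if spot:
        rate *= SPOT_MULTIPLIER
    return rate * runtime_minutes

# e.g. a 90-minute run on a spot A10G instance, with these placeholder rates
print(estimate_credits("a10g", 90, spot=True))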

Key Cost Optimization Strategies

1. Use Spot Instances with Intelligent Retries

Spot instances can save you up to 85% compared to on-demand instances. Combined with intelligent retry mechanisms, they offer a strong balance of cost and reliability:

runs-on:
  - machine
  - gpu=a10g
  - tenancy=spot

Best for:

  • Non-critical workloads
  • Jobs that can be retried if interrupted
  • Development and testing workflows

Implementation tips:

  • Implement checkpointing to save progress regularly
  • Set up automatic retry mechanisms for spot instance interruptions
  • Use the intelligent retry patterns from our example workflows

Implementing Intelligent Retries

Our LLM Supervised Fine-Tuning and GRPO Fine-Tuning workflows demonstrate how to implement robust retry mechanisms:

name: Workflow with Retry

on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Other workflow parameters

The intelligent retry mechanism works through these steps:

  1. The workflow starts with a specified attempt number (default: 1)
  2. During execution, checkpoints are periodically saved to Hugging Face Hub or another storage location
  3. If the job completes successfully, the workflow ends
  4. If the job fails due to a spot instance interruption:
    • A custom GitHub Action detects the failure was due to spot instance preemption
    • The workflow calculates the next attempt number
    • If within the maximum attempts limit, it triggers a new workflow run with an incremented attempt number
    • All original parameters are preserved for the new attempt
  5. When a new attempt starts, it downloads the latest checkpoint and resumes from that point

This ensures that even if a spot instance is reclaimed, your progress isn’t lost, and the job can continue from the last checkpoint on a new instance.
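
In the example workflows, the re-dispatch step is handled by a custom GitHub Action. As a rough illustration of the same idea, the sketch below triggers a new workflow_dispatch run through the GitHub REST API; the repository name, workflow file name, and use of the requests library are assumptions for the sketch, not the action's actual implementation.

# Illustrative sketch only -- the example workflows use a custom GitHub Action for this.
import os
import requests

def dispatch_next_attempt(repo, workflow_file, attempt, max_attempts, inputs, ref="main"):
    """Re-run the workflow with an incremented attempt number, preserving inputs."""
    next_attempt = int(attempt) + 1
    if next_attempt > int(max_attempts):
        print("Maximum attempts reached; not retrying.")
        return False
    url = f"https://api.github.com/repos/{repo}/actions/workflows/{workflow_file}/dispatches"
    payload = {
        "ref": ref,
        # Preserve all original parameters, overriding only the attempt counter
        "inputs": {**inputs, "attempt": str(next_attempt)},
    }
    headers = {
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    }
    requests.post(url, json=payload, headers=headers, timeout=30).raise_for_status()
    print(f"Dispatched attempt {next_attempt} of {max_attempts}")
    return True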

2. Implement Checkpointing to Hugging Face

Save your progress regularly to avoid losing work due to spot instance interruptions:

# Example checkpoint saving code
import torch
from huggingface_hub import HfApi

def save_checkpoint(model, optimizer, epoch, step, hf_repo_id):
    # Bundle the model and optimizer state with training progress
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': epoch,
        'step': step
    }
    # Save to disk first
    torch.save(checkpoint, 'checkpoint.pt')
    # Push to Hugging Face Hub
    api = HfApi()
    api.upload_file(
        path_or_fileobj="checkpoint.pt",
        path_in_repo="checkpoint.pt",
        repo_id=hf_repo_id,
        repo_type="model"
    )
    print(f"Checkpoint saved at epoch {epoch}, step {step}")

To resume from a checkpoint:

# Example checkpoint loading code
import torch
from huggingface_hub import hf_hub_download

def load_checkpoint(model, optimizer, hf_repo_id):
    try:
        # Download the latest checkpoint from Hugging Face Hub
        checkpoint_path = hf_hub_download(
            repo_id=hf_repo_id,
            filename="checkpoint.pt",
            repo_type="model"
        )
        # Restore model, optimizer, and training progress
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        epoch = checkpoint['epoch']
        step = checkpoint['step']
        print(f"Resumed from epoch {epoch}, step {step}")
        return epoch, step
    except Exception:
        print("No checkpoint found, starting from scratch")
        return 0, 0
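
A minimal sketch of how the two helpers fit into a training loop; model, optimizer, dataloader, and train_one_step are placeholders for your own code:

# Sketch only: resume from the last checkpoint, then save periodically.
CHECKPOINT_EVERY = 100                                 # steps between checkpoints
NUM_EPOCHS = 3
HF_REPO_ID = "your-username/your-checkpoint-repo"      # hypothetical repo id

start_epoch, step = load_checkpoint(model, optimizer, HF_REPO_ID)

for epoch in range(start_epoch, NUM_EPOCHS):
    for batch in dataloader:
        train_one_step(model, optimizer, batch)        # placeholder training step
        step += 1
        if step % CHECKPOINT_EVERY == 0:
            save_checkpoint(model, optimizer, epoch, step, HF_REPO_ID)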

3. Right-size Your GPU Resources

Choose the smallest GPU that meets your needs:

Workload              | Recommended GPU | Why
----------------------|-----------------|------------------------------------
Testing, small models | T4G/T4 (16GB)   | Lowest cost per hour
Medium models         | L4 (24GB)       | Good balance of memory/performance
Large models          | A10G (24GB)     | More memory and compute
Very large models     | L40S (48GB)     | Maximum memory capacity

Implementation:

# Instead of always using the largest GPU:
runs-on:
  - machine
  - gpu=t4  # Choose the right-sized GPU for your task
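
To verify that a smaller GPU is actually sufficient, you can log peak GPU memory during a short test run. A minimal PyTorch sketch (the workload in the middle is whatever representative steps you choose):

import torch

# Reset the peak-memory counter before running a few representative steps
torch.cuda.reset_peak_memory_stats()

# ... run a handful of representative training or inference steps here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GiB of {total_gb:.1f} GiB available")
# If peak usage fits comfortably on a 16GB T4, there is no need to pay for a larger GPU.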

4. Optimize Job Duration

The less time your job runs, the less you pay:

Use mixed precision training:

# In PyTorch:
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

with autocast():
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

Implement efficient data loading:

# Optimize PyTorch DataLoader:
dataloader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,      # Adjust based on CPU cores
    pin_memory=True,
    prefetch_factor=2
)

Use efficient model architectures:

  • Consider more efficient model architectures (e.g., MobileNet vs. ResNet)
  • Use pruning or quantization where possible
  • Consider LoRA or other parameter-efficient fine-tuning methods
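
For example, LoRA via the Hugging Face peft library trains only small adapter matrices instead of the full model, which shortens runtime and reduces memory. A minimal sketch; the base model name and target modules are illustrative and depend on your architecture:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model -- substitute the model you are fine-tuning
model = AutoModelForCausalLM.from_pretrained("your-org/your-base-model")

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # depends on the model architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the full model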

5. Optimize Resource Allocation

Specify only the resources you actually need:

runs-on:
  - machine
  - gpu=l4
  - cpu=4   # Only request what you need
  - ram=16  # Only request what you need

Monitoring tip: Run a test job with GPU monitoring to determine actual resource usage:

steps:
  - name: Monitor resource usage
    run: |
      nvidia-smi dmon -s pucvmet -d 5 > gpu_metrics.log &
      NVIDIA_PID=$!
      vmstat 5 > cpu_metrics.log &
      VMSTAT_PID=$!
      # Run your workload
      python train.py
      # Stop monitoring
      kill $NVIDIA_PID $VMSTAT_PID
      # Print metrics to the job log
      cat gpu_metrics.log cpu_metrics.log

6. Use Regional Selection Effectively

Different regions have different pricing:

runs-on:
  - machine
  - gpu=t4
  - regions=us-east-1,us-west-2  # Regions with best pricing

Tips:

  • Include multiple regions to ensure availability
  • Consider data sovereignty requirements when selecting regions

7. Implement Smart Caching

Reduce computation by caching dependencies and intermediate results:

steps:
  - name: Cache dependencies
    uses: actions/cache@v3
    with:
      path: ~/.cache/pip
      key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}

  - name: Cache preprocessed data
    uses: actions/cache@v3
    with:
      path: ./data/processed
      key: preprocessed-data-v1
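
On the Python side, the preprocessing step can then skip work whenever the cached directory is already populated. A minimal sketch; the paths and preprocess function are placeholders:

from pathlib import Path

PROCESSED_DIR = Path("./data/processed")   # matches the cached path above

def ensure_preprocessed_data(raw_dir="./data/raw"):
    """Run preprocessing only when the cache restore came back empty."""
    if PROCESSED_DIR.exists() and any(PROCESSED_DIR.iterdir()):
        print("Using cached preprocessed data")
        return PROCESSED_DIR
    PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
    preprocess(raw_dir, PROCESSED_DIR)     # placeholder for your preprocessing logic
    return PROCESSED_DIR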

8. Use Workflow Conditions and Filters

Only run GPU jobs when necessary:

name: ML Pipeline

on:
  push:
    branches: [ main ]
    paths:
      - 'model/**'
      - 'data/**'

jobs:
  train:
    # Only runs when model or data files change
    runs-on:
      - machine
      - gpu=l4

Or use conditional execution:

jobs:
  validate:
    runs-on: ubuntu-latest
    outputs:
      should_train: ${{ steps.check.outputs.should_train }}
    steps:
      - id: check
        run: |
          # Logic to determine if training is needed
          echo "should_train=true" >> $GITHUB_OUTPUT

  train:
    needs: validate
    if: ${{ needs.validate.outputs.should_train == 'true' }}
    runs-on:
      - machine
      - gpu=a10g
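
The check step's logic depends on your project. One possible approach, sketched below, is to train only when files under model/ or data/ changed in the latest commit; it assumes a checkout with enough history (for example fetch-depth: 2) and writes the decision to the step's output:

# Illustrative check: train only if model/ or data/ changed in the last commit.
import os
import subprocess

changed = subprocess.run(
    ["git", "diff", "--name-only", "HEAD~1", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

should_train = any(path.startswith(("model/", "data/")) for path in changed)

# Expose the decision to downstream jobs via the step output file
with open(os.environ["GITHUB_OUTPUT"], "a") as fh:
    fh.write(f"should_train={str(should_train).lower()}\n")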

Monitoring and Analyzing Costs

Using the Machine Dashboard

The Machine dashboard provides detailed insights into your GPU usage and costs:

  1. Job tracking: The dashboard shows all your previously run jobs, currently running jobs, and queued jobs in one place
  2. Cost visibility: For completed jobs, you can see the exact runtime and cost in credits used
  3. Usage aggregation: View daily aggregates for all on-demand and spot credits consumed within a specified date range
  4. Resource utilization: See GPU, CPU, and memory allocation for each job

This information helps you identify optimization opportunities and track spending patterns over time.

Real-World Cost Optimization Examples

Example 1: LLM Fine-tuning with Retry Mechanism

The LLM Supervised Fine-Tuning workflow demonstrates effective cost optimization:

name: Supervised Fine-Tuning with Retry

on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      # Other parameters...

jobs:
  train:
    name: Training
    runs-on:
      - machine
      - gpu=T4
      - cpu=4
      - ram=16
      - tenancy=spot  # Cost savings with spot instances
    steps:
      # Checkpoint handling steps
      - name: Download previous checkpoint if available
        run: |
          if [[ "${{ inputs.attempt }}" -gt "1" ]]; then
            python download_checkpoint.py --repo "${{ inputs.hf_repo }}"
          fi

      # Training with checkpointing
      - name: Run training
        run: |
          python train.py \
            --checkpoint-every 100 \
            --save-to-hf

Key cost optimization techniques:

  • Using spot instances (up to ~85% cost reduction)
  • Implementing automatic checkpointing and retry mechanisms
  • Right-sizing resources (T4 GPU, 4 CPU cores)
  • Using LoRA for parameter-efficient fine-tuning

Example 2: GRPO Fine-Tuning with Spot Instance Resilience

The GRPO Fine-Tuning workflow shows how to implement resilient training on spot instances:

jobs:
  train:
    name: Training
    runs-on:
      - machine
      - gpu=L40S  # Needed for larger models
      - tenancy=spot
    steps:
      # Setup steps...

      # Checkpoint handling
      - name: Check for existing checkpoints
        id: check-checkpoint
        run: |
          python check_checkpoints.py \
            --hf-repo "${{ inputs.hf_repo }}" \
            --set-output

      # Training with progressive saving
      - name: Training
        run: |
          python train.py \
            --checkpoint-dir ./checkpoints \
            --save-steps 100 \
            --push-to-hub

This approach combines:

  • Spot instance cost savings
  • Automatic checkpoint detection and resumption
  • Periodic saving to Hugging Face Hub
  • Intelligent retries for interrupted jobs

Best Practices Summary

  1. Always use spot instances with intelligent retries for non-critical workloads
  2. Implement regular checkpointing to Hugging Face Hub to handle spot instance interruptions
  3. Right-size your GPU, CPU, and RAM for each specific task
  4. Use the Machine dashboard to monitor job costs and resource utilization
  5. Use mixed precision training where possible
  6. Cache dependencies and datasets to reduce job time

Next Steps