CPU vs GPU

This page helps you pick the right Machine runner for your workload. The short version: use a GPU only if your code uses CUDA or calls a GPU library. For everything else, CPU is faster to start, cheaper, and simpler.

Decision tree

[Figure: GPU selection decision tree: CUDA needed? → Inference or fine-tuning? → Model size → recommended GPU with $/min price]

| Workload | Recommended GPU | $/min (spot) | Why |
| --- | --- | --- | --- |
| Inference, ≤7B model, FP16 | T4G | $0.004 | ~14 GB fits in 16 GB VRAM with KV-cache headroom |
| Inference, 7–13B, 4-bit quantized | L4 | $0.006 | 13B quantized to ~7 GB fits 24 GB easily |
| Inference, 7–13B FP16 or ≤30B 4-bit | L40S | $0.016 | 13B FP16 needs ~26 GB, which exceeds the L4's 24 GB |
| High-throughput inference at scale | Inferentia2 | $0.003 | Cheapest GPU-class option, optimized for inference |
| Computer vision (real-time) | A10G | $0.011–$0.019 | Best latency-to-cost ratio for CV workloads |
| QLoRA fine-tune, ≤13B | T4G or T4 | $0.004 | 4-bit quantization keeps memory under 16 GB |
| QLoRA fine-tune, 30B | L4 | $0.006 | ~20 GB fits in 24 GB |
| QLoRA fine-tune, 70B | L40S | $0.016 | ~46 GB is tight but fits in 48 GB |
| LoRA fine-tune, 7B (FP16) | L4 or A10G | $0.006–$0.011 | ~15 GB leaves no headroom on a 16 GB card |
| LoRA fine-tune, 13B (FP16) | L40S | $0.016 | ~28 GB exceeds 24 GB; 48 GB leaves headroom |
| Full fine-tune, anything >7B | Multi-GPU or Trainium | varies | A single GPU is not viable above 7B for full fine-tunes |
| AWS Neuron native training | Trainium | $0.006 | Built for AWS Neuron SDK workloads |
| Builds, tests, CI (no CUDA) | CPU | $0.0003–$0.010 | A GPU is overkill and slower to start |
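The decision tree and table above can be sketched as a small lookup function. This is a toy encoding, not an official API — the function name and `task` labels are mine, and the thresholds come straight from the table rows; anything the table doesn't cover falls through to multi-GPU.

```python
def pick_gpu(task: str, params_b: float, four_bit: bool = False) -> str:
    """Toy encoding of the GPU-selection table. params_b is model size
    in billions of parameters. Covers only the workloads listed above."""
    if task == "cpu-only":                     # builds, tests, CI without CUDA
        return "CPU"
    if task == "inference":
        if params_b <= 7 and not four_bit:     # ~14 GB fits a 16 GB T4G
            return "T4G"
        if four_bit and params_b <= 13:        # ~7 GB fits a 24 GB L4
            return "L4"
        if params_b <= 13 or (four_bit and params_b <= 30):
            return "L40S"                      # 13B FP16 (~26 GB) needs 48 GB
    if task == "qlora":                        # assumes 4-bit quantization
        if params_b <= 13:
            return "T4G"
        if params_b <= 30:
            return "L4"
        if params_b <= 70:
            return "L40S"
    return "multi-GPU or Trainium"             # e.g. full fine-tunes >7B
```

For example, `pick_gpu("inference", 13)` returns `"L40S"`, while the same model 4-bit quantized (`pick_gpu("inference", 13, four_bit=True)`) drops to the cheaper `"L4"`.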

Memory math (the actual numbers)

These are approximate VRAM requirements per technique. Actual usage varies with batch size, sequence length, and overhead.

| Technique | 7B | 13B | 30B | 70B |
| --- | --- | --- | --- | --- |
| Full fine-tune (FP16) | ~67 GB | ~125 GB | ~288 GB | ~672 GB |
| LoRA (FP16) | ~15 GB | ~28 GB | ~63 GB | ~146 GB |
| QLoRA (8-bit) | ~9 GB | ~17 GB | ~38 GB | ~88 GB |
| QLoRA (4-bit) | ~5 GB | ~9 GB | ~20 GB | ~46 GB |
| Inference (FP16) | ~14 GB | ~26 GB | ~60 GB | ~140 GB |
| Inference (4-bit) | ~4 GB | ~7 GB | ~16 GB | ~40 GB |

Sources: Modal, Runpod, Hyperstack, Spheron, Crusoe, Hugging Face Forum.
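The inference rows follow directly from parameter count times bytes per parameter. A minimal sketch of that arithmetic (the function name is mine, not from any library):

```python
def inference_vram_gb(params_billions: float, bits_per_param: int) -> float:
    """Weights-only VRAM estimate in GB: params × bits / 8 bytes each.
    Real usage adds KV cache, activations, and framework overhead,
    which is why the table's quantized figures run slightly higher."""
    return params_billions * bits_per_param / 8

# 13B at FP16: 13 × 16 / 8 = 26.0 GB — why the table bumps it to an L40S.
print(inference_vram_gb(13, 16))  # → 26.0
```

Fine-tuning estimates are harder to derive this way: full fine-tuning must also hold gradients and optimizer state (several extra bytes per parameter), while LoRA/QLoRA freeze the base weights and train only small adapters, which is what pulls the 70B QLoRA figure down to ~46 GB.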

When to use a CPU runner instead

Use CPU runners for:

  • Builds and compilation — even large C/C++/Rust projects fit in CPU memory and parallelize well
  • Test suites — even GPU-related tests, if they use CPU-only mocks
  • Data preprocessing without GPU acceleration
  • Linting, formatting, code analysis
  • Container builds (docker build)
  • Web app builds (Next.js, Vite, Webpack, Astro)
  • Database operations and migrations
  • Anything that doesn’t import a CUDA library

CPU runners start faster, cost less, and offer up to 64 vCPUs and 128 GB RAM — more than enough for most non-GPU work.

Architecture: X64 vs ARM64 (CPU)

| Architecture | When to pick it |
| --- | --- |
| X64 (Intel/AMD) | Maximum compatibility — most prebuilt binaries, Docker images, and CI tooling assume X64. Default. |
| ARM64 (Graviton) | ~15–20% cheaper at the same vCPU count. Great for Go, Rust, Java, Python, and modern container workloads. The T4G GPU is also ARM64. |

Most modern Linux software runs cleanly on ARM64. If your toolchain doesn't have an X64-only dependency, ARM64 saves money.
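If a build script needs to branch on architecture (say, to fetch the right prebuilt binary), the standard library reports it. The alias map below is my own sketch — exact runner labels vary by provider:

```python
import platform

# Common `uname -m` / platform.machine() strings → the names used above.
# (Assumed mapping for illustration, not an official runner API.)
ARCH_ALIASES = {
    "x86_64": "X64", "amd64": "X64",
    "aarch64": "ARM64", "arm64": "ARM64",
}

def runner_arch() -> str:
    """Return 'X64', 'ARM64', or 'unknown' for the current machine."""
    return ARCH_ALIASES.get(platform.machine().lower(), "unknown")
```

On a Graviton runner this returns `"ARM64"` (the kernel reports `aarch64`); on Intel/AMD it returns `"X64"`.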

Next steps