
LLM Supervised Fine-Tuning

The LLM Supervised Fine-Tuning workflow lets you fine-tune language models on popular conversational datasets. It runs on Machine GPU runners and produces fine-tuned versions of a base model tailored to your specific use case.

In this example, we fine-tune the Llama 3.2 3B Instruct model on the FineTome-100k dataset.

Prerequisites

You will need to have completed the Quick Start guide.

Use Case Overview

Why might you want to fine-tune language models?

  • Adapt pre-trained models to specific domains or tasks
  • Improve performance on domain-specific conversational scenarios
  • Create models that better align with your brand voice or style
  • Reduce hallucinations and improve factual accuracy in specific domains

How It Works

The LLM Supervised Fine-Tuning workflow uses Unsloth to accelerate fine-tuning. The workflow is defined in GitHub Actions workflow files and is triggered on demand with configurable parameters.

The fine-tuning process:

  1. Loads a specified base model (e.g., Llama 3.2 3B Instruct)
  2. Prepares a conversational dataset (e.g., FineTome-100k or OpenAssistant’s oasst1)
  3. Applies Low-Rank Adaptation (LoRA) for memory-efficient training
  4. Automatically saves checkpoints during training (in the retry-enabled workflow)
  5. Pushes the fine-tuned model to the Hugging Face Hub
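Step 3 is where most of the memory savings come from. LoRA keeps a pretrained weight matrix W (d × k) frozen and trains two small factors, B (d × r) and A (r × k), whose product is added to W. The trainable parameter count per adapted matrix therefore falls from d·k to r·(d + k). The arithmetic below is a minimal illustration for one square projection. The 3072 dimension is an assumed example size, and r = 64 matches the workflow's default lora_rank.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when updating a d x k weight matrix directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for the LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


# Illustrative square projection; 3072 is an assumed example dimension.
d = k = 3072
r = 64  # matches the workflow's default lora_rank input

full = full_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints: full: 9,437,184  lora: 393,216  ratio: 4.17%
```

At rank 64, each adapted 3072 × 3072 matrix trains 1/24 of the parameters that a full update would. The adapter size grows linearly with lora_rank, so raising the rank buys capacity at a proportional memory cost.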

Workflow Implementation

LLM Supervised Fine-Tuning is implemented as GitHub Actions workflows that you trigger manually (via workflow_dispatch). Here is the basic workflow definition:

```yaml
name: Supervised Fine-Tuning
on:
  workflow_dispatch:
    inputs:
      source_model:
        type: string
        required: false
        description: 'The base model to fine-tune'
        default: 'unsloth/Llama-3.2-3B-Instruct'
      data_set:
        type: string
        required: false
        description: 'Which dataset to use for fine-tuning'
        default: 'finetome-100k'
      max_seq_length:
        type: string
        required: false
        description: 'The maximum sequence length'
        default: '4096'
      lora_rank:
        type: string
        required: false
        description: 'The lora rank'
        default: '64'
      max_steps:
        type: string
        required: false
        description: 'The maximum number of steps'
        default: '250'
      gpu_memory_utilization:
        type: string
        required: false
        description: 'The GPU memory utilization'
        default: '0.90'
      learning_rate:
        type: string
        required: false
        description: 'The learning rate'
        default: '2e-5'
      per_device_train_batch_size:
        type: string
        required: false
        description: 'The per device training batch size'
        default: '2'
      hf_repo:
        type: string
        required: true
        description: 'The Hugging Face repository to upload the model to'
jobs:
  train:
    name: Supervised LoRA Training (unsloth)
    runs-on:
      - machine
      - gpu=T4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 180
    env:
      SOURCE_MODEL: ${{ inputs.source_model }}
      MAX_SEQ_LENGTH: ${{ inputs.max_seq_length }}
      LORA_RANK: ${{ inputs.lora_rank }}
      DATA_SET: ${{ inputs.data_set }}
      GPU_MEMORY_UTILIZATION: ${{ inputs.gpu_memory_utilization }}
      MAX_STEPS: ${{ inputs.max_steps }}
      LEARNING_RATE: ${{ inputs.learning_rate }}
      PER_DEVICE_TRAIN_BATCH_SIZE: ${{ inputs.per_device_train_batch_size }}
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      HF_HUB_ENABLE_HF_TRANSFER: 1
      HF_REPO: ${{ inputs.hf_repo }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Run Training
        run: |
          python3 train.py
```
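The workflow passes every input to the job as a string environment variable, so the training script has to parse them. The contents of train.py are not shown here. The following is a hypothetical sketch of how a script could read those variables into typed settings, reusing the workflow's input defaults. The TrainConfig and load_config names are illustrative, not taken from the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrainConfig:
    source_model: str
    data_set: str
    max_seq_length: int
    lora_rank: int
    max_steps: int
    gpu_memory_utilization: float
    learning_rate: float
    per_device_train_batch_size: int
    hf_repo: str


def load_config(env: Mapping[str, str] = os.environ) -> TrainConfig:
    """Parse the workflow's string-valued env vars into typed settings (hypothetical).

    Defaults mirror the workflow_dispatch inputs. HF_REPO has no default
    because the workflow marks hf_repo as required.
    """

    def get(name: str, default: str) -> str:
        # Treat a missing or empty variable as unset.
        value = env.get(name, "")
        return value if value else default

    hf_repo = env.get("HF_REPO", "")
    if not hf_repo:
        raise ValueError("HF_REPO must be set")

    return TrainConfig(
        source_model=get("SOURCE_MODEL", "unsloth/Llama-3.2-3B-Instruct"),
        data_set=get("DATA_SET", "finetome-100k"),
        max_seq_length=int(get("MAX_SEQ_LENGTH", "4096")),
        lora_rank=int(get("LORA_RANK", "64")),
        max_steps=int(get("MAX_STEPS", "250")),
        gpu_memory_utilization=float(get("GPU_MEMORY_UTILIZATION", "0.90")),
        learning_rate=float(get("LEARNING_RATE", "2e-5")),
        per_device_train_batch_size=int(get("PER_DEVICE_TRAIN_BATCH_SIZE", "2")),
        hf_repo=hf_repo,
    )
```

A script following this pattern would call load_config() once at startup. HF_TOKEN is deliberately left out of the config object so that the secret is not logged along with the other settings.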

Advanced Retry Mechanism

For enhanced reliability, the repository also provides a workflow with automatic checkpointing and retry functionality:

```yaml
name: Supervised Fine-Tuning with Retry
on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Same parameters as in the basic workflow
      # ...
```

This implementation ensures training progress isn’t lost due to spot instance interruptions by:

  1. Automatically saving checkpoints to the Hugging Face Hub during training
  2. Detecting spot instance interruptions using a custom GitHub Action
  3. Restarting the workflow with an incremented attempt number
  4. Resuming training from the latest checkpoint
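Resuming requires picking the newest checkpoint, and a plain string sort gets this wrong: "checkpoint-90" sorts after "checkpoint-250". The following is a minimal sketch that compares step numbers instead. It assumes checkpoints follow the Hugging Face Trainer naming convention (checkpoint-<step>) and that you already have a list of file paths, for example from listing the Hub repository. How the template actually stores checkpoints is not specified here.

```python
import re
from typing import Iterable, Optional

# Trainer-style checkpoint directories such as "checkpoint-250",
# at the start of a path or after a "/".
_CHECKPOINT_RE = re.compile(r"(?:^|/)checkpoint-(\d+)(?=/|$)")


def latest_checkpoint(paths: Iterable[str]) -> Optional[str]:
    """Return the checkpoint directory with the highest step number, or None."""
    best_step = -1
    best_dir: Optional[str] = None
    for path in paths:
        match = _CHECKPOINT_RE.search(path)
        if match is None:
            continue  # not inside a checkpoint directory
        step = int(match.group(1))
        if step > best_step:
            best_step = step
            best_dir = path[: match.end(1)]  # keep any parent directories
    return best_dir


files = [
    "README.md",
    "checkpoint-90/adapter_config.json",
    "checkpoint-250/adapter_model.safetensors",
    "checkpoint-100/trainer_state.json",
]
print(latest_checkpoint(files))  # checkpoint-250
```

If the training script uses a Hugging Face Trainer (which is an assumption here), the downloaded directory could then be passed to trainer.train(resume_from_checkpoint=...).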

The retry mechanism works through the following steps:

  1. The workflow starts a training job with a specified attempt number (default: 1)
  2. During training, checkpoints are periodically saved to the Hugging Face Hub
  3. If the job completes successfully, the workflow ends
  4. If the job fails due to a spot instance interruption:
    • The check-runner-interruption action detects that the failure was due to a spot instance preemption
    • The workflow calculates the next attempt number
    • If within the maximum attempts limit, it triggers a new workflow run with an incremented attempt number
    • All original parameters are preserved for the new attempt
  5. When a new attempt starts, it downloads the latest checkpoint and resumes training from that point
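The decision at the end of each run can be summarized as a small function. This is a sketch of the logic described in the steps above, not the workflow's own code. It assumes that failures other than spot preemption are not retried, and that "within the maximum attempts limit" means the incremented attempt number may not exceed max_attempts. The Outcome and next_attempt names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    PREEMPTED = "preempted"  # the spot instance was reclaimed mid-run
    FAILED = "failed"        # any other failure (code error, OOM, bad input, ...)


def next_attempt(outcome: Outcome, attempt: str, max_attempts: int) -> Optional[int]:
    """Return the attempt number to dispatch next, or None if the run should stop.

    attempt arrives as a string because the workflow declares it with
    type: string; max_attempts is declared with type: number.
    """
    if outcome is not Outcome.PREEMPTED:
        # Success ends the workflow; in this sketch other failures are not retried.
        return None
    following = int(attempt) + 1
    # Only re-dispatch while the incremented attempt stays within the limit.
    return following if following <= max_attempts else None


print(next_attempt(Outcome.PREEMPTED, "1", 5))  # 2
print(next_attempt(Outcome.PREEMPTED, "5", 5))  # None
```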

As a result, reclaiming a spot instance costs at most the progress made since the last saved checkpoint. The job then continues from that checkpoint on a new instance.

Using Machine GPU Runners

This fine-tuning process runs on Machine GPU runners, which provide the necessary compute. The workflow is configured to use:

  • T4 GPU: An entry-level ML training GPU with 16 GB of VRAM, suitable for efficient training with Unsloth's optimizations
  • Spot instances: To reduce cost while maintaining performance
  • Configurable resources: CPU, RAM, and architecture specifications
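A quick back-of-the-envelope check shows why a 16 GB T4 can hold the default 3B model. The sketch counts only the weights, under assumptions that are ours rather than the workflow's. It assumes roughly 3.2 billion parameters stored in 16-bit precision (2 bytes each). It also reads gpu_memory_utilization as the fraction of total VRAM the run may use.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def memory_budget_gb(vram_gb: float, utilization: float) -> float:
    """VRAM available to the run under a given utilization fraction."""
    return vram_gb * utilization


# Assumptions: ~3.2e9 parameters for a "3B" model, 16-bit weights (2 bytes each),
# and gpu_memory_utilization interpreted as a fraction of total VRAM.
weights = weight_memory_gb(3.2e9, 2)
budget = memory_budget_gb(16, 0.90)  # T4: 16 GB; workflow default: 0.90
headroom = budget - weights
print(f"weights ~ {weights:.1f} GB, budget ~ {budget:.1f} GB, headroom ~ {headroom:.1f} GB")
# prints: weights ~ 6.4 GB, budget ~ 14.4 GB, headroom ~ 8.0 GB
```

The headroom has to cover activations, which grow with max_seq_length and per_device_train_batch_size. It also has to cover the LoRA adapters with their gradients and optimizer state, plus CUDA overhead. For that reason, lowering the batch size or sequence length is the first lever when training hits out-of-memory errors.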

For larger models or datasets, you can request a more powerful GPU by changing the gpu label, for example to an L4 (24 GB of VRAM):

```yaml
runs-on:
  - machine
  - gpu=L4
  - cpu=4
  - ram=16
  - architecture=x64
```

Getting Started

To run the LLM Supervised Fine-Tuning workflow:

  1. Use the MachineHQ/llm-supervised-fine-tuning repository as a template
  2. Create a Hugging Face access token with write permission
  3. Add this token as a repository secret named HF_TOKEN in your GitHub repository settings
  4. Navigate to the Actions tab in your repository
  5. Select the “Supervised Fine-Tuning with Retry” workflow
  6. Click “Run workflow” and configure your parameters:
    • Choose your base model and dataset
    • Adjust sequence length, LoRA rank, and training steps
    • Configure GPU memory utilization and learning rate
    • Specify your Hugging Face target repository
  7. Run the workflow and wait for it to complete
  8. Access your fine-tuned model on the Hugging Face Hub, in the repository you specified as hf_repo
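Steps 4 to 7 can also be done without the web UI. As a sketch, the snippet below builds a request for GitHub's REST endpoint that creates a workflow dispatch event (POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches). That endpoint accepts the workflow file name as workflow_id. The owner, repository, workflow file name (sft-with-retry.yml), and token are placeholders, not values from the template repository.

```python
import json
import urllib.request
from typing import Mapping


def build_dispatch_request(
    owner: str,
    repo: str,
    workflow_file: str,
    ref: str,
    inputs: Mapping[str, str],
    token: str,
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": dict(inputs)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: substitute your own owner, repo, workflow file, and token.
req = build_dispatch_request(
    owner="your-org",
    repo="llm-supervised-fine-tuning",
    workflow_file="sft-with-retry.yml",  # hypothetical file name
    ref="main",
    inputs={"hf_repo": "your-org/llama-3.2-3b-finetome", "max_steps": "250"},
    token="<github-token>",
)
# urllib.request.urlopen(req) would send it; a successful dispatch returns an empty 2xx response.
```

Inputs you omit fall back to the defaults declared in the workflow. The token needs permission to run workflows on the repository. It is a GitHub token and is separate from the HF_TOKEN secret.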

Best Practices

  • Select appropriate datasets: Choose datasets that match your target application domain
  • Adjust batch size for your GPU: Lower per_device_train_batch_size (or max_seq_length) if you encounter out-of-memory errors
  • Use checkpointing for longer runs: For extensive training sessions, use the retry-enabled workflow
  • Monitor training progress: Check workflow logs to observe loss metrics
  • Test with prompts similar to your use case: Evaluate the model on examples that match your intended application
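To track loss across a run, you can extract the values from a downloaded job log. Hugging Face's Trainer typically prints its periodic logs as Python-style dicts, such as {'loss': ..., 'learning_rate': ..., 'epoch': ...}. Assuming train.py's output follows that format, a minimal parser could look like this. The sample lines are invented for illustration and are not real training output.

```python
import ast
import re
from typing import Iterable, List

# A brace-delimited span with no nested braces, e.g. "{'loss': 1.2, 'epoch': 0.1}".
_DICT_RE = re.compile(r"\{[^{}]*\}")


def extract_losses(lines: Iterable[str]) -> List[float]:
    """Collect 'loss' values from log lines that contain a Python dict literal."""
    losses: List[float] = []
    for line in lines:
        for match in _DICT_RE.finditer(line):
            try:
                record = ast.literal_eval(match.group(0))
            except (ValueError, SyntaxError, TypeError):
                continue  # braces that are not a valid literal
            if isinstance(record, dict) and isinstance(record.get("loss"), (int, float)):
                losses.append(float(record["loss"]))
    return losses


# Invented sample lines in the assumed Trainer log format.
sample = [
    "{'loss': 1.52, 'grad_norm': 0.8, 'learning_rate': 2e-05, 'epoch': 0.01}",
    " 10%|#         | 25/250 [00:40<06:00]",
    "{'loss': 1.21, 'grad_norm': 0.6, 'learning_rate': 1.5e-05, 'epoch': 0.02}",
]
print(extract_losses(sample))  # [1.52, 1.21]
```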

Next Steps