LLM Supervised Fine-Tuning
The LLM Supervised Fine-Tuning workflow allows you to fine-tune language models using popular conversational datasets. This implementation leverages Machine GPU runners to train models efficiently, producing model versions optimized for your specific use cases.
In this example we will be fine-tuning the Llama 3.2 3B Instruct model using the FineTome-100k dataset.
Prerequisites
You will need to have completed the Quick Start guide.
Use Case Overview
Why might you want to fine-tune language models?
- Adapt pre-trained models to specific domains or tasks
- Improve performance on domain-specific conversational scenarios
- Create models that better align with your brand voice or style
- Reduce hallucinations and improve factual accuracy in specific domains
How It Works
The LLM Supervised Fine-Tuning workflow uses Unsloth to accelerate the fine-tuning process. The workflow is defined in GitHub Actions workflow files and can be triggered on-demand with configurable parameters.
The fine-tuning process:
- Loads a specified base model (e.g., Llama 3.2 3B Instruct)
- Prepares a conversational dataset (e.g., FineTome-100k or OpenAssistant’s oasst1)
- Applies Low-Rank Adaptation (LoRA) for memory-efficient training
- Automatically saves checkpoints during training (in the retry-enabled workflow)
- Pushes the fine-tuned model to Hugging Face Hub
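As a concrete illustration of the dataset-preparation step, here is a minimal sketch using Unsloth's chat-template helpers. The dataset id (mlabonne/FineTome-100k) and the helper calls follow Unsloth's published examples, but the repository's actual preprocessing may differ:

```python
# Sketch of the dataset-preparation step, assuming Unsloth's chat-template
# helpers; the repository's actual preprocessing may differ.
from datasets import load_dataset
from unsloth import FastLanguageModel
from unsloth.chat_templates import standardize_sharegpt

# Load the base model to get its tokenizer (and chat template).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # 4-bit load keeps a 3B model inside 16GB of VRAM
)

# FineTome-100k stores ShareGPT-style conversations ("from"/"value");
# standardize_sharegpt rewrites them into "role"/"content" messages.
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_sharegpt(dataset)

# Render each conversation into one training string with the chat template.
def to_text(batch):
    return {"text": [tokenizer.apply_chat_template(c, tokenize=False)
                     for c in batch["conversations"]]}

dataset = dataset.map(to_text, batched=True)
```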
Workflow Implementation
The LLM Supervised Fine-Tuning process is implemented as GitHub Actions workflows that can be triggered manually. Here’s the basic workflow definition:
```yaml
name: Supervised Fine-Tuning

on:
  workflow_dispatch:
    inputs:
      source_model:
        type: string
        required: false
        description: 'The base model to fine-tune'
        default: 'unsloth/Llama-3.2-3B-Instruct'
      data_set:
        type: string
        required: false
        description: 'Which dataset to use for fine-tuning'
        default: 'finetome-100k'
      max_seq_length:
        type: string
        required: false
        description: 'The maximum sequence length'
        default: '4096'
      lora_rank:
        type: string
        required: false
        description: 'The lora rank'
        default: '64'
      max_steps:
        type: string
        required: false
        description: 'The maximum number of steps'
        default: '250'
      gpu_memory_utilization:
        type: string
        required: false
        description: 'The GPU memory utilization'
        default: '0.90'
      learning_rate:
        type: string
        required: false
        description: 'The learning rate'
        default: '2e-5'
      per_device_train_batch_size:
        type: string
        required: false
        description: 'The per device training batch size'
        default: '2'
      hf_repo:
        type: string
        required: true
        description: 'The Hugging Face repository to upload the model to'

jobs:
  train:
    name: Supervised LoRA Training (unsloth)
    runs-on:
      - machine
      - gpu=T4
      - cpu=4
      - ram=16
      - architecture=x64
    timeout-minutes: 180
    env:
      SOURCE_MODEL: ${{ inputs.source_model }}
      MAX_SEQ_LENGTH: ${{ inputs.max_seq_length }}
      LORA_RANK: ${{ inputs.lora_rank }}
      DATA_SET: ${{ inputs.data_set }}
      GPU_MEMORY_UTILIZATION: ${{ inputs.gpu_memory_utilization }}
      MAX_STEPS: ${{ inputs.max_steps }}
      LEARNING_RATE: ${{ inputs.learning_rate }}
      PER_DEVICE_TRAIN_BATCH_SIZE: ${{ inputs.per_device_train_batch_size }}
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
      HF_HUB_ENABLE_HF_TRANSFER: 1
      HF_REPO: ${{ inputs.hf_repo }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run Training
        run: |
          python3 train.py
```
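The workflow itself only provisions the runner and exports the hyperparameters as environment variables; the training logic lives in train.py. That script isn't reproduced here, but an env-driven Unsloth training script along the following lines would match the workflow's contract. This is a sketch, not the repository's actual code, and trl's API names vary slightly between versions:

```python
# Minimal sketch of an env-driven train.py; the repository's actual script
# may differ in structure and defaults.
import os

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import standardize_sharegpt

max_seq_length = int(os.environ["MAX_SEQ_LENGTH"])

# Load the base model, then attach LoRA adapters so that only small
# low-rank matrices receive gradients during training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=os.environ["SOURCE_MODEL"],
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=int(os.environ["LORA_RANK"]),
    lora_alpha=int(os.environ["LORA_RANK"]),
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Prepare the conversational dataset as in the earlier preparation sketch.
dataset = standardize_sharegpt(load_dataset("mlabonne/FineTome-100k", split="train"))
dataset = dataset.map(
    lambda batch: {"text": [tokenizer.apply_chat_template(c, tokenize=False)
                            for c in batch["conversations"]]},
    batched=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # newer trl versions name this processing_class
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=max_seq_length,
        per_device_train_batch_size=int(os.environ["PER_DEVICE_TRAIN_BATCH_SIZE"]),
        learning_rate=float(os.environ["LEARNING_RATE"]),
        max_steps=int(os.environ["MAX_STEPS"]),
        output_dir="outputs",
        logging_steps=10,
    ),
)
trainer.train()

# Push the trained adapters and tokenizer to the target repository.
model.push_to_hub(os.environ["HF_REPO"], token=os.environ["HF_TOKEN"])
tokenizer.push_to_hub(os.environ["HF_REPO"], token=os.environ["HF_TOKEN"])
```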
Advanced Retry Mechanism
For enhanced reliability, the repository also provides a workflow with automatic checkpointing and retry functionality:
```yaml
name: Supervised Fine-Tuning with Retry

on:
  workflow_dispatch:
    inputs:
      attempt:
        type: string
        description: 'The attempt number'
        default: '1'
      max_attempts:
        type: number
        description: 'The maximum number of attempts'
        default: 5
      # Same parameters as in the basic workflow
      # ...
```
This implementation ensures training progress isn’t lost due to spot instance interruptions by:
- Automatically saving checkpoints to Hugging Face Hub during training
- Detecting spot instance interruptions using a custom GitHub Action
- Restarting the workflow with an incremented attempt number
- Resuming training from the latest checkpoint
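Wired into GitHub Actions, the detect-and-relaunch logic could look roughly like the job below. This is illustrative only: the action reference, its output name, and the job wiring are assumptions, not the repository's actual implementation. Re-dispatching via gh with the built-in GITHUB_TOKEN works because workflow_dispatch events are exempt from GitHub's recursion guard:

```yaml
jobs:
  # ... the train job from the basic workflow ...
  retry:
    needs: train
    if: failure()              # only runs when the training job failed
    runs-on: ubuntu-latest
    permissions:
      actions: write           # lets GITHUB_TOKEN dispatch a new run
    steps:
      - id: check
        uses: machine-hq/check-runner-interruption@v1   # hypothetical action ref
      - name: Relaunch with an incremented attempt number
        if: steps.check.outputs.interrupted == 'true' && inputs.attempt < inputs.max_attempts
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh workflow run "Supervised Fine-Tuning with Retry" \
            --ref "$GITHUB_REF_NAME" \
            -f attempt=$(( ${{ inputs.attempt }} + 1 )) \
            -f max_attempts="${{ inputs.max_attempts }}" \
            -f hf_repo="${{ inputs.hf_repo }}"
          # ...forwarding the remaining original inputs the same way
```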
The retry mechanism works through the following steps:
- The workflow starts a training job with a specified attempt number (default: 1)
- During training, checkpoints are periodically saved to Hugging Face Hub
- If the job completes successfully, the workflow ends
- If the job fails due to a spot instance interruption:
  - The `check-runner-interruption` action detects that the failure was due to a spot instance preemption
  - The workflow calculates the next attempt number
  - If within the maximum attempts limit, it triggers a new workflow run with an incremented attempt number
  - All original parameters are preserved for the new attempt
- When a new attempt starts, it downloads the latest checkpoint and resumes training from that point
This mechanism ensures that even if a spot instance is reclaimed, your training progress isn’t lost, and the job can continue from the last checkpoint on a new instance.
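On the code side, resume support could be as simple as the helper sketched below: pull whatever checkpoint folders were pushed to the target repo and hand the newest one to the trainer. The helper name and the checkpoint-folder layout (transformers' default naming) are assumptions; on the save side, transformers can push checkpoints for you via TrainingArguments(push_to_hub=True, hub_strategy="all_checkpoints"), though the repository may use its own upload step:

```python
# Hypothetical resume helper; the "checkpoint-<step>/" layout mirrors
# transformers' default checkpoint naming but is an assumption here.
import os

from huggingface_hub import snapshot_download

def latest_checkpoint(hf_repo: str) -> str | None:
    """Fetch any pushed checkpoint folders and return the newest one."""
    try:
        local_dir = snapshot_download(
            repo_id=hf_repo,
            allow_patterns=["checkpoint-*/*"],  # skip the full model weights
        )
    except Exception:
        return None  # repo missing or empty: nothing to resume from
    checkpoints = [d for d in os.listdir(local_dir) if d.startswith("checkpoint-")]
    if not checkpoints:
        return None
    newest = max(checkpoints, key=lambda name: int(name.split("-")[1]))
    return os.path.join(local_dir, newest)

# With the SFTTrainer from the earlier sketch; None means "start from scratch":
# trainer.train(resume_from_checkpoint=latest_checkpoint(os.environ["HF_REPO"]))
```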
Using Machine GPU Runners
This fine-tuning process leverages Machine GPU runners to provide the necessary computing power. The workflow is configured to use:
- T4 GPU: An entry-level ML training GPU with 16GB of VRAM, suitable for efficient training with Unsloth optimizations
- Spot instance: To optimize for cost while maintaining performance
- Configurable resources: CPU, RAM, and architecture specifications
For more demanding models or larger datasets, you can also configure the workflow to use more powerful GPUs:
```yaml
runs-on:
  - machine
  - gpu=L4
  - cpu=4
  - ram=16
  - architecture=x64
```
Getting Started
To run the LLM Supervised Fine-Tuning workflow:
- Use the MachineHQ/llm-supervised-fine-tuning repository as a template
- Set up a Hugging Face access token with write permissions
- Add this token as a repository secret named `HF_TOKEN` in your GitHub repository settings
- Navigate to the Actions tab in your repository
- Select the “Supervised Fine-Tuning with Retry” workflow
- Click “Run workflow” and configure your parameters:
  - Choose your base model and dataset
  - Adjust sequence length, LoRA rank, and training steps
  - Configure GPU memory utilization and learning rate
  - Specify your Hugging Face target repository
- Run the workflow and wait for results
- Access your fine-tuned model on Hugging Face Hub
Best Practices
- Select appropriate datasets: Choose datasets that match your target application domain
- Adjust batch size for your GPU: Lower batch sizes if you encounter out-of-memory errors
- Use checkpointing for longer runs: For extensive training sessions, use the retry-enabled workflow
- Monitor training progress: Check workflow logs to observe loss metrics
- Test with prompts similar to your use case: Evaluate the model on examples that match your intended application
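To act on that last point, a quick smoke test can be as simple as loading the pushed model and sending it a prompt shaped like your real traffic. The repo id below is a placeholder, and loading LoRA adapters directly through transformers requires peft to be installed:

```python
# Quick smoke test of the fine-tuned model; the repo id is a placeholder.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-username/your-fine-tuned-model",  # the hf_repo you pushed to
)

# Evaluate with a prompt that matches your intended application.
messages = [{"role": "user", "content": "Summarize our refund policy in two sentences."}]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])
```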
Next Steps
- Explore the full MachineHQ/llm-supervised-fine-tuning repository
- Learn about GPU runner specifications