Transformer Lab is an open-source research environment for AI researchers to train, fine-tune, and evaluate models. It lets you scale training from local hardware to cloud GPUs through a unified interface to all your compute resources, and it handles experiment and checkpoint tracking, job scheduling, auto-recovery, centralized artifact storage, and more. This guide shows you how to configure Transformer Lab to run ML workloads on Runpod GPUs.

Requirements

You’ll need:
  • A Runpod account and an API key.
  • A Linux or macOS environment with a terminal and a web browser.
Windows users: Transformer Lab requires WSL2 (Windows Subsystem for Linux). Install WSL2 first, then follow the Linux instructions within your WSL2 environment.

Install Transformer Lab

1. Run the install script

Open a terminal and run:
curl -fsSL https://lab.cloud/install.sh | bash -s -- multiuser_setup
This installs Transformer Lab to ~/.transformerlab, sets up a conda environment with all dependencies, and enables the Team Settings features needed for cloud provider configuration.
2. Launch Transformer Lab

Start the Transformer Lab server:
cd ~/.transformerlab/src
./run.sh
Open your browser to http://localhost:8338.
3. Log in

Use the default credentials:
  • Email: admin@example.com
  • Password: admin123
Change these credentials after your first login for security.

Configure shared storage

For remote task execution, Transformer Lab requires shared storage so your local instance can communicate with remote Pods. Configure one of the following:
  • Amazon S3: Create an S3 bucket and configure credentials.
  • Google Cloud Storage: Create a GCS bucket and configure service account.
  • Azure Blob Storage: Create a storage container and configure credentials.
Refer to the Transformer Lab documentation for detailed shared storage setup instructions.
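As one common example, Amazon S3 credentials are usually supplied through the standard AWS credentials file; a sketch of that file is shown below (the key values are placeholders, and your storage provider or deployment may use a different mechanism, such as environment variables):

```ini
# ~/.aws/credentials — placeholder values, replace with your own keys
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```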

Configure Runpod as a compute provider

1. Get your Runpod API key

In the Runpod console, go to Settings and create an API key with All permissions, or Restricted permissions that include Pod access. Copy the API key. Runpod doesn’t store it, so save it securely.
2. Open Team Settings

In Transformer Lab, click your profile icon in the top right corner and select Team Settings.
3. Add Runpod as a provider

Navigate to Compute Providers and click Add Provider. In the modal that opens:
  1. Enter a name for your provider (e.g., “runpod-provider”). Remember this name—you’ll use it in your task.yaml files.
  2. Select Runpod as the provider type.
  3. In the configuration JSON field, add your Runpod API key:
{
  "api_key": "YOUR_RUNPOD_API_KEY",
  "api_base_url": "https://rest.runpod.io/v1"
}
Leave the base URL as is. Click Add Compute Provider to save the provider.
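A malformed configuration JSON is a common source of provider setup failures. As a quick sanity check before pasting, you can verify the JSON parses and contains both fields shown above (this standalone snippet is just an illustration, not part of the Transformer Lab SDK):

```python
import json

# The provider configuration from the step above (placeholder API key)
config_text = '''
{
  "api_key": "YOUR_RUNPOD_API_KEY",
  "api_base_url": "https://rest.runpod.io/v1"
}
'''

# json.loads raises json.JSONDecodeError if the text is not valid JSON
config = json.loads(config_text)
assert "api_key" in config and "api_base_url" in config
print("Provider config OK:", config["api_base_url"])
```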

Run a task on Runpod

Transformer Lab uses task files to define cloud workloads. Tasks specify the resources, setup commands, and run commands for your job. For detailed information on task configuration, see the Task YAML Structure documentation. You can also browse the Task Gallery for pre-built templates.

Create a task

1. Open the Tasks menu

In the Transformer Lab sidebar, click Tasks to open the task management interface.
2. Create a new task

Click New to add a new task. Select Start with a blank task template, then click Submit.
3. Configure the task

In the task editor, paste the following YAML configuration:
name: hello-runpod
resources:
  compute_provider: runpod-provider
  cpus: 4
  memory: 16
  accelerators: "A40:1"
setup: |
  echo "Setting up environment..."
  pip install torch
run: |
  echo "Hello from Runpod!"
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}' if torch.cuda.is_available() else 'GPU: None')"
Replace runpod-provider with the name you gave your Runpod provider in Team Settings. This configuration requests a single NVIDIA A40 GPU on Runpod, installs PyTorch, and runs a simple script to verify GPU access.
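Scaling the same task up only requires changing the resources block. For example, a variant requesting four A100 GPUs might look like this (the name and resource values are illustrative):

```yaml
name: hello-runpod-multi
resources:
  compute_provider: runpod-provider
  cpus: 16
  memory: 64
  accelerators: "A100:4"
```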
4. Queue the task

Click Queue to submit the task you just created. Select your Runpod compute provider and click Submit to start the job. Transformer Lab provisions a Pod on Runpod, runs your task, and displays the output in the task logs.

Monitor task progress

Once queued, your task appears in the Tasks list with its current status. Click Output to view the task logs. The output modal has two tabs:
  • Lab SDK Output: Shows output from scripts that use the transformerlab Python package.
  • Machine Logs: Shows raw stdout/stderr from the Pod. Use this tab to see output from standard print() statements.
For the examples in this guide, check the Machine Logs tab to see your task output.

Stop a running task

To stop a task before it completes, click the stop button (square icon). This terminates the Runpod Pod and releases the resources. You can also verify that no Pods are running by checking the Runpod console.

Specify GPU types

Use the accelerators field to specify the GPU type:
Accelerator      Description
"RTX4090:1"      NVIDIA GeForce RTX 4090 (24GB)
"A40:1"          NVIDIA A40 (48GB)
"A100:1"         NVIDIA A100 (40GB or 80GB)
"A100-80GB:1"    NVIDIA A100 80GB
"H100:1"         NVIDIA H100 (80GB)
"L40S:1"         NVIDIA L40S (48GB)
For multiple GPUs, change the count: "A100:4" for 4x A100 GPUs.
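The "TYPE:COUNT" format above is easy to work with programmatically, for instance when generating task files. A small helper like the following (hypothetical, not part of the Transformer Lab SDK) splits a spec into its GPU type and count:

```python
def parse_accelerator(spec: str) -> tuple[str, int]:
    """Split an accelerator spec like "A100:4" into (gpu_type, count).

    If no count is given, default to 1.
    """
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1

# Examples using the specs from the table above
print(parse_accelerator("A40:1"))   # → ('A40', 1)
print(parse_accelerator("A100:4"))  # → ('A100', 4)
```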

Clean up

When your tasks complete, Transformer Lab automatically releases the Runpod resources, so no manual cleanup is required. If you stopped a task manually, check the Runpod console to confirm that no Pods are still running.