Flash provides access to a wide range of NVIDIA GPUs through both pool-based and specific GPU selection. This page lists all available GPU types and explains how to use them.

GPU selection methods

Flash offers two ways to specify GPU hardware:
  1. GPU pools (GpuGroup): Select from predefined pools of similar GPUs grouped by architecture and VRAM.
  2. Specific GPU types (GpuType): Target exact GPU models when you need precise hardware characteristics.
You can use either method or mix both for advanced fallback strategies.

GPU pools

The GpuGroup enum provides access to GPU pools. Each pool contains specific GPU models grouped by architecture and VRAM capacity.

Available GPU pools

| GpuGroup | GPUs included | VRAM | Best for |
| --- | --- | --- | --- |
| GpuGroup.ANY | Any available GPU | Varies | Fast provisioning, prototyping |
| GpuGroup.AMPERE_16 | RTX A4000, RTX 4000 Ada, RTX 2000 Ada | 16GB | Small models, basic inference |
| GpuGroup.AMPERE_24 | RTX A4500, RTX A5000, RTX 3090 | 20-24GB | General ML, mid-size models |
| GpuGroup.ADA_24 | L4, RTX 4090 | 24GB | Cost-effective inference |
| GpuGroup.ADA_32_PRO | RTX 5090 | 32GB | Latest consumer flagship |
| GpuGroup.AMPERE_48 | A40, RTX A6000 | 48GB | Large models, fine-tuning |
| GpuGroup.ADA_48_PRO | RTX 6000 Ada | 48GB | Professional inference |
| GpuGroup.AMPERE_80 | A100 80GB PCIe, A100-SXM4-80GB | 80GB | XL models, intensive training |
| GpuGroup.ADA_80_PRO | H100 80GB HBM3 | 80GB | Cutting-edge inference |
| GpuGroup.HOPPER_141 | H200 | 141GB | Largest models, maximum VRAM |

Using GPU pools

from runpod_flash import Endpoint, GpuGroup

# Single GPU pool
@Endpoint(name="inference", gpu=GpuGroup.AMPERE_80)
async def infer(data: dict) -> dict:
    ...

# Multiple pools for fallback
@Endpoint(
    name="flexible",
    gpu=[GpuGroup.AMPERE_80, GpuGroup.AMPERE_48, GpuGroup.ADA_24]
)
async def flexible_infer(data: dict) -> dict:
    ...

# Any available GPU (fastest provisioning)
@Endpoint(name="development", gpu=GpuGroup.ANY)
async def dev_infer(data: dict) -> dict:
    ...

Specific GPU types

The GpuType enum provides access to specific GPU models. Use these when you need exact hardware characteristics.

Available GPU types

| GpuType | GPU model | VRAM | Architecture |
| --- | --- | --- | --- |
| GpuType.NVIDIA_RTX_A4000 | NVIDIA RTX A4000 | 16GB | Ampere |
| GpuType.NVIDIA_RTX_A4500 | NVIDIA RTX A4500 | 20GB | Ampere |
| GpuType.NVIDIA_RTX_4000_ADA_GENERATION | NVIDIA RTX 4000 Ada | 16GB | Ada Lovelace |
| GpuType.NVIDIA_RTX_2000_ADA_GENERATION | NVIDIA RTX 2000 Ada | 16GB | Ada Lovelace |
| GpuType.NVIDIA_RTX_A5000 | NVIDIA RTX A5000 | 24GB | Ampere |
| GpuType.NVIDIA_L4 | NVIDIA L4 | 24GB | Ada Lovelace |
| GpuType.NVIDIA_GEFORCE_RTX_3090 | NVIDIA GeForce RTX 3090 | 24GB | Ampere |
| GpuType.NVIDIA_GEFORCE_RTX_4090 | NVIDIA GeForce RTX 4090 | 24GB | Ada Lovelace |
| GpuType.NVIDIA_GEFORCE_RTX_5090 | NVIDIA GeForce RTX 5090 | 32GB | Blackwell |
| GpuType.NVIDIA_A40 | NVIDIA A40 | 48GB | Ampere |
| GpuType.NVIDIA_RTX_A6000 | NVIDIA RTX A6000 | 48GB | Ampere |
| GpuType.NVIDIA_RTX_6000_ADA_GENERATION | NVIDIA RTX 6000 Ada | 48GB | Ada Lovelace |
| GpuType.NVIDIA_A100_80GB_PCIe | NVIDIA A100 80GB PCIe | 80GB | Ampere |
| GpuType.NVIDIA_A100_SXM4_80GB | NVIDIA A100-SXM4-80GB | 80GB | Ampere |
| GpuType.NVIDIA_H100_80GB_HBM3 | NVIDIA H100 80GB HBM3 | 80GB | Hopper |
| GpuType.NVIDIA_H200 | NVIDIA H200 | 141GB | Hopper |

Using specific GPU types

from runpod_flash import Endpoint, GpuType

# Single specific GPU
@Endpoint(name="inference", gpu=GpuType.NVIDIA_A100_80GB_PCIe)
async def infer(data: dict) -> dict:
    ...

# Multiple specific GPUs (fallback strategy)
@Endpoint(
    name="flexible",
    gpu=[
        GpuType.NVIDIA_A100_80GB_PCIe,  # Try A100 PCIe first
        GpuType.NVIDIA_A100_SXM4_80GB,  # Fall back to A100 SXM4
        GpuType.NVIDIA_A40              # Final fallback to A40
    ]
)
async def flexible_infer(data: dict) -> dict:
    ...

Advanced fallback strategies

Combine GpuGroup and GpuType for robust availability:
from runpod_flash import Endpoint, GpuGroup, GpuType

@Endpoint(
    name="hybrid-selection",
    gpu=[
        GpuType.NVIDIA_A100_80GB_PCIe,  # Specific GPU first
        GpuGroup.AMPERE_48,             # Pool fallback
        GpuGroup.ANY                    # Ultimate fallback
    ]
)
async def infer(data: dict) -> dict:
    ...

GPU selection behavior

Single GPU type: Flash provisions only this GPU type. Jobs remain in the queue until matching capacity becomes available.
gpu=GpuGroup.AMPERE_80  # Only A100 80GB
Multiple GPU types (fallback): Flash attempts to provision in the order specified.
gpu=[GpuGroup.AMPERE_80, GpuGroup.AMPERE_48, GpuGroup.ADA_24]
# Tries: A100 80GB → A40/RTX A6000 → L4/RTX 4090
GpuGroup.ANY: Flash selects the first available GPU based on current capacity.
gpu=GpuGroup.ANY  # Fastest provisioning, unpredictable GPU type
For production: Use specific GPU types for predictable cost and performance. For development: Use GpuGroup.ANY for fastest iteration.

Multi-GPU workers

Request multiple GPUs per worker using gpu_count:
@Endpoint(
    name="multi-gpu-training",
    gpu=GpuGroup.AMPERE_80,
    gpu_count=4,  # Each worker gets 4 GPUs
    workers=2     # Maximum 2 workers = 8 GPUs total
)
async def train(data: dict) -> dict:
    ...
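Inside a multi-GPU worker, the handler can discover which devices it was assigned. A minimal sketch, assuming the runtime exposes the standard CUDA_VISIBLE_DEVICES variable (a common NVIDIA container convention, not a documented Flash guarantee):

```python
import os

def visible_gpu_ids() -> list[int]:
    # CUDA_VISIBLE_DEVICES holds a comma-separated list of device
    # indices, e.g. "0,1,2,3" for a gpu_count=4 worker.
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(part) for part in raw.split(",") if part.strip()]
```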

Handling unavailability

If requested GPUs are unavailable, jobs stay in queue:
Initial job status: IN_QUEUE
[Waiting for capacity...]
Solutions:
  1. Add fallback options: Use multiple GPU types.
    gpu=[GpuGroup.AMPERE_80, GpuGroup.AMPERE_48, GpuGroup.ADA_24]
    
  2. Use broader selection: Switch to GpuGroup.ANY.
    gpu=GpuGroup.ANY
    
  3. Contact support: For capacity guarantees, contact Runpod support.
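On the client side, you can also guard against indefinite queueing by polling with a deadline. A sketch under stated assumptions: get_status is a placeholder callable that returns the job's current status string, and nothing here is a Flash API.

```python
import time

def wait_until_scheduled(get_status, timeout_s=300.0, poll_s=5.0):
    # Poll until the job leaves IN_QUEUE or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status != "IN_QUEUE":
            return status
        time.sleep(poll_s)
    raise TimeoutError(
        "job still IN_QUEUE; add GPU fallbacks or broaden to GpuGroup.ANY"
    )
```

If the deadline is hit, the caller can react by redeploying with a broader gpu list, as described above.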