Instant Clusters
Instant Clusters enable high-performance computing across multiple GPU Pods connected by high-speed networking.
Instant Clusters provide:
- Fast local networking between Pods, with bandwidths from 100 Gbps to 3200 Gbps within a single data center.
- Static IP assignment for each Pod in the cluster.
- Automatic assignment of environment variables for seamless coordination between Pods.
All accounts have a default spending limit. To deploy a cluster beyond this limit, submit a support ticket at help@runpod.io.
Get started
Get started with Instant Clusters by following the step-by-step tutorial for your preferred framework.
Use cases for Instant Clusters
Instant Clusters provide powerful computing capabilities that benefit a wide range of applications:
Deep learning & AI
- Large Language Model training: Distribute training of models across multiple GPUs for significantly faster convergence.
- Federated Learning: Train models across distributed systems while preserving data privacy and security.
High-performance computing
- Scientific simulations: Use multi-GPU acceleration to run complex simulations for weather forecasting, molecular dynamics, and climate modeling.
- Computational physics: Solve large-scale physics problems requiring massive parallel computing power.
- Fluid dynamics & engineering: Perform fluid dynamics computations for use in aerospace, automotive, and energy sectors.
Graphics computing & rendering
- Large-scale rendering: Generate high-fidelity images and animations for film, gaming, and visualization.
- Real-time graphics processing: Power complex visual effects and simulations requiring multiple GPUs.
- Game development & testing: Render game environments, test AI-driven behaviors, and generate procedural content.
- Virtual reality & augmented reality: Deliver real-time multi-view rendering for immersive AR/VR experiences.
Large-scale data analytics
- Big data processing: Analyze large-scale datasets with distributed computing frameworks.
- Social media analysis: Detect real-time trends, analyze sentiment, and identify misinformation.
Network interfaces
High-bandwidth interfaces (`eth1`, `eth2`, etc.) handle communication between Pods, while the management interface (`eth0`) carries external traffic. The NCCL environment variable `NCCL_SOCKET_IFNAME` uses all available interfaces by default. `PRIMARY_ADDR` corresponds to `eth1`, enabling the launch and bootstrapping of distributed processes.
Instant Clusters support up to 8 interfaces per Pod. Each interface (`eth1` through `eth8`) provides a private network connection for inter-node communication and is made available to distributed backends such as NCCL or GLOO.
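As a minimal sketch, you could restrict both backends to specific interfaces by setting their socket-interface variables before the first collective call (the interface lists below are illustrative; the default of using all available interfaces usually needs no changes):

```python
import os

# NCCL uses every available interface by default; restricting it keeps
# collective traffic off the management interface (eth0).
os.environ["NCCL_SOCKET_IFNAME"] = "eth1,eth2"

# GLOO honors an equivalent variable with the same semantics.
os.environ["GLOO_SOCKET_IFNAME"] = "eth1"
```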
Environment variables
The following environment variables are available in all Pods:
| Environment Variable | Description |
| --- | --- |
| `PRIMARY_ADDR` / `MASTER_ADDR` | The address of the primary Pod. |
| `PRIMARY_PORT` / `MASTER_PORT` | The port of the primary Pod (all ports are available). |
| `NODE_ADDR` | The static IP of this Pod within the cluster network. |
| `NODE_RANK` | The cluster (i.e., global) rank assigned to this Pod (0 for the primary Pod). |
| `NUM_NODES` | The number of Pods in the cluster. |
| `NUM_TRAINERS` | The number of GPUs per Pod. |
| `HOST_NODE_ADDR` | Defined as `PRIMARY_ADDR:PRIMARY_PORT` for convenience. |
| `WORLD_SIZE` | The total number of GPUs in the cluster (`NUM_NODES * NUM_TRAINERS`). |
Each Pod receives a static IP (`NODE_ADDR`) on the overlay network. When a cluster is deployed, the system designates one Pod as the primary node by setting the `PRIMARY_ADDR` and `PRIMARY_PORT` environment variables. This simplifies working with multiprocessing libraries that require a primary node.
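For instance, a launcher script running on every Pod could use `NODE_RANK` to detect the primary and spawn one worker per GPU. This is a sketch only; `run_worker` is a hypothetical stand-in for real training code:

```python
import os

import torch.multiprocessing as mp


def run_worker(local_rank: int) -> None:
    # Hypothetical worker: derive this process's global rank from the
    # Pod's cluster rank and its local GPU index.
    global_rank = int(os.environ["NODE_RANK"]) * int(os.environ["NUM_TRAINERS"]) + local_rank
    print(f"worker {global_rank} running on Pod {os.environ['NODE_ADDR']}")


if __name__ == "__main__":
    if int(os.environ["NODE_RANK"]) == 0:
        print(f"this is the primary Pod ({os.environ['PRIMARY_ADDR']})")
    # mp.spawn passes each process its index (0..NUM_TRAINERS-1) as local_rank.
    mp.spawn(run_worker, nprocs=int(os.environ["NUM_TRAINERS"]))
```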
The variables `MASTER_ADDR`/`PRIMARY_ADDR` and `MASTER_PORT`/`PRIMARY_PORT` are equivalent. The `MASTER_*` variables provide compatibility with tools that expect these legacy names.
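Because the `MASTER_*` aliases are set in every Pod, PyTorch's default `env://` rendezvous picks them up without extra configuration. A minimal sketch, assuming PyTorch with CUDA and one process per GPU started by a launcher such as `torchrun` (which sets `LOCAL_RANK`):

```python
import os

import torch.distributed as dist

# env:// reads MASTER_ADDR and MASTER_PORT from the environment, so the
# aliases provided by the cluster are used automatically.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
global_rank = int(os.environ["NODE_RANK"]) * int(os.environ["NUM_TRAINERS"]) + local_rank

dist.init_process_group(
    backend="nccl",
    init_method="env://",
    world_size=int(os.environ["WORLD_SIZE"]),
    rank=global_rank,
)
```

Tools that expect the `PRIMARY_*` names work the same way, since both pairs point at the same address and port.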