Runpod offers custom pricing plans for large-scale and enterprise workloads. Contact our sales team to learn more.
Serverless offers pay-per-second pricing with no upfront costs. You’re billed from when a worker starts until it fully stops, rounded up to the nearest second.

Worker types

| | Flex workers | Active workers |
|---|---|---|
| Behavior | Scale to zero when idle | Always running (24/7) |
| Pricing | Standard per-second rate | 20–30% discount |
| Best for | Variable workloads, cost optimization | Consistent traffic, low-latency requirements |

GPU pricing

| GPU type(s) | Memory | Flex cost per second | Active cost per second | Description |
|---|---|---|---|---|
| A4000, A4500, RTX 4000 | 16 GB | $0.00016 | $0.00011 | The most cost-effective for small models. |
| 4090 PRO | 24 GB | $0.00031 | $0.00021 | Extreme throughput for small-to-medium models. |
| L4, A5000, 3090 | 24 GB | $0.00019 | $0.00013 | Great for small-to-medium sized inference workloads. |
| L40, L40S, 6000 Ada PRO | 48 GB | $0.00053 | $0.00037 | Extreme inference throughput on LLMs like Llama 3 7B. |
| A6000, A40 | 48 GB | $0.00034 | $0.00024 | A cost-effective option for running big models. |
| H100 PRO | 80 GB | $0.00116 | $0.00093 | Extreme throughput for big models. |
| A100 | 80 GB | $0.00076 | $0.00060 | High throughput GPU, yet still very cost-effective. |
| H200 PRO | 141 GB | $0.00155 | $0.00124 | Extreme throughput for huge models. |
| B200 | 180 GB | $0.00240 | $0.00190 | Maximum throughput for huge models. |
For the latest pricing, visit the Runpod pricing page.
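To compare the two worker types, the following sketch computes a break-even utilization for a GPU tier: the fraction of wall-clock time a flex worker must be billed before an always-on active worker becomes the cheaper option. The rates are copied from the pricing table; the names `RATES`, `active_discount`, and `break_even_utilization` are illustrative and not part of any Runpod API. The calculation ignores storage costs.

```python
# Per-second rates in USD, copied from the pricing table: (flex, active).
RATES = {
    "A100": (0.00076, 0.00060),
    "H100 PRO": (0.00116, 0.00093),
    "4090 PRO": (0.00031, 0.00021),
}

def active_discount(gpu: str) -> float:
    """Fractional discount of the active rate relative to the flex rate."""
    flex, active = RATES[gpu]
    return 1 - active / flex

def break_even_utilization(gpu: str) -> float:
    """Fraction of time a flex worker must be billed for its cost to match
    one active worker running 24/7 on the same GPU tier."""
    flex, active = RATES[gpu]
    return active / flex

if __name__ == "__main__":
    for gpu in RATES:
        print(f"{gpu}: discount {active_discount(gpu):.1%}, "
              f"break-even at {break_even_utilization(gpu):.1%} utilization")
```

With these figures, an A100 flex worker that is billed for more than roughly 79% of the time costs more than an active worker on the same tier.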

What you’re billed for

Your total cost includes compute time and storage:
| Cost component | Description | Rate |
|---|---|---|
| Compute | GPU time while workers run | See pricing table above |
| Container disk | Worker storage (5-min intervals) | ~$0.10/GB/month |
| Network volume | Shared persistent storage | $0.07/GB/month (< 1TB), $0.05/GB/month (> 1TB) |
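As a rough illustration of the storage rates, the sketch below estimates monthly storage cost. It rests on two assumptions that this page does not confirm: that 1 TB is counted as 1000 GB, and that the network volume tiers are marginal (the first 1 TB at $0.07, only the portion above it at $0.05). The function names are hypothetical, and the container disk figure is an upper bound that assumes every worker keeps its disk for the whole month.

```python
CONTAINER_DISK_RATE = 0.10  # USD per GB per month (approximate, from the table)
NETWORK_RATE_LOW = 0.07     # USD per GB per month below 1 TB
NETWORK_RATE_HIGH = 0.05    # USD per GB per month above 1 TB
TB_IN_GB = 1000             # ASSUMPTION: 1 TB is treated as 1000 GB

def network_volume_monthly_cost(size_gb: float) -> float:
    """ASSUMPTION: marginal tiers. The first 1 TB is billed at the higher
    rate and only the portion above 1 TB at the lower rate."""
    first_tier_gb = min(size_gb, TB_IN_GB)
    second_tier_gb = max(size_gb - TB_IN_GB, 0)
    return first_tier_gb * NETWORK_RATE_LOW + second_tier_gb * NETWORK_RATE_HIGH

def container_disk_monthly_cost(size_gb: float, workers: int) -> float:
    """Upper-bound estimate: every worker holds its container disk all month."""
    return size_gb * workers * CONTAINER_DISK_RATE

if __name__ == "__main__":
    print(f"500 GB network volume: ${network_volume_monthly_cost(500):.2f}/month")
    print(f"3 workers x 20 GB disk: ${container_disk_monthly_cost(20, 3):.2f}/month")
```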

Compute cost breakdown

Workers incur charges during three phases:
  1. Start time: Initializing the container and loading models into GPU memory. Minimize with FlashBoot or model caching.
  2. Execution time: Processing requests. Set execution timeouts to prevent runaway jobs.
  3. Idle time: Waiting for new requests before scaling down (default: 5 seconds). Configure in endpoint settings.
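The three phases above can be combined into a simple per-worker estimate. This is a minimal sketch, assuming a worker that starts, handles one request, waits out the idle timeout, and stops; the function name and the example durations are illustrative. Billing is rounded up to the nearest second, as described at the top of this page.

```python
import math

def worker_compute_cost(start_s: float, execution_s: float,
                        idle_s: float, rate_per_second: float) -> float:
    """Cost of one worker lifetime: start + execution + idle time,
    rounded up to a whole number of seconds, times the per-second rate."""
    billed_seconds = math.ceil(start_s + execution_s + idle_s)
    return billed_seconds * rate_per_second

if __name__ == "__main__":
    A100_FLEX = 0.00076  # USD per second, from the GPU pricing table
    # Example durations are made up; 5.0 s is the default idle timeout.
    cost = worker_compute_cost(start_s=12.3, execution_s=30.2,
                               idle_s=5.0, rate_per_second=A100_FLEX)
    print(f"${cost:.5f}")  # 47.5 s rounds up to 48 billed seconds
```

In this example the start and idle phases account for more than a third of the billed time, which is why reducing cold-start duration and tuning the idle timeout can matter as much as the execution itself.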
For high-volume workloads with significant storage needs, use network volumes to share data across workers and reduce per-worker storage costs.

Account limits

Spend limit: Default limit of $80/hour across all resources. Contact support to increase.
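To get a feel for what the default limit allows, the sketch below estimates how many workers of one GPU tier could run concurrently for a full hour before compute alone reaches $80. It is only back-of-the-envelope arithmetic: the limit covers all resources, so storage and any other usage reduce the headroom, and how the limit is enforced is not described here. The function name is hypothetical.

```python
import math

SPEND_LIMIT_PER_HOUR = 80.0  # USD, default limit stated above

def max_concurrent_workers(rate_per_second: float,
                           limit_per_hour: float = SPEND_LIMIT_PER_HOUR) -> int:
    """Largest whole number of continuously running workers whose combined
    hourly compute cost stays within the limit (compute only)."""
    hourly_cost_per_worker = rate_per_second * 3600
    return math.floor(limit_per_hour / hourly_cost_per_worker)

if __name__ == "__main__":
    print(max_concurrent_workers(0.00240))  # B200 flex rate
    print(max_concurrent_workers(0.00016))  # A4000 / A4500 / RTX 4000 flex rate
```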

Billing support

If you believe you’ve been billed incorrectly, contact support and include the following information in your ticket:
  • Endpoint ID
  • Request ID (if applicable)
  • Approximate time of the issue