
# Pricing

> Learn how Serverless billing works to optimize your costs.

<div className="overview-page-wrapper" />

<Tip>
  Runpod offers custom pricing plans for large-scale and enterprise workloads. [Contact our sales team](https://ecykq.share.hsforms.com/2MZdZATC3Rb62Dgci7knjbA) to learn more.
</Tip>

Serverless offers pay-per-second pricing with no upfront costs. You're billed from the moment a worker starts until it fully stops, with the total rounded up to the next whole second.

## Worker types

|              | Flex workers                          | Active workers                               |
| ------------ | ------------------------------------- | -------------------------------------------- |
| **Behavior** | Scale to zero when idle               | Always running (24/7)                        |
| **Pricing**  | Standard per-second rate              | Discounts available through sales inquiry    |
| **Best for** | Variable workloads, cost optimization | Consistent traffic, low-latency requirements |

## GPU pricing

| **GPU type(s)**         | **Memory** | **Flex cost per second** | **Active cost per second** | **Description**                                       |
| ----------------------- | ---------- | ------------------------ | -------------------------- | ----------------------------------------------------- |
| A4000, A4500, RTX 4000  | 16 GB      | \$0.00016                | \$0.00011                  | The most cost-effective for small models.             |
| 4090 PRO                | 24 GB      | \$0.00031                | \$0.00021                  | Extreme throughput for small-to-medium models.        |
| L4, A5000, 3090         | 24 GB      | \$0.00019                | \$0.00013                  | Great for small-to-medium sized inference workloads.  |
| L40, L40S, 6000 Ada PRO | 48 GB      | \$0.00053                | \$0.00037                  | Extreme inference throughput on LLMs like Llama 3 8B. |
| A6000, A40              | 48 GB      | \$0.00034                | \$0.00024                  | A cost-effective option for running big models.       |
| H100 PRO                | 80 GB      | \$0.00116                | \$0.00093                  | Extreme throughput for big models.                    |
| A100                    | 80 GB      | \$0.00076                | \$0.00060                  | High throughput GPU, yet still very cost-effective.   |
| H200 PRO                | 141 GB     | \$0.00155                | \$0.00124                  | Extreme throughput for huge models.                   |
| B200                    | 180 GB     | \$0.00240                | \$0.00190                  | Maximum throughput for huge models.                   |

For the latest pricing, visit the [Runpod pricing page](https://www.runpod.io/pricing).
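
To compare the two worker types for a given GPU tier, it can help to work out the utilization at which an active worker becomes cheaper than a flex worker. The sketch below uses the 16 GB tier rates from the table; the function names are illustrative, and it ignores storage costs and any negotiated discounts.

```python
# Rates copied from the GPU pricing table (USD per second, 16 GB tier).
FLEX_RATE = 0.00016
ACTIVE_RATE = 0.00011

SECONDS_PER_DAY = 24 * 60 * 60


def daily_cost_flex(busy_seconds: float, rate: float = FLEX_RATE) -> float:
    """Flex workers are billed only for the seconds they actually run."""
    return busy_seconds * rate


def daily_cost_active(rate: float = ACTIVE_RATE) -> float:
    """Active workers run around the clock, so every second of the day is billed."""
    return SECONDS_PER_DAY * rate


def break_even_utilization(flex_rate: float = FLEX_RATE,
                           active_rate: float = ACTIVE_RATE) -> float:
    """Fraction of the day a flex worker must run before an active worker costs less."""
    return active_rate / flex_rate


print(f"Active worker, per day: ${daily_cost_active():.2f}")
print(f"Break-even utilization: {break_even_utilization():.1%}")
```

With these rates, a worker that is busy for more than roughly two thirds of the day costs less as an active worker; below that, flex is the cheaper choice.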

## What you're billed for

Your total cost includes compute time and storage:

| Cost component     | Description                                | Rate                                              |
| ------------------ | ------------------------------------------ | ------------------------------------------------- |
| **Compute**        | GPU time while workers run                 | See pricing table above                           |
| **Container disk** | Worker storage (billed in 5-min intervals) | \~\$0.10/GB/month                                 |
| **Network volume** | Shared persistent storage                  | \$0.07/GB/month (\< 1TB), \$0.05/GB/month (> 1TB) |
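
As a rough illustration of the network volume rates, the sketch below estimates a monthly charge. It rests on two assumptions the table does not spell out: that the lower rate applies only to storage beyond the first 1 TB, and that 1 TB is counted as 1,000 GB. Check your invoice or the pricing page before relying on it.

```python
# Rates from the table above (USD per GB per month).
RATE_FIRST_TIER = 0.07
RATE_ABOVE_TIER = 0.05

# Assumption: 1 TB is counted as 1,000 GB.
TIER_LIMIT_GB = 1000


def network_volume_monthly_cost(size_gb: float) -> float:
    """Estimate the monthly cost of a network volume.

    Assumption: the lower rate applies only to the portion above the first 1 TB.
    """
    first_tier_gb = min(size_gb, TIER_LIMIT_GB)
    extra_gb = max(size_gb - TIER_LIMIT_GB, 0)
    return first_tier_gb * RATE_FIRST_TIER + extra_gb * RATE_ABOVE_TIER


for size in (100, 500, 1500):
    print(f"{size} GB: ${network_volume_monthly_cost(size):.2f}/month")
```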

### Compute cost breakdown

Workers incur charges during three phases:

1. **Start time**: Initializing the container and loading models into GPU memory. Minimize with [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot) or [model caching](/serverless/endpoints/model-caching).

2. **Execution time**: Processing requests. Set [execution timeouts](/serverless/endpoints/endpoint-configurations#execution-timeout) to prevent runaway jobs.

3. **Idle timeout duration**: The time a worker remains active (running) after completing a request, waiting for additional requests before scaling down (default: 5 seconds). Configure in [endpoint settings](/serverless/endpoints/endpoint-configurations#idle-timeout).
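
The three phases can be added up to estimate what a single worker run costs. This is a minimal sketch, assuming one request per worker run and that rounding is applied once to the total run time; the function names and the example durations are hypothetical.

```python
import math

# Flex rate for an A100 from the GPU pricing table (USD per second).
A100_FLEX_RATE = 0.00076

# Default idle timeout stated above.
DEFAULT_IDLE_TIMEOUT_S = 5.0


def billed_seconds(start_s: float, execution_s: float,
                   idle_s: float = DEFAULT_IDLE_TIMEOUT_S) -> int:
    """Total billable time for one worker run, rounded up to a whole second."""
    return math.ceil(start_s + execution_s + idle_s)


def worker_run_cost(rate_per_second: float, start_s: float, execution_s: float,
                    idle_s: float = DEFAULT_IDLE_TIMEOUT_S) -> float:
    """Compute cost of one worker run: start + execution + idle, times the rate."""
    return billed_seconds(start_s, execution_s, idle_s) * rate_per_second


# Hypothetical run: 8.2 s cold start, 12.5 s execution, default idle timeout.
seconds = billed_seconds(8.2, 12.5)
cost = worker_run_cost(A100_FLEX_RATE, 8.2, 12.5)
print(f"Billed for {seconds} s, approximately ${cost:.5f}")
```

In this example the idle timeout and start time together account for about half of the billed seconds, which is why reducing cold starts and tuning the idle timeout can matter as much as speeding up execution.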

<Tip>
  For high-volume workloads with significant storage needs, use [network volumes](/storage/network-volumes) to share data across workers and reduce per-worker storage costs.
</Tip>

## Account limits

**Spend limit**: Default limit of \$80/hour across all resources. [Contact support](https://www.runpod.io/contact) to increase.
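
To get a feel for what the default limit allows, you can convert a per-second rate into an hourly rate and divide. The sketch below is an upper bound only: it assumes every worker runs for the full hour and that nothing else on the account (storage, Pods, other endpoints) is spending against the same limit. The function name is illustrative.

```python
import math

# Default spend limit stated above (USD per hour).
SPEND_LIMIT_PER_HOUR = 80.0

SECONDS_PER_HOUR = 3600


def max_concurrent_workers(rate_per_second: float,
                           limit_per_hour: float = SPEND_LIMIT_PER_HOUR) -> int:
    """Largest number of continuously running workers that stays within the limit."""
    hourly_rate = rate_per_second * SECONDS_PER_HOUR
    return math.floor(limit_per_hour / hourly_rate)


# Flex rates from the GPU pricing table (USD per second).
print("H100 PRO:", max_concurrent_workers(0.00116))
print("B200:", max_concurrent_workers(0.00240))
```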

## Billing support

If you believe you've been billed incorrectly, [contact support](https://www.runpod.io/contact), including the following information in your ticket:

* Endpoint ID
* Request ID (if applicable)
* Approximate time of the issue
