Worker types
| | Flex workers | Active workers |
|---|---|---|
| Behavior | Scale to zero when idle | Always running (24/7) |
| Pricing | Standard per-second rate | 20–30% discount |
| Best for | Variable workloads, cost optimization | Consistent traffic, low-latency requirements |
GPU pricing
| GPU type(s) | Memory | Flex cost per second | Active cost per second | Description |
|---|---|---|---|---|
| A4000, A4500, RTX 4000 | 16 GB | $0.00016 | $0.00011 | The most cost-effective for small models. |
| 4090 PRO | 24 GB | $0.00031 | $0.00021 | Extreme throughput for small-to-medium models. |
| L4, A5000, 3090 | 24 GB | $0.00019 | $0.00013 | Great for small-to-medium sized inference workloads. |
| L40, L40S, 6000 Ada PRO | 48 GB | $0.00053 | $0.00037 | Extreme inference throughput on LLMs like Llama 3 8B. |
| A6000, A40 | 48 GB | $0.00034 | $0.00024 | A cost-effective option for running big models. |
| H100 PRO | 80 GB | $0.00116 | $0.00093 | Extreme throughput for big models. |
| A100 | 80 GB | $0.00076 | $0.00060 | High throughput GPU, yet still very cost-effective. |
| H200 PRO | 141 GB | $0.00155 | $0.00124 | Extreme throughput for huge models. |
| B200 | 180 GB | $0.00240 | $0.00190 | Maximum throughput for huge models. |
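
To make the flex versus active trade-off concrete, the sketch below estimates a monthly compute bill for one worker of each type on the same GPU. The rates are copied from the table above; the function names, the 30-day month, and the utilization figure are illustrative assumptions, and storage costs are ignored.

```python
# Per-second rates in USD, copied from the pricing table above: (flex, active).
RATES = {
    "A4000": (0.00016, 0.00011),
    "A100": (0.00076, 0.00060),
    "H100 PRO": (0.00116, 0.00093),
}

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def flex_monthly_cost(gpu: str, utilization: float) -> float:
    """Cost of one flex worker that is billed for `utilization` of the month."""
    flex_rate, _ = RATES[gpu]
    return flex_rate * SECONDS_PER_MONTH * utilization


def active_monthly_cost(gpu: str) -> float:
    """Cost of one active worker, which runs (and bills) 24/7."""
    _, active_rate = RATES[gpu]
    return active_rate * SECONDS_PER_MONTH


def break_even_utilization(gpu: str) -> float:
    """Billed fraction of the month above which an active worker is cheaper."""
    flex_rate, active_rate = RATES[gpu]
    return active_rate / flex_rate


if __name__ == "__main__":
    gpu = "A100"
    print(f"flex at 25% utilization: ${flex_monthly_cost(gpu, 0.25):.2f}")
    print(f"active (always on):      ${active_monthly_cost(gpu):.2f}")
    print(f"break-even utilization:  {break_even_utilization(gpu):.0%}")
```

Under these assumptions, an A100 flex worker that is billed for less than roughly 79% of the month would cost less than an active worker on the same GPU.
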
What you’re billed for
Your total cost includes compute time and storage:

| Cost component | Description | Rate |
|---|---|---|
| Compute | GPU time while workers run | See pricing table above |
| Container disk | Worker storage (5-min intervals) | ~$0.10/GB/month |
| Network volume | Shared persistent storage | $0.07/GB/month (< 1TB), $0.05/GB/month (> 1TB) |
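
The network volume rates above can be turned into a rough monthly estimate. The sketch below is only one reading of the table: it assumes that 1 TB means 1000 GB and that the $0.05 rate applies only to the portion of a volume above 1 TB. Neither assumption is confirmed by the table, and the function name is hypothetical.

```python
def network_volume_monthly_cost(size_gb: float) -> float:
    """Estimate the monthly cost of a network volume in USD.

    Assumptions (not confirmed by the pricing table):
    - 1 TB is treated as 1000 GB.
    - The $0.05/GB rate applies only to the portion above 1 TB.
    """
    tier_limit_gb = 1000
    first_tier = min(size_gb, tier_limit_gb) * 0.07
    second_tier = max(size_gb - tier_limit_gb, 0) * 0.05
    return first_tier + second_tier


if __name__ == "__main__":
    for size in (500, 1500):
        print(f"{size} GB: ${network_volume_monthly_cost(size):.2f}/month")
```

If the lower rate instead applies to the whole volume once it exceeds 1 TB, the estimate for large volumes would come out lower than this sketch suggests.
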
Compute cost breakdown
Workers incur charges during three phases:
- Start time: Initializing the container and loading models into GPU memory. Minimize with FlashBoot or model caching.
- Execution time: Processing requests. Set execution timeouts to prevent runaway jobs.
- Idle time: Waiting for new requests before scaling down (default: 5 seconds). Configure in endpoint settings.
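
The three phases above add up to the billed time for a worker run. The following minimal sketch shows the arithmetic for a single flex worker. The `WorkerRun` class and the example durations are hypothetical, and it assumes no new request arrives, so the worker stays idle for the full timeout before scaling down.

```python
from dataclasses import dataclass


@dataclass
class WorkerRun:
    """Billed phases of one worker run, in seconds."""

    start_s: float       # container initialization and model loading
    execution_s: float   # time spent processing requests
    idle_s: float = 5.0  # idle timeout before scale-down (default: 5 seconds)

    def billed_seconds(self) -> float:
        """Total billed time: start + execution + idle."""
        return self.start_s + self.execution_s + self.idle_s

    def cost(self, rate_per_second: float) -> float:
        """Compute cost in USD at the given per-second GPU rate."""
        return self.billed_seconds() * rate_per_second


if __name__ == "__main__":
    A100_FLEX_RATE = 0.00076  # USD per second, from the GPU pricing table
    run = WorkerRun(start_s=10, execution_s=3)
    print(f"billed seconds: {run.billed_seconds():.0f}")
    print(f"cost: ${run.cost(A100_FLEX_RATE):.5f}")
```

In this hypothetical run, execution accounts for only 3 of the 18 billed seconds, which suggests why shortening start time and tuning the idle timeout can matter as much as speeding up the request handler.
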
Account limits
Spend limit: Default limit of $80/hour across all resources. Contact support to increase.
Billing support
If you believe you’ve been billed incorrectly, contact support, including the following information in your ticket:
- Endpoint ID
- Request ID (if applicable)
- Approximate time of the issue