GPU type(s) | Memory | Flex cost per second | Active cost per second | Description |
---|---|---|---|---|
A4000, A4500, RTX 4000 | 16 GB | $0.00016 | $0.00011 | The most cost-effective option for small models. |
4090 PRO | 24 GB | $0.00031 | $0.00021 | Extreme throughput for small-to-medium models. |
L4, A5000, 3090 | 24 GB | $0.00019 | $0.00013 | Great for small-to-medium sized inference workloads. |
L40, L40S, 6000 Ada PRO | 48 GB | $0.00053 | $0.00037 | Extreme inference throughput on LLMs like Llama 3 8B. |
A6000, A40 | 48 GB | $0.00034 | $0.00024 | A cost-effective option for running big models. |
H100 PRO | 80 GB | $0.00116 | $0.00093 | Extreme throughput for big models. |
A100 | 80 GB | $0.00076 | $0.00060 | High-throughput GPU, yet still very cost-effective. |
H200 PRO | 141 GB | $0.00155 | $0.00124 | Extreme throughput for huge models. |
B200 | 180 GB | $0.00240 | $0.00190 | Maximum throughput for huge models. |
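
To put the per-second rates in perspective, the sketch below converts them into an estimated job cost in USD. This is a minimal illustration built from the rates in the table above; the `RATES` dictionary and `job_cost` helper are hypothetical examples, not part of any official SDK.

```python
# Minimal sketch: estimate job cost from the per-second rates listed above.
# RATES and job_cost() are illustrative helpers, not a real SDK API.

RATES = {
    # gpu: (flex $/s, active $/s) — values copied from the pricing table
    "A4000": (0.00016, 0.00011),
    "4090 PRO": (0.00031, 0.00021),
    "L4": (0.00019, 0.00013),
    "L40S": (0.00053, 0.00037),
    "A6000": (0.00034, 0.00024),
    "H100 PRO": (0.00116, 0.00093),
    "A100": (0.00076, 0.00060),
    "H200 PRO": (0.00155, 0.00124),
    "B200": (0.00240, 0.00190),
}

def job_cost(gpu: str, seconds: float, active: bool = True) -> float:
    """Estimated cost in USD for `seconds` of runtime on `gpu`."""
    flex, act = RATES[gpu]
    return (act if active else flex) * seconds

if __name__ == "__main__":
    # One hour of active time on an A100: 0.00060 * 3600 = $2.16
    print(f"A100, 1h active: ${job_cost('A100', 3600):.2f}")
    # One hour of flex time on an H100 PRO: 0.00116 * 3600 ≈ $4.18
    print(f"H100 PRO, 1h flex: ${job_cost('H100 PRO', 3600, active=False):.2f}")
```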