Returns a single endpoint.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
ID of endpoint to return.
Include information about the template used to create the endpoint.
true
Include information about the workers running on the endpoint.
true
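The path parameter and the two query parameters above can be sketched as a request builder. This is a minimal illustration using only the standard library; the base URL `https://rest.runpod.io/v1` and the `/endpoints/{endpointId}` path are assumptions to verify against the current API reference.

```python
import urllib.parse
import urllib.request

# Assumed REST base URL; confirm against the current API reference.
BASE_URL = "https://rest.runpod.io/v1"

def build_get_endpoint_request(endpoint_id: str, api_key: str,
                               include_template: bool = True,
                               include_workers: bool = True) -> urllib.request.Request:
    """Build the GET request for a single Serverless endpoint."""
    query = urllib.parse.urlencode({
        "includeTemplate": str(include_template).lower(),  # true/false
        "includeWorkers": str(include_workers).lower(),
    })
    url = f"{BASE_URL}/endpoints/{endpoint_id}?{query}"
    # Bearer authentication header of the form "Bearer <token>".
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

req = build_get_endpoint_request("jpnw0v75y3qoql", "YOUR_API_KEY")
# urllib.request.urlopen(req) would perform the call and return the endpoint JSON.
```

Passing `includeTemplate=true` and `includeWorkers=true` expands the response with the template and worker objects described below.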
Successful operation.
A list of acceptable CUDA versions for the workers on a Serverless endpoint. If not set, any CUDA version is acceptable.
12.9, 12.8, 12.7, 12.6, 12.5, 12.4, 12.3, 12.2, 12.1, 12.0, 11.8
The type of compute used by workers on a Serverless endpoint.
CPU, GPU
"GPU"
The UTC timestamp when a Serverless endpoint was created.
"2024-07-12T19:14:40.144Z"
A list of Runpod data center IDs where workers on a Serverless endpoint can be located.
EU-RO-1, CA-MTL-1, EU-SE-1, US-IL-1, EUR-IS-1, EU-CZ-1, US-TX-3, EUR-IS-2, US-KS-2, US-GA-2, US-WA-1, US-TX-1, CA-MTL-3, EU-NL-1, US-TX-4, US-CA-2, US-NC-1, OC-AU-1, US-DE-1, EUR-IS-3, CA-MTL-2, AP-JP-1, EUR-NO-1, EU-FR-1, US-KS-3, US-GA-1
"EU-NL-1,EU-RO-1,EU-SE-1"
Environment variables made available to workers on a Serverless endpoint, as key-value pairs.
{ "ENV_VAR": "value" }
The maximum number of milliseconds an individual request can run on a Serverless endpoint before the worker is stopped and the request is marked as failed.
600000
The number of GPUs attached to each worker on a Serverless endpoint.
1
A list of Runpod GPU types which can be attached to a Serverless endpoint.
NVIDIA GeForce RTX 4090, NVIDIA A40, NVIDIA RTX A5000, NVIDIA GeForce RTX 5090, NVIDIA H100 80GB HBM3, NVIDIA GeForce RTX 3090, NVIDIA RTX A4500, NVIDIA L40S, NVIDIA H200, NVIDIA L4, NVIDIA RTX 6000 Ada Generation, NVIDIA A100-SXM4-80GB, NVIDIA RTX 4000 Ada Generation, NVIDIA RTX A6000, NVIDIA A100 80GB PCIe, NVIDIA RTX 2000 Ada Generation, NVIDIA RTX A4000, NVIDIA RTX PRO 6000 Blackwell Server Edition, NVIDIA H100 PCIe, NVIDIA H100 NVL, NVIDIA L40, NVIDIA B200, NVIDIA GeForce RTX 3080 Ti, NVIDIA RTX PRO 6000 Blackwell Workstation Edition, NVIDIA GeForce RTX 3080, NVIDIA GeForce RTX 3070, AMD Instinct MI300X OAM, NVIDIA GeForce RTX 4080 SUPER, Tesla V100-PCIE-16GB, Tesla V100-SXM2-32GB, NVIDIA RTX 5000 Ada Generation, NVIDIA GeForce RTX 4070 Ti, NVIDIA RTX 4000 SFF Ada Generation, NVIDIA GeForce RTX 3090 Ti, NVIDIA RTX A2000, NVIDIA GeForce RTX 4080, NVIDIA A30, NVIDIA GeForce RTX 5080, Tesla V100-FHHL-16GB, NVIDIA H200 NVL, Tesla V100-SXM2-16GB, NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition, NVIDIA A5000 Ada, Tesla V100-PCIE-32GB, NVIDIA RTX A4500, NVIDIA A30, NVIDIA GeForce RTX 3080TI, Tesla T4, NVIDIA RTX A30
A unique string identifying a Serverless endpoint.
"jpnw0v75y3qoql"
The number of seconds a worker on a Serverless endpoint can be running without taking a job before the worker is scaled down.
5
For CPU Serverless endpoints, a list of instance IDs that can be attached to a Serverless endpoint.
["cpu3c-8-16"]
A user-defined name for a Serverless endpoint. The name does not need to be unique.
"my endpoint"
The unique string identifying the network volume to attach to the Serverless endpoint.
"agv6w2qcg7"
A list of network volume IDs attached to the Serverless endpoint. Allows multiple network volumes to be used with multi-region endpoints.
["agv6w2qcg7", "bxh7w3rch8"]
The method used to scale up workers on a Serverless endpoint. If QUEUE_DELAY, workers are scaled based on a periodic check to see if any requests have been in queue for too long. If REQUEST_COUNT, the desired number of workers is periodically calculated based on the number of requests in the endpoint's queue. Use QUEUE_DELAY if you need to ensure requests take no longer than a maximum latency, and use REQUEST_COUNT if you need to scale based on the number of requests.
QUEUE_DELAY, REQUEST_COUNT
"QUEUE_DELAY"
If the endpoint scalerType is QUEUE_DELAY, the number of seconds a request can remain in queue before a new worker is scaled up. If the endpoint scalerType is REQUEST_COUNT, the number of workers is increased as needed to meet the number of requests in the endpoint's queue divided by scalerValue.
4
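The REQUEST_COUNT behavior described above can be illustrated with a small sketch. The helper below is hypothetical, not part of the API: it approximates the desired worker count as the queue length divided by scalerValue, clamped to the endpoint's worker limits.

```python
import math

def desired_workers_request_count(queue_length: int, scaler_value: int,
                                  workers_min: int, workers_max: int) -> int:
    """Approximate REQUEST_COUNT scaling: requests in queue divided by
    scalerValue, rounded up, clamped between workersMin and workersMax."""
    wanted = math.ceil(queue_length / scaler_value)
    return max(workers_min, min(workers_max, wanted))

# With scalerValue=4, 10 queued requests, workersMin=0, workersMax=3:
# ceil(10 / 4) = 3 workers are desired.
```

Under QUEUE_DELAY the trigger is instead how long requests have waited in queue, so it bounds latency rather than tracking queue depth.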
The unique string identifying the template used to create a Serverless endpoint.
"30zmvf89kd"
A unique string identifying the Runpod user who created a Serverless endpoint.
"user_2PyTJrLzeuwfZilRZ7JhCQDuSqo"
The latest version of a Serverless endpoint, which is updated whenever the template or environment variables of the endpoint are changed.
0
Information about current workers on a Serverless endpoint.
The maximum number of workers that can be running at the same time on a Serverless endpoint.
3
The minimum number of workers that will run at the same time on a Serverless endpoint. These workers always stay running for the endpoint and are billed even when no requests are being processed, though at a lower rate than running autoscaling workers.
0