This guide covers the essential management operations for Runpod Serverless endpoints, helping you deploy, configure, and maintain your Serverless applications effectively.
To create a new endpoint:

1. Choose a source for your endpoint, such as a Docker image or GitHub repository, or select a preconfigured endpoint under Ready-to-Deploy Repos.
2. Follow the UI steps to configure your selected source, then click Next.
3. Configure your endpoint settings:
   - Set the Endpoint Name.
   - Choose your Endpoint Type: select Queue for traditional queue-based processing, or Load balancer for direct HTTP access (see Load balancing endpoints for details).
   - Under GPU Configuration, select the appropriate GPU types and configure worker settings.
   - Set Environment Variables and other options as needed. For a full list of options, see Endpoint configurations.
4. Click Create Endpoint to deploy.
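If you prefer to script endpoint creation, the `runpod` Python SDK exposes endpoint-management helpers. The sketch below assumes the SDK's `create_template` and `create_endpoint` functions; the image name, GPU pool ID, and worker counts are placeholders, so check the SDK documentation for the exact parameters your version supports.

```python
import runpod

# Assumes your API key is available; generate one under account Settings.
runpod.api_key = "your_runpod_api_key"

# Create a template that points at your Docker image (placeholder image name).
template = runpod.create_template(
    name="my-template",
    image_name="yourusername/your-image:latest",
)

# Create the endpoint from that template. gpu_ids selects a GPU pool;
# "AMPERE_16" is an example value, and the worker counts are placeholders.
endpoint = runpod.create_endpoint(
    name="my-endpoint",
    template_id=template["id"],
    gpu_ids="AMPERE_16",
    workers_min=0,
    workers_max=2,
)
print(endpoint)
```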
You can optimize cost and availability by specifying GPU preferences in order of priority. Runpod attempts to allocate your first-choice GPU; if it's unavailable, it automatically falls back to the next GPU in your priority list, so your workloads run on the best available resources. You can enable or disable particular GPU types under Advanced > Enabled GPU Types.
After deployment, your endpoint takes time to initialize before it is ready to process requests. You can monitor the deployment status on the endpoint details page, which shows worker status and initialization progress. Once active, your endpoint displays a unique API URL (https://api.runpod.ai/v2/{endpoint_id}/) that you can use to send requests. For information on how to interact with your endpoint, see Endpoint operations.
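For example, you can verify a newly deployed endpoint from a script. The sketch below uses Python's `requests` library to check the endpoint's `/health` route and then submit a synchronous job to `/runsync`; the endpoint ID and the `"input"` payload are placeholders, since the payload shape depends entirely on your handler.

```python
import os
import requests

ENDPOINT_ID = "your_endpoint_id"        # placeholder: copy from the endpoint details page
API_KEY = os.environ["RUNPOD_API_KEY"]  # assumes your Runpod API key is set in the environment
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Check worker status before sending work.
health = requests.get(f"{BASE_URL}/health", headers=HEADERS, timeout=10)
print(health.json())

# Submit a synchronous request; /runsync blocks until the job completes.
# The "input" payload is an example and must match what your handler expects.
response = requests.post(
    f"{BASE_URL}/runsync",
    headers=HEADERS,
    json={"input": {"prompt": "Hello, world!"}},
    timeout=120,
)
print(response.json())
```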
When you update an endpoint's configuration, changes take effect gradually as each worker is updated to the new settings.
To force an immediate configuration update, temporarily set Max Workers to 0, trigger the Release, then restore your desired worker count and update again.
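If you want to automate this workaround, the flow is simply scale to zero, release, then scale back up. The sketch below expresses it with a hypothetical `set_max_workers` helper; the helper name, its parameters, and the drain wait are all assumptions standing in for whatever console, SDK, or API mechanism you actually use to change endpoint settings.

```python
import time

def set_max_workers(endpoint_id: str, max_workers: int) -> None:
    """Hypothetical helper: stands in for the console, SDK, or API call
    you use to change an endpoint's Max Workers setting."""
    raise NotImplementedError("Replace with your endpoint-update mechanism.")

def force_config_rollout(endpoint_id: str, desired_max_workers: int) -> None:
    # Scale to zero so existing workers shut down instead of lingering
    # on the old configuration.
    set_max_workers(endpoint_id, 0)
    time.sleep(60)  # assumed drain window; tune for your job lengths
    # Restore the desired count; replacement workers start with the
    # new configuration immediately.
    set_max_workers(endpoint_id, desired_max_workers)
```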
To attach a network volume to your endpoint:

1. Click the three dots in the bottom-right corner of the endpoint you want to modify.
2. Click Edit Endpoint.
3. Expand the Advanced section.
4. Under Network Volume, select a volume from the dropdown.
5. Click Save Endpoint to attach the volume to your endpoint.
Network volumes are mounted to the same path on each worker, making them ideal for sharing large models, datasets, or any data that needs to persist across worker instances.
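As an illustration, a worker's handler can load shared data straight from the volume's mount point. The sketch below assumes the common Serverless mount path `/runpod-volume` and a model file you've staged there; the file path and the `load_model` helper are placeholders for your own setup.

```python
import runpod

VOLUME_PATH = "/runpod-volume"  # assumed mount point for network volumes on Serverless workers
MODEL_PATH = f"{VOLUME_PATH}/models/my-model.bin"  # hypothetical file staged on the volume

def load_model(path: str):
    # Stand-in for your framework's loading code (torch.load, from_pretrained, etc.).
    with open(path, "rb") as f:
        return f.read()

# Loading outside the handler runs once per worker, so every request reuses the model.
model = load_model(MODEL_PATH)

def handler(job):
    # job["input"] carries the payload sent in the request to the endpoint.
    prompt = job["input"].get("prompt", "")
    return {"model_bytes": len(model), "prompt": prompt}

runpod.serverless.start({"handler": handler})
```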