Endpoint types
Queue-based endpoints
Queue-based endpoints are the traditional type of endpoint. They process requests sequentially in a queue (managed automatically by Runpod), providing guaranteed execution and automatic retries for failed requests. Queue-based endpoints offer two execution modes:
- Asynchronous processing via the /run endpoint operation, which lets you submit jobs that run in the background and check results later (with /status). This makes it ideal for long-running tasks.
- Synchronous processing via the /runsync endpoint operation, which returns results in the same request. This makes it ideal for interactive applications.
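As a sketch of how these two modes are called, the snippet below uses Python's `requests` library. The endpoint ID, API key, and input payload are placeholders, and the `Bearer` authorization header format is an assumption to verify against the API reference.

```python
import requests

API_KEY = "YOUR_API_KEY"          # placeholder: your Runpod API key
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder: your endpoint ID
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def submit_async(payload: dict) -> str:
    """Submit a job via /run; returns a job ID you can poll later."""
    resp = requests.post(f"{BASE_URL}/run", json={"input": payload}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]

def check_status(job_id: str) -> dict:
    """Check a previously submitted job via /status."""
    resp = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def run_sync(payload: dict) -> dict:
    """Submit a job via /runsync and receive the result in the same request."""
    resp = requests.post(f"{BASE_URL}/runsync", json={"input": payload}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

# Example usage (requires a live endpoint):
#   job_id = submit_async({"prompt": "hello"})
#   print(check_status(job_id))
```

The async pattern (`submit_async` plus polling `check_status`) suits long-running jobs, while `run_sync` blocks until the worker responds and fits interactive use.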
Load balancing endpoints
Load balancing endpoints offer direct HTTP access to your worker’s HTTP server, bypassing the queueing system. These are ideal for real-time applications and streaming, but provide no queuing mechanism for request backlog (similar to UDP’s behavior in networking). Load balancing endpoints don’t require a handler function, allowing you to define your own custom API endpoints using any HTTP framework (like FastAPI or Flask). To learn more, see the Load balancing endpoints page.
Key features
Auto-scaling
Runpod endpoints (both queue-based and load balancing) can automatically scale from zero to hundreds of workers based on demand. You can customize your endpoint configuration to adjust the minimum and maximum worker count, GPU allocation, and memory settings. The system also offers GPU prioritization, allowing you to specify preferred GPU types in order of priority. To learn more, see Endpoint settings.
Integration options
Runpod endpoints support webhook notifications, allowing you to configure endpoints to call your webhook when jobs complete. They also include S3-compatible storage integration for working with object storage for larger inputs and outputs.
Create an endpoint
Before creating an endpoint, make sure you have a working handler function and Dockerfile.
To create a new Serverless endpoint through the Runpod web interface:
- Navigate to the Serverless section of the Runpod console.
- Click New Endpoint.
- On the Deploy a New Serverless Endpoint screen, choose your deployment source:
- Import Git Repository (if GitHub is connected). See Deploy from GitHub for details.
- Import from Docker Registry. See Deploy from Docker Hub for details.
- Or select a preconfigured endpoint under Ready-to-Deploy Repos.
- Follow the UI steps to configure your selected source (Docker image, GitHub repo), then click Next.
- Configure your endpoint settings:
- Endpoint Name: The display name for your endpoint in the console.
- Endpoint Type: Select Queue for traditional queue-based processing or Load balancer for direct HTTP access. See Load balancing endpoints for details.
- GPU Configuration: Select the appropriate GPU types and configure worker settings.
- Model: (Optional) Enter a model URL from Hugging Face to optimize worker startup times. See Cached models for details.
- Container Configuration: Edit the container start command, specify the container disk size, and expose HTTP/TCP ports.
- Environment Variables: Add environment variables for your worker containers.
- Click Deploy Endpoint to deploy.
After deployment, your endpoint is assigned a base URL (https://api.runpod.ai/v2/{endpoint_id}/) that you can use to send requests.
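As one illustrative use of this base URL, the sketch below attaches a webhook to a job submission so that Runpod notifies your service when the job completes. The `webhook` field in the request body, the placeholder IDs, and the auth header format are assumptions to verify against the API reference.

```python
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder: your endpoint ID
API_KEY = "YOUR_API_KEY"          # placeholder: your Runpod API key

def build_job_body(payload: dict, webhook_url: str) -> dict:
    """Build a /run request body that asks Runpod to POST the job result
    to webhook_url when the job finishes."""
    return {"input": payload, "webhook": webhook_url}

def submit_with_webhook(payload: dict, webhook_url: str) -> dict:
    """Submit a job to the endpoint's base URL with a webhook attached."""
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
        json=build_job_body(payload, webhook_url),
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()
```

With a webhook attached, your service receives the result push-style instead of polling /status, which is useful when jobs take minutes rather than seconds.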
Edit an endpoint
You can modify your endpoint’s configuration at any time:
- Navigate to the Serverless section in the Runpod console.
- Click the three dots in the top right corner of the endpoint you want to modify.
- Click Edit Endpoint.
- Update any endpoint settings as needed.
- Click Save Endpoint to save your changes.
Delete an endpoint
To delete an endpoint:
- Navigate to the Serverless section in the Runpod console.
- Click the three dots in the top right corner of the endpoint you want to delete.
- Click Delete Endpoint.
- Type the name of the endpoint, then click Confirm.