Endpoints are the foundation of Runpod Serverless, serving as the gateway for deploying and managing your Serverless workers. Each endpoint provides a unique URL that accepts HTTP requests, processes them using your handler function, and returns results.

Endpoint types

| | Queue-based | Load balancing |
|---|---|---|
| Processing | Requests queued and processed sequentially | Direct HTTP access to workers |
| Execution modes | Async (`/run`) or sync (`/runsync`) | Custom HTTP endpoints |
| Retries | Automatic retries on failure | No automatic retries |
| Handler required? | Yes | No (use any HTTP framework) |
| Best for | Batch jobs, guaranteed execution | Real-time apps, streaming |
Learn more about load balancing endpoints.
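To make the queue-based flow concrete, here is a minimal sketch of submitting a job asynchronously and polling for its result. `ENDPOINT_ID`, `API_KEY`, and the `prompt` payload are placeholders you must supply; the network calls are shown commented out since they require a live endpoint.

```python
import json
import urllib.request

def api_url(endpoint_id: str, route: str) -> str:
    """Build a route URL under the endpoint's base API URL."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/{route}"

# Queue-based flow (ENDPOINT_ID and API_KEY are placeholders):
# 1. POST to /run with {"input": ...} -- returns a job id immediately.
# 2. Poll GET /status/{job_id} until the job reports COMPLETED.
#
# req = urllib.request.Request(
#     api_url(ENDPOINT_ID, "run"),
#     data=json.dumps({"input": {"prompt": "hello"}}).encode(),
#     headers={"Authorization": f"Bearer {API_KEY}",
#              "Content-Type": "application/json"},
#     method="POST",
# )
# job = json.load(urllib.request.urlopen(req))
# status_req = urllib.request.Request(
#     api_url(ENDPOINT_ID, f"status/{job['id']}"),
#     headers={"Authorization": f"Bearer {API_KEY}"},
# )
# result = json.load(urllib.request.urlopen(status_req))
```

Swapping `/run` for `/runsync` turns the same request into a blocking call that returns the result directly, at the cost of holding the connection open.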

Create an endpoint

Before creating an endpoint, make sure you have a handler function and a Dockerfile for your worker image.
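For orientation, a queue-based handler is just a function that receives a job dict and returns a result. The greeting logic below is a hypothetical placeholder; the registration call at the bottom uses the Runpod Python SDK.

```python
# Minimal handler sketch for a queue-based worker.
# The payload you submit to the endpoint arrives under job["input"].
def handler(job):
    name = job["input"].get("name", "world")  # hypothetical example logic
    return {"greeting": f"Hello, {name}!"}

# In your worker image, register the handler with the Runpod SDK:
# import runpod
# runpod.serverless.start({"handler": handler})
```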
  1. Navigate to the Serverless section and click New Endpoint.
  2. Choose your deployment source:
  3. Configure your endpoint:
    • Endpoint Name and Type (Queue-based or Load balancer)
    • GPU Configuration and worker settings
    • Model (optional): Enter a Hugging Face URL for cached models
    • Environment Variables: See environment variables
  4. Click Deploy Endpoint.
Optimize cost and availability by specifying multiple GPU types in priority order. Runpod allocates your first choice if available, otherwise uses the next in your list.
After deployment, your endpoint displays a unique API URL: https://api.runpod.ai/v2/{endpoint_id}/
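Once the endpoint URL is live, you can call it over plain HTTP. A minimal sketch using only the standard library, building a synchronous `/runsync` request; `endpoint_id`, `api_key`, and the payload shape are placeholders you supply, and the actual send is left commented out.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"

def build_runsync_request(endpoint_id: str, api_key: str, payload: dict):
    """Build an authenticated POST request for the /runsync route.
    endpoint_id and api_key are placeholders from your Runpod account."""
    data = json.dumps({"input": payload}).encode()
    return urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/runsync",
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_runsync_request("abc123", "YOUR_API_KEY", {"prompt": "hello"})
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```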

Edit an endpoint

  1. Navigate to the Serverless section.
  2. Click the three dots on your endpoint → Edit Endpoint.
  3. Update endpoint settings and click Save Endpoint.
Changes to GPU types or worker counts may require restarting active workers.

Delete an endpoint

  1. Navigate to the Serverless section.
  2. Click the three dots on your endpoint → Delete Endpoint.
  3. Type the endpoint name to confirm.
Deleting an endpoint permanently removes all configuration, logs, and job history.