Endpoints are the foundation of Runpod Serverless, serving as the gateway for deploying and managing your Serverless workers. Each endpoint provides a unique URL that accepts HTTP requests, processes them using your handler function, and returns results.
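The snippet below is a rough sketch of the client side of "accepts HTTP requests". It uses only the Python standard library to build, but not send, a request to a queue-based endpoint's async (`/run`) or sync (`/runsync`) route. It assumes the job payload is wrapped in an `input` object and that your API key is sent as a Bearer token. `ENDPOINT_ID`, `YOUR_API_KEY`, the `prompt` field, and the helper name `build_job_request` are placeholders. Check Send requests for the exact request contract.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def build_job_request(endpoint_id: str, api_key: str, payload: dict,
                      sync: bool = False) -> urllib.request.Request:
    """Build a POST request for /run (async) or /runsync (sync).

    Assumes the endpoint expects the job payload under an "input" key
    and authenticates with a Bearer token.
    """
    operation = "runsync" if sync else "run"
    url = f"{API_BASE}/{endpoint_id}/{operation}"
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_job_request("ENDPOINT_ID", "YOUR_API_KEY", {"prompt": "Hello"})
    print(req.get_method(), req.full_url)
    # To actually send it, pass req to urllib.request.urlopen and parse the JSON body.
```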

Send requests

Learn how to send requests to your endpoints.

Endpoint settings

Configure scaling, timeouts, and GPU selection.

Job states

Monitor job status and metrics.

Model caching

Reduce cold starts with cached models.

Endpoint types

| | Queue-based | Load balancing |
|---|---|---|
| Processing | Requests queued and processed sequentially | Direct HTTP access to workers |
| Execution modes | Async (/run) or sync (/runsync) | Custom HTTP endpoints |
| Retries | Automatic retries on failure | No automatic retries |
| Handler required? | Yes | No (use any HTTP framework) |
| Best for | Batch jobs, guaranteed execution | Real-time apps, streaming |
Learn more about load balancing endpoints.
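Load balancing endpoints don't require a handler, so a worker can be any HTTP server. The following is a minimal sketch built on Python's built-in `http.server`. It assumes the health check only needs `/ping` to answer with HTTP 200. The `/generate` route, the port, and the JSON response shapes are hypothetical. In practice you would probably use a framework such as FastAPI and set the port to match your endpoint configuration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class WorkerHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Health check path; assumed to only need HTTP 200 when the worker is ready.
        if self.path == "/ping":
            self._send_json(200, {"status": "healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        # Hypothetical application route: echoes the request's "prompt" field.
        if self.path != "/generate":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        self._send_json(200, {"output": f"echo: {data.get('prompt', '')}"})

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging to keep the sketch quiet


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) a threaded server bound to all interfaces."""
    return ThreadingHTTPServer(("0.0.0.0", port), WorkerHandler)
```

To run it, call `make_server(8000).serve_forever()` in your container's entry point. Because requests go straight to this server, there are no automatic retries, so any retry policy belongs in the client.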

Create an endpoint

Before creating an endpoint from your own code, make sure you have a Dockerfile and, for queue-based endpoints, a handler function.
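For reference, a queue-based handler is a function that receives one job and returns a JSON-serializable result. The sketch below assumes the Runpod Python SDK (`pip install runpod`), where a job is a dict whose `input` key holds the request payload and the worker is started with `runpod.serverless.start`. The `prompt` field and the uppercase transformation are placeholders for your own logic.

```python
def handler(job: dict) -> dict:
    """Process one queued job; job["input"] holds the request's "input" object."""
    job_input = job.get("input") or {}
    prompt = job_input.get("prompt")  # hypothetical input field
    if not isinstance(prompt, str):
        # Returning an "error" key is assumed to mark the job as failed;
        # confirm this behavior against the SDK documentation.
        return {"error": "input.prompt must be a string"}
    return {"output": prompt.upper()}


# In the worker's entry point (requires `pip install runpod`):
# import runpod
# runpod.serverless.start({"handler": handler})
```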
  1. Navigate to the Serverless section and click New Endpoint.
  2. Choose your deployment path:
    • Hello World: Runpod forks a starter worker template into a new GitHub repo in your account. Choose Queue-based or Load balancing, then click Deploy.
    • Hugging Face LLM: Search for any text-generation model on Hugging Face (for example, type “Gemma” to find Gemma 4), select it, and click Create Endpoint. Runpod deploys a vLLM endpoint for you.
    • Docker: Deploy from a container image. Select a saved Serverless template to fill in the container configuration automatically, or skip the template and enter an image name manually. See Deploy from Docker.
    • GitHub: Select a repository, filtering by code owner if needed. Runpod checks for a Dockerfile and runs a background check on your handler: queue-based endpoints check for handler files, and load balancing endpoints check for a /ping path. See Deploy from GitHub.
    • Hub: Opens the Hub browser, where you can browse and deploy prebuilt workers. This replaces the previous “Ready-to-Deploy Repos” option. See Hub overview.
    • Flash: A guided setup flow for Flash that walks you through installing the SDK, initializing your project, and sending your first command. Steps complete automatically as you progress.
  3. For the GitHub, Docker, and Hello World paths, configure your endpoint settings (scaling, timeouts, and GPU selection) before deploying.
  4. Click Deploy Endpoint.
Optimize cost and availability by specifying multiple GPU types in priority order. Runpod allocates your first choice if it's available; otherwise, it uses the next available type in your list.
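The fallback rule above amounts to choosing the first available type in priority order. The sketch below models it with a hypothetical `available` collection standing in for current capacity. It illustrates the selection rule only and is not Runpod's actual scheduler; the GPU names are illustrative labels.

```python
from typing import Iterable, Optional


def pick_gpu_type(priority: list[str], available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority GPU type that currently has capacity, or None."""
    available_set = set(available)
    for gpu_type in priority:  # priority[0] is your first choice
        if gpu_type in available_set:
            return gpu_type
    return None  # nothing in the list has capacity right now
```

For example, with a priority list of `["H100", "A100", "L40S"]` and only A100 and L40S capacity free, the first choice is skipped and `"A100"` is returned.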
After deployment, your endpoint displays a unique API URL: https://api.runpod.ai/v2/{endpoint_id}/
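With the endpoint ID from that URL, you can track an async job submitted to `/run` by polling its status. The sketch below assumes a `GET https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}` route that returns JSON with a `status` field. It treats `COMPLETED`, `FAILED`, `CANCELLED`, and `TIMED_OUT` as terminal states; see Job states for the authoritative list. The HTTP call is injected as `fetch_status`, so the polling logic can be tested without network access. The helper names `status_url`, `make_fetcher`, and `wait_for_job` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal job states; anything else is treated as "keep polling".
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def status_url(endpoint_id: str, job_id: str) -> str:
    return f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"


def make_fetcher(api_key: str) -> Callable[[str], dict]:
    """Return a function that GETs a status URL with a Bearer token and parses JSON."""
    def fetch(url: str) -> dict:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def wait_for_job(fetch_status: Callable[[str], dict], url: str,
                 poll_interval: float = 1.0, timeout: float = 300.0) -> dict:
    """Poll url until the job reaches a terminal state or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(url)
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')} after {timeout}s")
        time.sleep(poll_interval)
```

Against a live endpoint, this would look like `wait_for_job(make_fetcher("YOUR_API_KEY"), status_url("ENDPOINT_ID", job_id))`, where `job_id` is the `id` returned by `/run`.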

Edit an endpoint

  1. Navigate to the Serverless section.
  2. Click the three dots on your endpoint → Edit Endpoint.
  3. Update endpoint settings and click Save Endpoint.
Changes to GPU types or worker counts may require restarting active workers.

Delete an endpoint

  1. Navigate to the Serverless section.
  2. Click the three dots on your endpoint → Delete Endpoint.
  3. Type the endpoint name to confirm.
Deleting an endpoint permanently removes all configuration, logs, and job history.