# Overview

> Deploy and manage Serverless endpoints using the Runpod console or REST API.

<div className="overview-page-wrapper" />

Endpoints are the foundation of Runpod Serverless, serving as the gateway for deploying and managing your [Serverless workers](/serverless/workers/overview). Each endpoint provides a unique URL that accepts [HTTP requests](/serverless/endpoints/send-requests), processes them using your [handler function](/serverless/workers/handler-functions), and returns results.

<CardGroup cols={2}>
  <Card title="Send requests" href="/serverless/endpoints/send-requests" icon="paper-plane" horizontal>
    Learn how to send requests to your endpoints.
  </Card>

  <Card title="Endpoint settings" href="/serverless/endpoints/endpoint-configurations" icon="gear" horizontal>
    Configure scaling, timeouts, and GPU selection.
  </Card>

  <Card title="Job states" href="/serverless/endpoints/job-states" icon="chart-line" horizontal>
    Monitor job status and metrics.
  </Card>

  <Card title="Model caching" href="/serverless/endpoints/model-caching" icon="bolt" horizontal>
    Reduce cold starts with cached models.
  </Card>
</CardGroup>

## Endpoint types

|                       | Queue-based                                | Load balancing                |
| --------------------- | ------------------------------------------ | ----------------------------- |
| **Processing**        | Requests queued and processed sequentially | Direct HTTP access to workers |
| **Execution modes**   | Async (`/run`) or sync (`/runsync`)        | Custom HTTP endpoints         |
| **Retries**           | Automatic retries on failure               | No automatic retries          |
| **Handler required?** | Yes                                        | No (use any HTTP framework)   |
| **Best for**          | Batch jobs, guaranteed execution           | Real-time apps, streaming     |

Learn more about [load balancing endpoints](/serverless/load-balancing/overview).

## Create an endpoint

Before creating an endpoint, ensure you have a [handler function](/serverless/workers/handler-functions) and [Dockerfile](/serverless/workers/create-dockerfile).

<Tabs>
  <Tab title="Web">
    1. Navigate to the [Serverless section](https://www.console.runpod.io/serverless) and click **New Endpoint**.
    2. Choose your deployment path:
       * **Hello World**: Runpod forks a starter worker template into a new GitHub repo in your account. Choose Queue-based or Load balancing, then click **Deploy**.
       * **Hugging Face LLM**: Search for any text-generation model on Hugging Face (for example, type "Gemma" to find Gemma 4), select it, and click **Create Endpoint**. Runpod deploys a vLLM endpoint for you.
       * **Docker**: Deploy from a container image. Select a saved Serverless template to fill in the container configuration automatically, or skip the template and enter an image name manually. See [Deploy from Docker](/serverless/workers/deploy).
       * **GitHub**: Select a repository, filtering by code owner if needed. Runpod checks the repository for a Dockerfile and validates your worker in the background: for queue-based endpoints it looks for handler files, and for load balancing endpoints it looks for a `/ping` path. See [Deploy from GitHub](/serverless/workers/github-integration).
       * **Hub**: Opens the Hub browser, where you can browse and deploy prebuilt workers. This replaces the previous "Ready-to-Deploy Repos" option. See [Hub overview](/hub/overview).
       * **Flash**: A guided setup flow for [Flash](/flash/overview) that walks you through installing the SDK, initializing your project, and sending your first command. Steps complete automatically as you progress.
    3. For the GitHub, Docker, and Hello World paths, configure your endpoint before deploying:
       * **Endpoint name** and **type** ([Queue-based](/flash/create-endpoints#queue-based-endpoints) or [Load balancing](/flash/create-endpoints#load-balanced-endpoints))
       * **GPU** configuration and worker scaling
       * **Model** (optional): Enter a Hugging Face URL for [cached models](/serverless/endpoints/model-caching)
       * **Environment variables** and container configuration. See [environment variables](/serverless/development/environment-variables).
    4. Click **Deploy Endpoint**.
  </Tab>

  <Tab title="REST API">
    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    # Assumes your API key is exported as the RUNPOD_API_KEY environment variable.
    curl --request POST \
      --url https://rest.runpod.io/v1/endpoints \
      --header "Authorization: Bearer $RUNPOD_API_KEY" \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "my-endpoint",
        "templateId": "30zmvf89kd",
        "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],
        "workersMin": 0,
        "workersMax": 3,
        "idleTimeout": 5
      }'
    ```

    See the [Endpoint API reference](/api-reference/endpoints/POST/endpoints) for all parameters.
  </Tab>
</Tabs>

<Tip>
  Optimize cost and availability by specifying multiple GPU types in priority order. Runpod allocates your first choice when it is available and otherwise falls back to the next GPU type in your list.
</Tip>
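
As a rough illustration of the tip above, the sketch below builds a create-endpoint payload that lists two GPU types in priority order. It reuses the field names from the REST API example on this page; the second GPU type (`NVIDIA RTX A6000`) is only an illustrative fallback, and the template ID is a placeholder, so check the API reference for the values that apply to your account.

```bash
# Build a create-endpoint payload with GPU types listed in priority order.
# Assumption: "NVIDIA RTX A6000" is an illustrative second choice, and
# "30zmvf89kd" is a placeholder template ID. Replace both with your own values.
PAYLOAD=$(cat <<'EOF'
{
  "name": "my-endpoint",
  "templateId": "30zmvf89kd",
  "gpuTypeIds": ["NVIDIA GeForce RTX 4090", "NVIDIA RTX A6000"],
  "workersMin": 0,
  "workersMax": 3,
  "idleTimeout": 5
}
EOF
)

# Send the payload to the REST API.
# Assumes RUNPOD_API_KEY is exported in your shell.
create_endpoint() {
  curl --request POST \
    --url https://rest.runpod.io/v1/endpoints \
    --header "Authorization: Bearer ${RUNPOD_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "$PAYLOAD"
}

# Print the payload for review before sending it.
echo "$PAYLOAD"
# create_endpoint   # uncomment to create the endpoint
```

Printing the payload first makes it easy to confirm the GPU order before any resources are created.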

After deployment, your endpoint displays a unique API URL: `https://api.runpod.ai/v2/{endpoint_id}/`
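
For a queue-based endpoint, the `/run` and `/runsync` operations are appended to this base URL. The following is a minimal sketch of how the request URLs can be assembled. The endpoint ID is a placeholder, `send_request` is a hypothetical helper, and the `prompt` field is a made-up example, since the shape of `input` depends on your handler function.

```bash
# Placeholder: replace with the ID shown for your endpoint.
ENDPOINT_ID="your_endpoint_id"
BASE_URL="https://api.runpod.ai/v2/${ENDPOINT_ID}"

# Queue-based endpoints accept jobs at /run (async) and /runsync (sync).
RUN_URL="${BASE_URL}/run"
RUNSYNC_URL="${BASE_URL}/runsync"

# Hypothetical helper: POST a JSON payload ($2) to the given URL ($1).
# Assumes RUNPOD_API_KEY is exported in your shell.
send_request() {
  curl --request POST \
    --url "$1" \
    --header "Authorization: Bearer ${RUNPOD_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "$2"
}

echo "Async: ${RUN_URL}"
echo "Sync:  ${RUNSYNC_URL}"

# Example call (the "prompt" field is hypothetical; send the input your handler expects):
# send_request "$RUNSYNC_URL" '{"input": {"prompt": "Hello"}}'
```

See [Send requests](/serverless/endpoints/send-requests) for the full set of operations and response formats.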

## Edit an endpoint

1. Navigate to the [Serverless section](https://www.console.runpod.io/serverless).
2. Click the three dots on your endpoint and select **Edit Endpoint**.
3. Update [endpoint settings](/serverless/endpoints/endpoint-configurations) and click **Save Endpoint**.

Changes to GPU types or worker counts may require restarting active workers.
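
If you manage endpoints through the REST API instead, an update might look like the sketch below. It assumes the API accepts a `PATCH` request on `/endpoints/{endpointId}` with the same field names used when creating an endpoint; confirm the method, path, and fields in the API reference before relying on it. The endpoint ID and new values are placeholders.

```bash
# Placeholder: replace with the ID of the endpoint you want to update.
ENDPOINT_ID="your_endpoint_id"
UPDATE_URL="https://rest.runpod.io/v1/endpoints/${ENDPOINT_ID}"

# Illustrative changes: raise the worker ceiling and the idle timeout.
UPDATE_PAYLOAD='{"workersMax": 5, "idleTimeout": 10}'

# Assumption: the REST API supports PATCH on this path.
# Assumes RUNPOD_API_KEY is exported in your shell.
update_endpoint() {
  curl --request PATCH \
    --url "$UPDATE_URL" \
    --header "Authorization: Bearer ${RUNPOD_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "$UPDATE_PAYLOAD"
}

# Print the request for review before sending it.
echo "PATCH ${UPDATE_URL}"
echo "$UPDATE_PAYLOAD"
# update_endpoint   # uncomment to apply the change
```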

## Delete an endpoint

1. Navigate to the [Serverless section](https://www.console.runpod.io/serverless).
2. Click the three dots on your endpoint and select **Delete Endpoint**.
3. Type the endpoint name to confirm.

<Warning>
  Deleting an endpoint permanently removes all configuration, logs, and job history.
</Warning>
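
Deletion can likely be scripted through the REST API as well. The sketch below assumes the API accepts a `DELETE` request on `/endpoints/{endpointId}`; verify this in the API reference first, because the operation cannot be undone. The endpoint ID is a placeholder.

```bash
# Placeholder: replace with the ID of the endpoint you want to delete.
ENDPOINT_ID="your_endpoint_id"
DELETE_URL="https://rest.runpod.io/v1/endpoints/${ENDPOINT_ID}"

# Assumption: the REST API supports DELETE on this path.
# Assumes RUNPOD_API_KEY is exported in your shell.
delete_endpoint() {
  curl --request DELETE \
    --url "$DELETE_URL" \
    --header "Authorization: Bearer ${RUNPOD_API_KEY}"
}

# Print the target so you can double-check the endpoint ID.
echo "DELETE ${DELETE_URL}"
# delete_endpoint   # irreversible; uncomment only when you are sure
```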