- Send requests: Learn how to send requests to your endpoints.
- Endpoint settings: Configure scaling, timeouts, and GPU selection.
- Job states: Monitor job status and metrics.
- Model caching: Reduce cold starts with cached models.
Endpoint types
| | Queue-based | Load balancing |
|---|---|---|
| Processing | Requests queued and processed sequentially | Direct HTTP access to workers |
| Execution modes | Async (/run) or sync (/runsync) | Custom HTTP endpoints |
| Retries | Automatic retries on failure | No automatic retries |
| Handler required? | Yes | No (use any HTTP framework) |
| Best for | Batch jobs, guaranteed execution | Real-time apps, streaming |
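With /run, the call returns a job ID right away and the result has to be fetched later; /runsync waits for the result within the same request. A minimal Python sketch of the polling loop an async client needs is below. It assumes the job's state is read from a status route (GET /status/{job_id} under the endpoint's base URL) whose JSON body has a `status` field, and that COMPLETED, FAILED, CANCELLED, and TIMED_OUT are the terminal states; confirm both against the Job states page. `wait_for_job` and `fetch_status` are hypothetical names, not part of any Runpod SDK.

```python
import time
from typing import Callable

# Assumed terminal job states; check the Job states page for the authoritative list.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    poll_interval: float = 1.0,
    timeout: float = 300.0,
) -> dict:
    """Poll an async (/run) job until it reaches a terminal state.

    fetch_status is any callable returning the parsed JSON body of the
    (assumed) GET /status/{job_id} route; it is injected so the loop
    does not depend on a particular HTTP library.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')} after {timeout}s"
            )
        # Not finished yet (e.g. still queued or running): wait, then ask again.
        time.sleep(poll_interval)
```

Injecting `fetch_status` keeps the loop independent of transport details, so it can be exercised offline with canned responses before pointing it at a real endpoint.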
Create an endpoint
Before creating an endpoint, make sure you have a Dockerfile and, for queue-based endpoints, a handler function.
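A queue-based handler is a function that receives one job and returns its output. The sketch below assumes the job payload arrives under an `input` key and that the worker registers the function with the Runpod Python SDK's `runpod.serverless.start({"handler": handler})` call, shown as a comment so the function can be tested without the SDK installed. The greeting logic and the `name` field are placeholders.

```python
def handler(job: dict) -> dict:
    """Hypothetical handler: receives one queued job and returns its output."""
    # Assumption: the request body's payload is delivered under "input".
    job_input = job.get("input", {})
    name = job_input.get("name", "World")
    return {"greeting": f"Hello, {name}!"}


# In the worker image, register the handler with the Runpod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```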
- Navigate to the Serverless section and click New Endpoint.
- Choose your deployment path:
  - Hello World: Runpod forks a starter worker template into a new GitHub repo in your account. Choose Queue-based or Load balancing, then click Deploy.
  - Hugging Face LLM: Search for any text-generation model on Hugging Face (for example, type “Gemma” to find Gemma 4), select it, and click Create Endpoint. Runpod deploys a vLLM endpoint for you.
  - Docker: Deploy from a container image. Select a saved Serverless template to fill in the container configuration automatically, or skip the template and enter an image name manually. See Deploy from Docker.
  - GitHub: Select a repository, filtering by code owner if needed. Runpod checks for a Dockerfile and runs a background check on your handler: queue-based endpoints are checked for handler files, and load balancing endpoints are checked for a /ping path. See Deploy from GitHub.
  - Hub: Opens the Hub browser, where you can browse and deploy prebuilt workers. This replaces the previous “Ready-to-Deploy Repos” option. See Hub overview.
  - Flash: A guided setup flow for Flash that walks you through installing the SDK, initializing your project, and sending your first command. Steps complete automatically as you progress.
- For the GitHub, Docker, and Hello World paths, configure your endpoint before deploying:
  - Endpoint name and type (Queue-based or Load balancing)
  - GPU configuration and worker scaling
  - Model (optional): Enter a Hugging Face URL to use cached models.
  - Environment variables and container configuration. See environment variables.
- Click Deploy Endpoint.
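Load balancing workers skip the handler and expose their own HTTP routes, and the GitHub check looks for a /ping path. Here is a standard-library Python sketch of such a worker. It assumes a 200 response from /ping signals a ready worker; the /generate route, its `prompt` field, and the PORT environment variable are illustrative assumptions. Check the load balancing documentation for the exact health-check contract and port settings.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class WorkerHandler(BaseHTTPRequestHandler):
    """Hypothetical load-balancing worker: a health check plus one custom route."""

    def _send_json(self, code: int, body: dict) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        if self.path == "/ping":
            # Health check: assumed to mean "ready for traffic" when it returns 200.
            self._send_json(200, {"status": "healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path == "/generate":
            # Hypothetical custom route: echo back the "prompt" field.
            length = int(self.headers.get("Content-Length", 0))
            data = json.loads(self.rfile.read(length) or b"{}")
            self._send_json(200, {"echo": data.get("prompt", "")})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, *args) -> None:
        # Silence the default per-request logging to stderr.
        pass


def make_server(port: int, host: str = "0.0.0.0") -> HTTPServer:
    return HTTPServer((host, port), WorkerHandler)


def serve() -> None:
    # The container's start command would call serve().
    # Assumption: the listening port is passed in PORT, defaulting to 80.
    make_server(int(os.environ.get("PORT", "80"))).serve_forever()
```

In practice you would likely use a framework such as FastAPI here, since load balancing endpoints accept any HTTP framework; http.server just keeps the sketch dependency-free.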
Once deployed, requests to a queue-based endpoint go to paths under this base URL:

https://api.runpod.ai/v2/{endpoint_id}/
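Each queue-based operation is a path appended to the base URL https://api.runpod.ai/v2/{endpoint_id}/. This Python sketch builds, without sending, the requests for /run, /runsync, and /status/{job_id} using only the standard library. It assumes Bearer-token authentication with a Runpod API key and a JSON body that wraps the payload in an `input` object; `build_request` is a hypothetical helper, and any key you pass in testing is a placeholder.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.runpod.ai/v2/{endpoint_id}/"


def build_request(
    endpoint_id: str,
    api_key: str,
    path: str,
    payload: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request to a queue-based endpoint.

    path is relative to the base URL, e.g. "run", "runsync", or "status/<job_id>".
    """
    url = BASE_URL.format(endpoint_id=endpoint_id) + path
    headers = {"Authorization": f"Bearer {api_key}"}
    if payload is None:
        # Status checks are plain GETs with no body.
        return urllib.request.Request(url, headers=headers, method="GET")
    headers["Content-Type"] = "application/json"
    # Assumption: the job payload goes under an "input" key.
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending one is then `urllib.request.urlopen(req)`; for /run, the JSON response includes an `id` field, which is the job ID to pass to the status path.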
Edit an endpoint
- Navigate to the Serverless section.
- Click the three dots on your endpoint → Edit Endpoint.
- Update endpoint settings and click Save Endpoint.
Delete an endpoint
- Navigate to the Serverless section.
- Click the three dots on your endpoint → Delete Endpoint.
- Type the endpoint name to confirm.