Related pages:

- Send requests: Learn how to send requests to your endpoints.
- Endpoint settings: Configure scaling, timeouts, and GPU selection.
- Job states: Monitor job status and metrics.
- Model caching: Reduce cold starts with cached models.
## Endpoint types

| Feature | Queue-based | Load balancing |
|---|---|---|
| Processing | Requests queued and processed sequentially | Direct HTTP access to workers |
| Execution modes | Async (/run) or sync (/runsync) | Custom HTTP endpoints |
| Retries | Automatic retries on failure | No automatic retries |
| Handler required? | Yes | No (use any HTTP framework) |
| Best for | Batch jobs, guaranteed execution | Real-time apps, streaming |
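As the table notes, queue-based endpoints require a handler function. A minimal sketch, assuming the Runpod Python SDK (`runpod` package) and its `runpod.serverless.start` worker entry point; the payload shape under `"input"` is an assumption for illustration:

```python
# Minimal queue-based handler sketch (assumes the `runpod` Python SDK).
def handler(job):
    # Each queued request arrives as a job dict; the payload sits under "input".
    prompt = job["input"].get("prompt", "")
    # Do the real work here; whatever you return becomes the job's output.
    return {"echo": prompt}

if __name__ == "__main__":
    # Hand the function to the worker loop. Guarded behind __main__ so the
    # module can be imported (e.g. in tests) without the SDK installed.
    import runpod
    runpod.serverless.start({"handler": handler})
```

Because the handler is a plain function, you can unit-test it locally by calling it with a job-shaped dict before packaging it into your Docker image.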
## Create an endpoint

Before creating an endpoint, ensure you have a handler function and a Dockerfile. The steps below use the web console; you can also create endpoints programmatically through the REST API.
1. Navigate to the Serverless section and click New Endpoint.
2. Choose your deployment source:
   - Import Git Repository: See Deploy from GitHub
   - Import from Docker Registry: See Deploy from Docker Hub
   - Ready-to-Deploy Repos: Select a preconfigured endpoint
3. Configure your endpoint:
   - Endpoint Name and Type (Queue-based or Load balancer)
   - GPU Configuration and worker settings
   - Model (optional): Enter a Hugging Face URL for cached models
   - Environment Variables: See environment variables
4. Click Deploy Endpoint.
Once deployed, your endpoint is available at `https://api.runpod.ai/v2/{endpoint_id}/`.
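The base URL above maps onto the two queue-based execution modes from the table (`/run` for async, `/runsync` for sync). A sketch of assembling such requests; the endpoint ID, API key, and payload contents are placeholders, and the `Bearer` authorization header is an assumption:

```python
import json

ENDPOINT_ID = "your_endpoint_id"  # placeholder: your real endpoint ID
API_KEY = "your_api_key"          # placeholder: your real API key
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

run_url = f"{BASE}/run"          # async: queues the job and returns a job ID
runsync_url = f"{BASE}/runsync"  # sync: blocks until the job completes

headers = {
    "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
    "Content-Type": "application/json",
}
payload = json.dumps({"input": {"prompt": "Hello"}})

# Send with any HTTP client, e.g.:
# requests.post(runsync_url, data=payload, headers=headers)
```

Use `/run` for batch jobs where you poll job states later, and `/runsync` when you want the result in a single round trip.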
## Edit an endpoint

1. Navigate to the Serverless section.
2. Click the three dots on your endpoint → Edit Endpoint.
3. Update endpoint settings and click Save Endpoint.
## Delete an endpoint

1. Navigate to the Serverless section.
2. Click the three dots on your endpoint → Delete Endpoint.
3. Type the endpoint name to confirm.