This reference covers all operations available for queue-based endpoints. For conceptual information and advanced options, see Send API requests.

Setup

Before running these examples, install the Runpod SDK:
# Python
python -m pip install runpod

# JavaScript
npm install --save runpod-sdk

# Go
go get github.com/runpod/go-sdk && go mod tidy
Set your API key and endpoint ID as environment variables:
export RUNPOD_API_KEY="YOUR_API_KEY"
export ENDPOINT_ID="YOUR_ENDPOINT_ID"
You can also send requests using standard HTTP libraries like fetch (JavaScript) and requests (Python).
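
As a rough sketch of the plain-HTTP approach, the helpers below build the URL and headers that the curl examples in this reference use. The names `endpoint_url` and `auth_headers` are illustrative and not part of the Runpod SDK. The header layout simply mirrors the curl commands on this page.

```python
import os
from typing import Dict, Optional

API_BASE = "https://api.runpod.ai/v2"  # base URL as it appears in the curl examples


def endpoint_url(endpoint_id: str, operation: str, job_id: Optional[str] = None) -> str:
    """Build the URL for an operation such as "run", "status", or "health"."""
    url = f"{API_BASE}/{endpoint_id}/{operation}"
    if job_id is not None:
        url += f"/{job_id}"
    return url


def auth_headers(api_key: str) -> Dict[str, str]:
    """Headers mirroring the curl examples (the API key is the authorization value)."""
    return {
        "accept": "application/json",
        "authorization": api_key,
        "content-type": "application/json",
    }


if __name__ == "__main__":
    # Falls back to the placeholder values when the variables are not exported.
    endpoint_id = os.environ.get("ENDPOINT_ID", "YOUR_ENDPOINT_ID")
    api_key = os.environ.get("RUNPOD_API_KEY", "YOUR_API_KEY")
    print(endpoint_url(endpoint_id, "health"))
    print(sorted(auth_headers(api_key)))
```

The resulting URL and headers can be passed to any HTTP client, for example `requests.post(url, headers=headers, json=payload)`.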

/runsync

Synchronous jobs wait for completion and return the complete result in a single response. Best for shorter tasks, interactive applications, and simpler client code without status polling.
  • Maximum payload size: 20 MB
  • Result retention: 1 minute after completion
  • Default wait time: 90 seconds (adjustable via the ?wait=x query parameter, where x is 1000-300000 ms)
https://api.runpod.ai/v2/$ENDPOINT_ID/runsync?wait=120000
The ?wait parameter controls how long the request waits for job completion, not how long results are retained.
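
To make the wait range concrete, here is a minimal sketch that builds a /runsync URL and rejects wait values outside the 1000-300000 ms range stated above. The function name `runsync_url` is hypothetical, and the client-side check is an assumption about what is convenient for callers, not a description of how the API itself validates the value.

```python
from typing import Optional
from urllib.parse import urlencode

MIN_WAIT_MS = 1_000    # lower bound stated in this reference
MAX_WAIT_MS = 300_000  # upper bound stated in this reference


def runsync_url(endpoint_id: str, wait_ms: Optional[int] = None) -> str:
    """Return the /runsync URL, optionally with a range-checked ?wait value in ms."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    if wait_ms is None:
        return url  # no parameter sent, so the default wait time applies
    if not MIN_WAIT_MS <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(
            f"wait must be between {MIN_WAIT_MS} and {MAX_WAIT_MS} ms, got {wait_ms}"
        )
    return f"{url}?{urlencode({'wait': wait_ms})}"
```
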
curl --request POST \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/runsync \
     -H "accept: application/json" \
     -H "authorization: $RUNPOD_API_KEY" \
     -H "content-type: application/json" \
     -d '{"input": {"prompt": "Hello, world!"}}'
Response:
{
  "delayTime": 824,
  "executionTime": 3391,
  "id": "sync-79164ff4-d212-44bc-9fe3-389e199a5c15",
  "output": [
    {
      "image": "https://image.url",
      "seed": 46578
    }
  ],
  "status": "COMPLETED"
}

/run

Asynchronous jobs process in the background and return immediately with a job ID. Best for longer-running tasks, operations requiring significant processing time, and managing multiple concurrent jobs.
  • Maximum payload size: 10 MB
  • Result retention: 30 minutes after completion
curl --request POST \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/run \
     -H "accept: application/json" \
     -H "authorization: $RUNPOD_API_KEY" \
     -H "content-type: application/json" \
     -d '{"input": {"prompt": "Hello, world!"}}'
Response:
{
  "id": "eaebd6e7-6a92-4bb8-a911-f996ac5ea99d",
  "status": "IN_QUEUE"
}
Retrieve results using the /status operation.
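
The same /run call can be sketched with Python's standard library. `build_run_request` and `submit_job` are illustrative names, not SDK functions. The example assumes the response contains an `id` field, as in the sample response above.

```python
import json
import urllib.request


def build_run_request(endpoint_id: str, api_key: str, job_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /run operation."""
    body = json.dumps({"input": job_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/run",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": api_key,
            "content-type": "application/json",
        },
    )


def submit_job(request: urllib.request.Request) -> str:
    """Send the request and return the job ID from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.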

/status

Check the current state, execution statistics, and results of previously submitted jobs.
Configure time-to-live (TTL) for individual jobs by appending ?ttl=x to the request URL, where x is in milliseconds. For example, ?ttl=6000 sets the TTL to 6 seconds.
Replace YOUR_JOB_ID with the job ID from your /run response.
curl --request GET \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/status/YOUR_JOB_ID \
     -H "authorization: $RUNPOD_API_KEY"
Response: Returns the job status (for example IN_QUEUE, IN_PROGRESS, COMPLETED, or FAILED; CANCELLED and TIMED_OUT also appear elsewhere in this reference) with an optional output field:
{
  "delayTime": 31618,
  "executionTime": 1437,
  "id": "60902e6c-08a1-426e-9cb9-9eaec90f5e2b-u1",
  "output": {
    "input_tokens": 22,
    "output_tokens": 16,
    "text": ["Hello! How can I assist you today?\nUSER: I'm having"]
  },
  "status": "COMPLETED"
}
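
A common pattern with /status is to poll until the job stops changing. The sketch below is one possible polling loop, not an official client. `fetch_status` is a hypothetical callable that you supply, which performs the GET request shown above and returns the parsed JSON. The set of terminal statuses is assembled from the status names mentioned in this reference.

```python
import time
from typing import Callable

# Statuses named in this reference after which a job no longer progresses on its own.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    poll_interval_s: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reaches a terminal status, then return that response."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(poll_interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

The `sleep` argument is injectable so the loop can be exercised in tests without real delays. The interval and poll count are arbitrary defaults you should tune to your workload.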

/stream

Receive incremental results as they become available from jobs that generate output progressively. Best for text generation, long-running jobs where you want to show progress, and large outputs that benefit from incremental processing. Your handler must support streaming. See Streaming handlers for implementation details.
Replace YOUR_JOB_ID with the job ID from your /run response.
curl --request GET \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/stream/YOUR_JOB_ID \
     -H "accept: application/json" \
     -H "authorization: $RUNPOD_API_KEY"
Maximum size for a single streamed payload chunk is 1 MB. Larger outputs are split across multiple chunks.
Response:
[
  {
    "metrics": {
      "avg_gen_throughput": 0,
      "avg_prompt_throughput": 0,
      "cpu_kv_cache_usage": 0,
      "gpu_kv_cache_usage": 0.0016722408026755853,
      "input_tokens": 0,
      "output_tokens": 1,
      "pending": 0,
      "running": 1,
      "scenario": "stream",
      "stream_index": 2,
      "swapped": 0
    },
    "output": {
      "input_tokens": 0,
      "output_tokens": 1,
      "text": [" How"]
    }
  }
]
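
If the chunks look like the sample above, the generated text can be reassembled by concatenating each chunk's `output.text` fragments in order. This is a minimal sketch under that assumption. The actual chunk shape depends on your handler, and `join_stream_text` is an illustrative name.

```python
from typing import List


def join_stream_text(chunks: List[dict]) -> str:
    """Concatenate the "text" fragments from a list of stream chunks, in order."""
    parts: List[str] = []
    for chunk in chunks:
        output = chunk.get("output") or {}  # tolerate chunks without an output field
        parts.extend(output.get("text", []))
    return "".join(parts)
```
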

/cancel

Stop jobs that are no longer needed or are taking too long. Stops in-progress jobs, removes queued jobs before they start, and returns immediately with the CANCELLED status.
Replace YOUR_JOB_ID with the job ID from your /run response.
curl --request POST \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/cancel/YOUR_JOB_ID \
     -H "authorization: $RUNPOD_API_KEY"
Response:
{
  "id": "724907fe-7bcc-4e42-998d-52cb93e1421f-u1",
  "status": "CANCELLED"
}

/retry

Requeue jobs that have failed or timed out without submitting a new request. Maintains the same job ID, requeues with original input parameters, and removes previous output. Only works for jobs with FAILED or TIMED_OUT status.
Replace YOUR_JOB_ID with the job ID from your /run response.
curl --request POST \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/retry/YOUR_JOB_ID \
     -H "authorization: $RUNPOD_API_KEY"
Response:
{
  "id": "60902e6c-08a1-426e-9cb9-9eaec90f5e2b-u1",
  "status": "IN_QUEUE"
}
Job results expire after a set period: results of async jobs (/run) are available for 30 minutes, and results of sync jobs (/runsync) for 1 minute (up to 5 minutes with ?wait=t). Once expired, jobs cannot be retried.
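
Because /retry only applies to FAILED or TIMED_OUT jobs, a client can check the last known status before calling it. The sketch below shows one way to do that. `can_retry` and `retry_url` are hypothetical helper names. The check cannot detect expired results, which only the API can report.

```python
RETRYABLE_STATUSES = {"FAILED", "TIMED_OUT"}  # statuses this reference lists as retryable


def can_retry(job: dict) -> bool:
    """Return True if the job's last known status allows a /retry call."""
    return job.get("status") in RETRYABLE_STATUSES


def retry_url(endpoint_id: str, job: dict) -> str:
    """Build the /retry URL for a job, refusing jobs that are not retryable."""
    if not can_retry(job):
        raise ValueError(f"job {job.get('id')} has status {job.get('status')}, not retryable")
    return f"https://api.runpod.ai/v2/{endpoint_id}/retry/{job['id']}"
```
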

/purge-queue

Remove all pending jobs from the queue. Useful for error recovery, clearing outdated requests, and resetting after configuration changes.
curl --request POST \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/purge-queue \
     -H "authorization: $RUNPOD_API_KEY"
This operation only affects jobs waiting in the queue. Jobs already in progress continue to run.
Response:
{
  "removed": 2,
  "status": "completed"
}

/health

Get a quick overview of your endpoint’s operational status including worker availability and job queue status.
curl --request GET \
     --url https://api.runpod.ai/v2/$ENDPOINT_ID/health \
     -H "authorization: $RUNPOD_API_KEY"
Response:
{
  "jobs": {
    "completed": 1,
    "failed": 5,
    "inProgress": 0,
    "inQueue": 2,
    "retried": 0
  },
  "workers": {
    "idle": 0,
    "running": 0
  }
}
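
As an illustration of how the /health response might be used, the sketch below condenses it into a few numbers. The field names come from the sample response above. The `summarize_health` function and the `no_workers_for_pending` flag are assumptions made for this example, and that flag may simply mean workers are still starting up.

```python
def summarize_health(health: dict) -> dict:
    """Condense a /health response into pending-job and worker counts."""
    jobs = health.get("jobs", {})
    workers = health.get("workers", {})
    pending = jobs.get("inQueue", 0) + jobs.get("inProgress", 0)
    active_workers = workers.get("idle", 0) + workers.get("running", 0)
    return {
        "pending_jobs": pending,
        "active_workers": active_workers,
        # Heuristic only: queued work with no idle or running workers reported.
        "no_workers_for_pending": pending > 0 and active_workers == 0,
    }
```
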