After deploying your Flash app with flash deploy, you can call your endpoints directly via HTTP. The request format depends on whether you’re using queue-based or load-balanced configurations.

Authentication

All deployed endpoints require authentication with your Runpod API key:
export RUNPOD_API_KEY="your_key_here"

curl -X POST https://YOUR_ENDPOINT_URL/path \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"param": "value"}'
Your endpoint URLs are displayed after running flash deploy. You can also view them with flash env get <environment-name>.

Queue-based endpoints

Queue-based endpoints (created with the @Endpoint(name=..., gpu=...) decorator) provide two routes for job submission: /run (asynchronous) and /runsync (synchronous).

Asynchronous calls (/run)

Submit a job and receive a job ID for later status checking:
curl -X POST https://api.runpod.ai/v2/abc123xyz/run \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": {"prompt": "Hello world"}}'
Response:
{
    "id": "job-abc-123",
    "status": "IN_QUEUE"
}
Check status later:
curl https://api.runpod.ai/v2/abc123xyz/status/job-abc-123 \
    -H "Authorization: Bearer $RUNPOD_API_KEY"
When the job completes:
{
    "id": "job-abc-123",
    "status": "COMPLETED",
    "output": {
        "generated_text": "Hello world from GPU!"
    }
}
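The submit-then-poll flow above can be sketched as a small Python loop. This is an illustrative helper, not part of any Runpod library: the status fetcher is injected as a callable so the loop itself stays independent of your HTTP client. In practice, fetch_status would issue a GET to the /status/{id} route shown above with the bearer header.

```python
import time

def wait_for_job(fetch_status, poll_interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll a queue-based job until it reaches a terminal state.

    fetch_status: a zero-argument callable returning the /status/{id}
    response JSON as a dict (injected so the loop needs no network to test).
    Returns the final job dict, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in ("COMPLETED", "FAILED"):  # terminal states
            return job
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")
```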

Synchronous calls (/runsync)

Wait for job completion and receive results directly (with timeout):
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": {"prompt": "Hello world"}}'
Response (after job completes):
{
    "id": "job-abc-123",
    "status": "COMPLETED",
    "output": {
        "generated_text": "Hello world from GPU!"
    }
}
Use /run for long-running jobs that you’ll check later. Use /runsync for quick jobs where you want immediate results (with timeout protection).

Queue-based request format

Queue-based endpoints expect input wrapped in an {"input": {...}} object:
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "input": {
            "param1": "value1",
            "param2": "value2"
        }
    }'
The structure inside "input" depends on your @Endpoint function signature.

Job status states

IN_QUEUE: Waiting for an available worker
IN_PROGRESS: Worker is executing your function
COMPLETED: Function finished successfully
FAILED: Execution encountered an error

Load-balanced endpoints

Load-balanced endpoints (created with the api = Endpoint(...) instance and @api.post("/path") decorators) provide custom HTTP routes with direct request/response patterns.

Calling load-balanced routes

All routes share the same base URL. Append the route path to call specific functions:
# POST route
curl -X POST https://abc123xyz.api.runpod.ai/analyze \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"text": "Hello world from Flash"}'

# GET route
curl -X GET https://abc123xyz.api.runpod.ai/info \
    -H "Authorization: Bearer $RUNPOD_API_KEY"

# Another POST route (same endpoint URL)
curl -X POST https://abc123xyz.api.runpod.ai/validate \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "Alice", "email": "alice@example.com"}'

Load-balanced request format

Load-balanced endpoints accept direct JSON payloads (no {"input": {...}} wrapper):
curl -X POST https://abc123xyz.api.runpod.ai/process \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "param1": "value1",
        "param2": "value2"
    }'
The payload structure depends on your function signature. Each route can accept different parameters.
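The difference between the two request formats comes down to the wrapper object. The sketch below (an illustrative helper, not a Runpod API) makes the contrast explicit:

```python
import json

def queue_payload(params: dict) -> str:
    """Queue-based endpoints require the {"input": {...}} wrapper."""
    return json.dumps({"input": params})

def lb_payload(params: dict) -> str:
    """Load-balanced endpoints take the JSON body directly."""
    return json.dumps(params)
```

Sending the wrong shape is a common source of 400 errors: a queue-based endpoint will not find your parameters if they are not nested under "input", and a load-balanced route does not expect the wrapper at all.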

Multiple routes, single endpoint

A single load-balanced endpoint can serve multiple routes:
from runpod_flash import Endpoint

api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))

# All these routes share one endpoint URL
@api.post("/generate")
async def generate_text(prompt: str): ...

@api.post("/translate")
async def translate_text(text: str): ...

@api.get("/health")
async def health_check(): ...
# All use the same base URL with different paths
curl -X POST https://abc123xyz.api.runpod.ai/generate -H "..." -d '{...}'
curl -X POST https://abc123xyz.api.runpod.ai/translate -H "..." -d '{...}'
curl -X GET https://abc123xyz.api.runpod.ai/health -H "..."
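Because every route shares one base URL, a client only needs the endpoint ID and a path to reach any function. The helper below is an assumption-level sketch (not part of the SDK) that mirrors the URL shape shown in the curl examples above:

```python
def route_url(endpoint_id: str, path: str) -> str:
    """Build the full URL for a route on a load-balanced endpoint.

    Accepts paths with or without a leading slash.
    """
    return f"https://{endpoint_id}.api.runpod.ai/{path.lstrip('/')}"
```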

Quick reference

Queue-based: routes /run, /runsync, and /status/{id}; requests use the {"input": {...}} format; the response is a job ID (async) or the result (sync).
Load-balanced: custom paths (e.g., /process); requests use a direct JSON payload; the response is returned directly.

Response status codes

200: Success (load-balanced) or job accepted (queue-based)
400: Bad request (invalid input format)
401: Unauthorized (invalid or missing API key)
404: Route not found
500: Internal server error

Error handling

Queue-based errors appear in the job output:
{
    "id": "job-abc-123",
    "status": "FAILED",
    "error": "Error message from your function"
}
Load-balanced errors return HTTP error codes with JSON body:
{
    "error": "Error message from your function",
    "detail": "Additional error context"
}

Using SDKs

For programmatic access, use the Runpod Python SDK:
import runpod

# Set API key
runpod.api_key = "your_api_key"

# Connect to endpoint
endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")

# Async call (returns job object immediately)
run_request = endpoint.run({"prompt": "Hello world"})
status = run_request.status()  # Check status
output = run_request.output()  # Get result once complete

# Sync call (blocks until complete)
result = endpoint.run_sync({"prompt": "Hello world"})
See the Runpod SDK documentation for complete SDK usage.

Next steps