run

Start the Flash development server for local testing with automatic updates. A local development server provides a unified interface for testing while @Endpoint functions execute on Runpod Serverless.

flash run [OPTIONS]

Example

Start the development server with defaults:

flash run

Start with auto-provisioning to eliminate cold-start delays:

flash run --auto-provision

Start on a custom port:

flash run --port 3000

Flags

--host

string

default:"localhost"

Host address to bind the server to.

--port, -p

integer

default:8888

Port number for the server. If the port is already in use, Flash automatically tries the next available port.

--reload/--no-reload

default:"enabled"

Enable or disable auto-reload on code changes. Enabled by default.

--auto-provision

Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.

Endpoint descriptions from docstrings

Flash extracts the first line of each function’s docstring and uses it in two places:

Startup table: The “Description” column shows the docstring when the server starts.
Swagger UI: The endpoint summary in the API explorer at /docs.

Add docstrings to your @Endpoint functions to make your API self-documenting:

@Endpoint(name="text-processor", gpu=GpuGroup.ANY)
def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores."""
    # Implementation here
    return {"sentiment": "positive"}

When you run flash run, the startup table displays “Analyze text and return sentiment scores” as the description for this endpoint, and the same text appears in the Swagger UI summary.

Architecture

With flash run, Flash starts a local development server alongside remote Serverless endpoints: Key points:

A local development server provides a convenient testing interface at localhost:8888.
@Endpoint functions deploy to Runpod Serverless with live- prefix to distinguish from production.
Code changes are picked up automatically without restarting the server.
The development server routes requests to appropriate remote endpoints.

This differs from flash deploy, where all endpoints run on Runpod without a local server.

Auto-provisioning

By default, endpoints are provisioned lazily on first @Endpoint function call. Use --auto-provision to provision all endpoints at server startup:

flash run --auto-provision

How it works

Discovery: Scans your app for @Endpoint decorated functions.
Deployment: Deploys resources concurrently (up to 3 at a time).
Confirmation: Asks for confirmation if deploying more than 5 endpoints.
Caching: Stores deployed resources in .flash/resources.pkl for reuse.
Updates: Recognizes existing endpoints and updates if configuration changed.

Benefits

Zero cold start: All endpoints ready before you test them.
Faster development: No waiting for deployment on first HTTP call.
Resource reuse: Cached endpoints are reused across server restarts.

When to use

Local development with multiple endpoints.
Testing workflows that call multiple remote functions.
Debugging where you want deployment separated from handler logic.

Provisioning modes

Mode	When endpoints are deployed
Default (lazy)	On first `@Endpoint` function call
`--auto-provision`	At server startup

Testing your API

Once the server is running, test your endpoints:

# Health check
curl http://localhost:8888/

# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"input_data": {"message": "Hello from the GPU"}}}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"message": "Hello from Flash"}}'

Queue-based endpoints require the {"input": {...}} wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.

Open http://localhost:8888/docs for the interactive API explorer.

Requirements

RUNPOD_API_KEY must be set in your .env file or environment.
A valid Flash project structure (created by flash init or manually).

flash run vs flash deploy

Aspect	`flash run`	`flash deploy`
Local development server	Yes (http://localhost:8888)	No
`@Endpoint` functions run on	Runpod Serverless	Runpod Serverless
Endpoint persistence	Temporary (`live-` prefix)	Persistent
Code updates	Automatic reload	Manual redeploy
Use case	Development	Production

flash init - Create a new project
flash deploy - Deploy to production
flash undeploy - Remove endpoints

Flash CLI

Runpod CLI

Example

Flags

Endpoint descriptions from docstrings

Architecture

Auto-provisioning

How it works

Benefits

When to use

Provisioning modes

Testing your API

Requirements

flash run vs flash deploy

Flash CLI

Runpod CLI

​Example

​Flags

​Endpoint descriptions from docstrings

​Architecture

​Auto-provisioning

​How it works

​Benefits

​When to use

​Provisioning modes

​Testing your API

​Requirements

​flash run vs flash deploy

​Related commands

Example

Flags

Endpoint descriptions from docstrings

Architecture

Auto-provisioning

How it works

Benefits

When to use

Provisioning modes

Testing your API

Requirements

flash run vs flash deploy

Related commands