Start the Flash development server for local testing with automatic reloading. The local development server provides a single interface for testing, while your @Endpoint functions execute on Runpod Serverless.
flash run [OPTIONS]

Examples

Start the development server with defaults:
flash run
Start with auto-provisioning to eliminate cold-start delays:
flash run --auto-provision
Start on a custom port:
flash run --port 3000

Flags

--host (string, default: "localhost")
Host address to bind the server to.
--port, -p (integer, default: 8888)
Port number to bind the server to.
--reload/--no-reload (default: enabled)
Enable or disable automatic reloading when your code changes.
--auto-provision
Provision all Serverless endpoints at startup instead of lazily on first call. This eliminates cold-start delays during development.
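As a quick reference, the documented defaults can be restated as a small Python object. This is a hypothetical sketch: RunOptions and base_url are illustrative names, not part of the Flash API. It only mirrors the flags listed above and shows which URL the server listens on for a given --host and --port.

```python
from dataclasses import dataclass


# Hypothetical mirror of the documented `flash run` flags and their defaults.
# This is NOT Flash's internal configuration object; it restates the flag list.
@dataclass(frozen=True)
class RunOptions:
    host: str = "localhost"       # --host
    port: int = 8888              # --port / -p
    reload: bool = True           # --reload (default) / --no-reload
    auto_provision: bool = False  # --auto-provision; False means lazy provisioning

    def base_url(self) -> str:
        """Return the URL the local development server listens on."""
        return f"http://{self.host}:{self.port}"


print(RunOptions().base_url())           # http://localhost:8888
print(RunOptions(port=3000).base_url())  # http://localhost:3000
```

For example, `flash run --port 3000` corresponds to `RunOptions(port=3000)` in this sketch.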

Architecture

With flash run, Flash starts a local development server alongside remote Serverless endpoints. Key points:
  • The local development server provides a testing interface at localhost:8888 (or the host and port you set with --host and --port).
  • @Endpoint functions deploy to Runpod Serverless with a live- prefix that distinguishes them from production endpoints.
  • Code changes are picked up automatically without restarting the server (unless you pass --no-reload).
  • The development server routes each request to the appropriate remote endpoint.
This differs from flash deploy, where all endpoints run on Runpod without a local server.
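The routing step can be pictured as a lookup from a local path to a remote endpoint name. The sketch below is hypothetical: it assumes the remote name is the worker file's stem with the live- prefix prepended (for example gpu_worker becomes live-gpu_worker), which may not match Flash's actual naming scheme, and route is an illustrative helper, not a Flash function.

```python
# Hypothetical sketch of how a local dev server could map an incoming path
# to a remote Serverless endpoint. The "live-" + worker-name convention is an
# assumption for illustration; Flash's real naming may differ.
LIVE_PREFIX = "live-"


def route(path: str, known_workers: set[str]) -> tuple[str, str]:
    """Split '/<worker>/<action>' and return (remote endpoint name, action)."""
    parts = path.strip("/").split("/", 1)
    if len(parts) != 2 or parts[0] not in known_workers:
        raise LookupError(f"no endpoint registered for {path!r}")
    worker, action = parts
    return LIVE_PREFIX + worker, action


workers = {"gpu_worker", "lb_worker"}
print(route("/gpu_worker/runsync", workers))  # ('live-gpu_worker', 'runsync')
```

The design point is that the local server owns the URL space, so you always call localhost while the actual work runs on the live- endpoints.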

Auto-provisioning

By default, endpoints are provisioned lazily, on the first call to an @Endpoint function. Use --auto-provision to provision all endpoints at server startup:
flash run --auto-provision

How it works

  1. Discovery: Scans your app for @Endpoint decorated functions.
  2. Deployment: Deploys resources concurrently (up to 3 at a time).
  3. Confirmation: Asks for confirmation if deploying more than 5 endpoints.
  4. Caching: Stores deployed resources in .runpod/resources.pkl for reuse.
  5. Updates: Recognizes existing endpoints and updates them if their configuration has changed.
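The deployment and confirmation steps can be sketched with asyncio. This is an illustration under stated assumptions: asyncio.Semaphore is a swapped-in technique for the "up to 3 at a time" limit and is not necessarily how Flash implements it, deploy is a hypothetical stand-in that only sleeps, and confirm is a hypothetical callback standing in for the interactive prompt.

```python
import asyncio

MAX_CONCURRENT = 3     # "up to 3 at a time"
CONFIRM_THRESHOLD = 5  # ask before deploying more than 5 endpoints


async def deploy(name: str, sem: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical stand-in for a remote deployment; it only sleeps."""
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real deployment call
        stats["active"] -= 1
        return name


async def provision_all(names: list[str], confirm) -> tuple[list[str], int]:
    """Deploy every endpoint, at most MAX_CONCURRENT at once.

    Returns (deployed names, peak number of concurrent deployments).
    """
    if len(names) > CONFIRM_THRESHOLD and not confirm(len(names)):
        return [], 0  # user declined the large deployment
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    deployed = await asyncio.gather(*(deploy(n, sem, stats) for n in names))
    return list(deployed), stats["peak"]


names = [f"worker_{i}" for i in range(7)]
deployed, peak = asyncio.run(provision_all(names, confirm=lambda n: True))
print(len(deployed), peak)  # 7 3
```

With seven endpoints the confirmation callback is consulted, and the semaphore keeps at most three deployments in flight.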

Benefits

  • Zero cold start: All endpoints ready before you test them.
  • Faster development: No waiting for deployment on first HTTP call.
  • Resource reuse: Cached endpoints are reused across server restarts.
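The caching and update steps can be approximated by storing a fingerprint of each endpoint's configuration and comparing it on the next start. This is a hypothetical sketch: the real contents of .runpod/resources.pkl are internal to Flash, and plan, fingerprint, and the {"gpu": ...} config dictionaries are illustrative placeholders. It writes to a temporary directory so it leaves your project untouched.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def fingerprint(config: dict) -> str:
    """Stable hash of an endpoint's configuration (hypothetical scheme)."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


def plan(configs: dict[str, dict], cache_file: Path) -> dict[str, str]:
    """Decide whether to create, update, or reuse each endpoint, then save the cache."""
    cached = pickle.loads(cache_file.read_bytes()) if cache_file.exists() else {}
    actions = {}
    for name, cfg in configs.items():
        fp = fingerprint(cfg)
        if name not in cached:
            actions[name] = "create"   # never deployed before
        elif cached[name] != fp:
            actions[name] = "update"   # configuration changed
        else:
            actions[name] = "reuse"    # unchanged across restarts
        cached[name] = fp
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(pickle.dumps(cached))
    return actions


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".runpod" / "resources.pkl"
    print(plan({"gpu_worker": {"gpu": "A100"}}, cache))  # {'gpu_worker': 'create'}
    print(plan({"gpu_worker": {"gpu": "A100"}}, cache))  # {'gpu_worker': 'reuse'}
    print(plan({"gpu_worker": {"gpu": "H100"}}, cache))  # {'gpu_worker': 'update'}
```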

When to use

  • Local development with multiple endpoints.
  • Testing workflows that call multiple remote functions.
  • Debugging, when you want deployment kept separate from handler logic.

Provisioning modes

Mode | When endpoints are deployed
Default (lazy) | On the first @Endpoint function call
--auto-provision | At server startup
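The difference between the two modes comes down to when the first deployment happens. In this hypothetical model, Provisioner is an illustrative class, not a Flash API, and provision only records an event instead of deploying anything. Lazy mode deploys an endpoint the first time it is called, while auto-provision deploys every endpoint up front.

```python
# Hypothetical model of the two provisioning modes. provision() stands in for a
# real remote deployment and only records when it happens.
class Provisioner:
    def __init__(self, endpoints: list[str], auto_provision: bool = False):
        self.endpoints = endpoints
        self.ready: set[str] = set()
        self.events: list[str] = []
        if auto_provision:  # --auto-provision: deploy everything at startup
            for name in endpoints:
                self.provision(name)

    def provision(self, name: str) -> None:
        self.events.append(f"provision {name}")
        self.ready.add(name)

    def call(self, name: str, payload: dict) -> dict:
        if name not in self.ready:  # default (lazy): deploy on first call
            self.provision(name)
        self.events.append(f"call {name}")
        return {"endpoint": name, "input": payload}


lazy = Provisioner(["gpu_worker", "lb_worker"])
lazy.call("gpu_worker", {"message": "hi"})
print(lazy.events)   # ['provision gpu_worker', 'call gpu_worker']

eager = Provisioner(["gpu_worker", "lb_worker"], auto_provision=True)
eager.call("gpu_worker", {"message": "hi"})
print(eager.events)  # ['provision gpu_worker', 'provision lb_worker', 'call gpu_worker']
```

In lazy mode the first call pays the deployment cost; with auto-provisioning that cost moves to startup, including endpoints you may never call in a given session.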

Testing your API

Once the server is running, test your endpoints:
# Health check
curl http://localhost:8888/

# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello from GPU!"}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"data": "test"}'
Open http://localhost:8888/docs for the interactive API explorer.
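If you prefer to test from Python instead of curl, the same POST requests can be sent with the standard library. This is a minimal sketch: build_request and call are illustrative helpers, the paths and payloads are copied from the curl commands above, and the 300-second timeout is an arbitrary allowance for a first call that may still be provisioning. The shape of the JSON response depends on your handler.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # adjust if you started the server with --host/--port


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl examples."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: dict) -> dict:
    """Send the request to the running dev server and decode the JSON reply."""
    with urllib.request.urlopen(build_request(path, payload), timeout=300) as resp:
        return json.loads(resp.read())


# With `flash run` running in another terminal:
# print(call("/gpu_worker/runsync", {"message": "Hello from GPU!"}))
# print(call("/lb_worker/process", {"data": "test"}))
```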

Requirements

  • RUNPOD_API_KEY must be set in your .env file or environment.
  • A valid Flash project structure (created by flash init or manually).
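A quick pre-flight check for the API key requirement might look like the following. It is a hypothetical helper, not part of Flash: it assumes the environment variable takes precedence over .env, and it uses a deliberately minimal KEY=VALUE parser. Flash's own loading order and parsing rules may differ, and real projects would typically rely on a library such as python-dotenv.

```python
import os
from pathlib import Path


def load_api_key(env_file: Path = Path(".env")) -> str:
    """Return RUNPOD_API_KEY from the environment, falling back to a .env file.

    Assumption: the environment variable wins over .env. The parser below only
    handles simple KEY=VALUE lines, optional quotes, and # comments.
    """
    key = os.environ.get("RUNPOD_API_KEY")
    if key:
        return key
    if env_file.exists():
        for raw in env_file.read_text().splitlines():
            line = raw.strip()
            if line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "RUNPOD_API_KEY" and value.strip():
                return value.strip().strip('"').strip("'")
    raise RuntimeError("RUNPOD_API_KEY is not set in the environment or .env")


# Run from your project root before `flash run`:
# load_api_key()
```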

flash run vs flash deploy

Aspect | flash run | flash deploy
Local development server | Yes (http://localhost:8888) | No
@Endpoint functions run on | Runpod Serverless | Runpod Serverless
Endpoint persistence | Temporary (live- prefix) | Persistent
Code updates | Automatic reload | Manual redeploy
Use case | Development | Production