The flash run command starts a local development server so you can test your Flash application before deploying it to production. The server runs on your machine and reloads automatically as you edit files. When you call an @Endpoint function, Flash sends the latest version of that function's code to Serverless workers on Runpod, so each call runs your current code. Use flash run when you want to:
  • Iterate quickly with automatic code updates.
  • Test @Endpoint functions against real GPU/CPU workers.
  • Debug request/response handling before deployment.
  • Develop without redeploying after every change.

Start the development server

From inside your project directory, run:
flash run
The server starts at http://localhost:8888 by default. Your endpoints are available immediately for testing, and @Endpoint functions provision Serverless endpoints on first call.
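If you script against the development server (for example, in a test harness), you may want to wait until it accepts requests before sending calls. Below is a minimal sketch that assumes the server is at the default http://localhost:8888 and that its /docs page returns a 2xx response once the app is up. `wait_until_ready` is a hypothetical helper, not part of Flash.

```python
import time
import urllib.request


def wait_until_ready(url: str = "http://localhost:8888/docs",
                     timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Poll `url` until it answers with a 2xx status or `timeout` seconds pass.

    Hypothetical helper: assumes the Flash dev server serves /docs once it is up.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # Connection refused, socket timeouts, and HTTP error statuses
            # (urllib's HTTPError subclasses OSError) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call it right after launching flash run in the background, and fail fast if it returns False.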

Custom host and port

# Change port
flash run --port 3000

# Make accessible on network
flash run --host 0.0.0.0
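Before passing --port, you can check whether a port is already taken. Below is a minimal standard-library sketch. `port_is_free` and `find_free_port` are hypothetical helpers. Another process can still claim a port between this check and the moment flash run binds it.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error) means we cannot use this port.
            return False
    return True


def find_free_port(start: int = 8888, host: str = "127.0.0.1", attempts: int = 100) -> int:
    """Return the first free port at or above `start`."""
    stop = min(start + attempts, 65536)  # TCP ports end at 65535
    for port in range(start, stop):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in {start}-{stop - 1}")
```

For example, `print(f"flash run --port {find_free_port()}")` prints a command that uses the first free port at or above 8888.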

Test your endpoints

Using curl

# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello from Flash"}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"data": "test"}'

Using the API explorer

Open http://localhost:8888/docs in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser.
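Because the development server is a FastAPI app, it most likely also serves its OpenAPI schema as JSON at /openapi.json, which is FastAPI's default location behind the Swagger UI. That path is an assumption here. The sketch below lists the routes in such a schema, which can help you confirm that your workers were picked up. `list_routes` and `fetch_routes` are hypothetical helpers.

```python
import json
import urllib.request

# Keys in an OpenAPI path item that name HTTP operations (others, such as
# "parameters" or "summary", are skipped).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_routes(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_routes(base_url: str = "http://localhost:8888") -> list[tuple[str, str]]:
    # Assumes the FastAPI default schema URL, /openapi.json.
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return list_routes(json.load(resp))
```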

Using Python

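import requests

BASE_URL = "http://localhost:8888"

# Call a queue-based endpoint (gpu_worker.py).
# The first call may wait 30-60 seconds while the endpoint is provisioned,
# so allow a generous timeout (120 s is an arbitrary choice).
response = requests.post(
    f"{BASE_URL}/gpu_worker/runsync",
    json={"message": "Hello from Flash"},
    timeout=120,
)
response.raise_for_status()  # surface HTTP errors instead of printing an error body
print(response.json())

# Call a load-balanced endpoint (lb_worker.py)
response = requests.post(
    f"{BASE_URL}/lb_worker/process",
    json={"data": "test"},
    timeout=120,
)
response.raise_for_status()
print(response.json())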

Reduce cold-start delays

The first call to an @Endpoint function provisions a Serverless endpoint, which can take 30-60 seconds. To pay that cost once at startup instead of on the first request, use --auto-provision:
flash run --auto-provision
This scans your project for @Endpoint functions and deploys them before the server starts accepting requests. Endpoints are cached in .runpod/resources.pkl and reused across server restarts.
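To see how much --auto-provision saves, you can time the first few calls to an endpoint. Below is a minimal sketch. `time_calls` is a hypothetical helper that accepts any zero-argument callable, so it works with requests or any other HTTP client.

```python
import time
from typing import Callable, List


def time_calls(call: Callable[[], object], n: int = 3) -> List[float]:
    """Invoke `call` n times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_calls(lambda: requests.post("http://localhost:8888/gpu_worker/runsync", json={"message": "Hello from Flash"}, timeout=120))` returns per-call durations. Without --auto-provision, you would expect the first value to include the provisioning delay.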

How it works

With flash run, Flash starts a local development server alongside remote Serverless endpoints. What runs where:

| Component | Location |
| --- | --- |
| Development server | Your machine (localhost:8888) |
| @Endpoint function code | Runpod Serverless |
| Endpoint storage | Runpod Serverless |
Your code updates automatically as you edit files. Endpoints created by flash run are prefixed with live- to distinguish them from production endpoints.

Development workflow

A typical development cycle looks like this:
  1. Start the server: flash run
  2. Make changes to your code.
  3. The server reloads automatically.
  4. Test your changes via curl or the API explorer.
  5. Repeat until ready to deploy.
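You can automate step 4 with a small smoke test that you rerun after each change. Below is a minimal sketch. `run_smoke_tests` and the `send` callable are hypothetical, and the cases reuse the paths and payloads from the curl examples. Injecting `send` keeps the runner independent of any particular HTTP client.

```python
from typing import Callable, Iterable, List, Tuple

# send(path, payload) -> (HTTP status code, parsed response body)
Sender = Callable[[str, dict], Tuple[int, object]]

# Paths and payloads from the curl examples for gpu_worker.py and lb_worker.py.
CASES: List[Tuple[str, dict]] = [
    ("/gpu_worker/runsync", {"message": "Hello from Flash"}),
    ("/lb_worker/process", {"data": "test"}),
]


def run_smoke_tests(cases: Iterable[Tuple[str, dict]], send: Sender) -> List[str]:
    """Call each endpoint once and return a description of every failure."""
    failures = []
    for path, payload in cases:
        try:
            status, body = send(path, payload)
        except Exception as exc:  # network errors, timeouts, invalid JSON, ...
            failures.append(f"{path}: {type(exc).__name__}: {exc}")
            continue
        if not 200 <= status < 300:
            failures.append(f"{path}: HTTP {status}: {body!r}")
    return failures
```

With requests, `send` can post to "http://localhost:8888" + path and return `(response.status_code, response.json())`. An empty result means every case passed.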
When you’re done, use flash undeploy to clean up the live- endpoints created during development.

Differences from production

| Aspect | flash run | flash deploy |
| --- | --- | --- |
| FastAPI app runs on | Your machine | Runpod Serverless |
| Endpoint naming | live- prefix | No prefix |
| Automatic updates | Yes | No |
| Authentication | Not required | Required |
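Because only deployed endpoints require authentication, client code that targets both environments needs different headers. Below is a minimal sketch, assuming that deployed endpoints accept your Runpod API key as a Bearer token in the Authorization header. Check the deployment docs for the exact scheme. `request_headers` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def request_headers(deployed: bool, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build JSON request headers for the local dev server or a deployed endpoint.

    Assumption: deployed endpoints accept the Runpod API key as a Bearer token.
    """
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    if deployed:
        api_key = env.get("RUNPOD_API_KEY")
        if not api_key:
            raise RuntimeError("RUNPOD_API_KEY is not set")
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```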

Clean up after testing

Endpoints created by flash run persist until you delete them. To clean up:
# List all endpoints
flash undeploy list

# Remove a specific endpoint
flash undeploy ENDPOINT_NAME

# Remove all endpoints
flash undeploy --all
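If development and production endpoints share an account, you may prefer to remove only the live- endpoints rather than using --all. The sketch below filters endpoint names by the live- prefix and builds one `flash undeploy ENDPOINT_NAME` command per match. How you collect the names (for example, from the flash undeploy list output) is left open because that output format isn't described here. `dev_endpoints` and `undeploy_commands` are hypothetical helpers.

```python
from typing import Iterable, List

DEV_PREFIX = "live-"  # prefix that flash run adds to development endpoints


def dev_endpoints(names: Iterable[str]) -> List[str]:
    """Return only the endpoint names created by flash run."""
    return [name for name in names if name.startswith(DEV_PREFIX)]


def undeploy_commands(names: Iterable[str]) -> List[List[str]]:
    """Build one `flash undeploy ENDPOINT_NAME` argv list per development endpoint."""
    return [["flash", "undeploy", name] for name in dev_endpoints(names)]
```

Review the generated commands before running each one, for example with `subprocess.run(cmd, check=True)`.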

Troubleshooting

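Port already in use: Another process is using the port. Start the server on a different port:
flash run --port 3000
Slow first request: The first call to each @Endpoint function provisions its endpoint. Use --auto-provision to provision endpoints at startup instead:
flash run --auto-provision
Authentication errors: Make sure RUNPOD_API_KEY is set in your .env file or in your environment:
export RUNPOD_API_KEY=your_api_key_here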
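To check where the key is coming from, you can run a small diagnostic. Below is a minimal sketch that looks in the environment first and then in a .env file in the current directory. That lookup order is an assumption of this sketch, not necessarily the precedence Flash uses. The parser is deliberately minimal: it handles KEY=VALUE lines, comments, an optional export prefix, and surrounding quotes, but not the full .env syntax. `parse_dotenv` and `find_api_key` are hypothetical helpers.

```python
import os
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def find_api_key(env: Optional[Mapping[str, str]] = None,
                 dotenv_path: str = ".env") -> Optional[str]:
    """Return RUNPOD_API_KEY from the environment, falling back to a .env file."""
    env = os.environ if env is None else env
    if env.get("RUNPOD_API_KEY"):
        return env["RUNPOD_API_KEY"]
    try:
        with open(dotenv_path, encoding="utf-8") as fh:
            return parse_dotenv(fh.read()).get("RUNPOD_API_KEY") or None
    except FileNotFoundError:
        return None
```

If `find_api_key()` returns None, neither location provides the key.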

Next steps