# dev

Start the Flash development server for local testing with automatic reload. The local server gives you a single interface for testing, while `@Endpoint` functions execute on Runpod Serverless.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash dev [OPTIONS]
```

<Note>
  `flash run` is a hidden alias for `flash dev` and works identically. New projects should use `flash dev`.
</Note>

## Examples

Start the development server with defaults:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash dev
```

Start with auto-provisioning to eliminate cold-start delays:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash dev --auto-provision
```

Start on a custom port:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash dev --port 3000
```

## Flags

<ResponseField name="--host" type="string" default="localhost">
  Host address to bind the server to.
</ResponseField>

<ResponseField name="--port, -p" type="integer" default={8888}>
  Port number for the server. If the port is already in use, Flash automatically tries the next available port.
</ResponseField>

<ResponseField name="--reload/--no-reload" default="enabled">
  Enable or disable auto-reload on code changes. Enabled by default.
</ResponseField>

<ResponseField name="--auto-provision">
  Auto-provision all Serverless endpoints on startup instead of lazily on first call. Eliminates cold-start delays during development.
</ResponseField>
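
The port fallback described for `--port` can be pictured with a short sketch. This is not Flash's implementation; `find_free_port` and its `attempts` limit are hypothetical names used only to illustrate the idea of probing upward from the requested port until a bind succeeds.

```python
import socket


def find_free_port(host: str = "localhost", start: int = 8888, attempts: int = 20) -> int:
    """Return the first port at or above `start` that can be bound on `host`.

    Hypothetical helper: Flash's own search strategy may differ.
    """
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is taken (or not bindable), try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    print(find_free_port())
```

If another process already holds 8888, a search like this would typically settle on 8889, so check the startup output for the port that was actually chosen.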

## Endpoint descriptions from docstrings

Flash extracts the first line of each function's docstring and uses it in two places:

* **Startup table**: The "Description" column shows the docstring when the server starts.
* **Swagger UI**: The endpoint summary in the API explorer at `/docs`.

Add docstrings to your `@Endpoint` functions to make your API self-documenting:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
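from runpod_flash import Endpoint, GpuGroup  # assumes the Flash SDK is installed as `runpod_flash`


@Endpoint(name="text-processor", gpu=GpuGroup.ANY)
def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores."""
    # Placeholder result; replace with the real analysis
    return {"sentiment": "positive"}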
```

When you run `flash dev`, the startup table displays "Analyze text and return sentiment scores" as the description for this endpoint, and the same text appears in the Swagger UI summary.
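
To see which text would be picked up, you can reproduce the "first line of the docstring" rule with the standard library. The helper below is a minimal sketch using `inspect.getdoc`; `first_docstring_line` is a hypothetical name, and Flash's own extraction may handle edge cases differently.

```python
import inspect


def first_docstring_line(func) -> str:
    """Return the first line of func's docstring, or '' if it has none."""
    doc = inspect.getdoc(func)  # normalizes indentation; None when missing
    if not doc:
        return ""
    return doc.strip().splitlines()[0].strip()


def analyze_text(text: str) -> dict:
    """Analyze text and return sentiment scores.

    Longer explanation that would not appear in the summary.
    """
    return {"sentiment": "positive"}


def undocumented():
    pass


print(first_docstring_line(analyze_text))  # Analyze text and return sentiment scores.
print(repr(first_docstring_line(undocumented)))  # ''
```

Keeping the first docstring line short and self-contained means the longer explanation can live in the following lines without cluttering the summary.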

## Architecture

With `flash dev`, Flash starts a local development server alongside remote Serverless endpoints:

```mermaid theme={"theme":{"light":"github-light","dark":"github-dark"}}
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%

flowchart TB
    Browser(["BROWSER/CURL"])

    subgraph Local ["YOUR MACHINE (localhost:8888)"]
        DevServer["Development Server<br/>• Auto-reload on changes<br/>• API explorer at /docs<br/>• Routes requests"]
    end

    subgraph Runpod ["RUNPOD SERVERLESS"]
        LB["live-lb_worker"]
        GPU["live-gpu_worker"]
        CPU["live-cpu_worker"]
    end

    Browser -->|"HTTP"| DevServer
    DevServer -->|"HTTPS"| LB
    DevServer -->|"HTTPS"| GPU
    DevServer -->|"HTTPS"| CPU

    style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
    style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
    style Browser fill:#4D38F5,stroke:#4D38F5,color:#fff
    style DevServer fill:#5F4CFE,stroke:#5F4CFE,color:#fff
    style LB fill:#22C55E,stroke:#22C55E,color:#000
    style GPU fill:#22C55E,stroke:#22C55E,color:#000
    style CPU fill:#22C55E,stroke:#22C55E,color:#000
```

**Key points:**

* A local development server provides a convenient testing interface at `localhost:8888`.
* `@Endpoint` functions deploy to Runpod Serverless with a `live-` prefix to distinguish them from production endpoints.
* Code changes are picked up automatically without restarting the server.
* The development server routes requests to appropriate remote endpoints.

This differs from `flash deploy`, where all endpoints run on Runpod without a local server.
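
The routing step can be sketched as a simple path-to-endpoint mapping. The sketch assumes, as the diagram and the request paths used elsewhere on this page suggest, that the first path segment names the worker and that the remote endpoint carries the `live-` prefix. `resolve_route` is a hypothetical function, not part of Flash.

```python
def resolve_route(path: str, prefix: str = "live-") -> tuple[str, str]:
    """Split a local request path into (remote endpoint name, remaining path).

    Assumption: the first path segment is the worker name.
    """
    worker, _, rest = path.lstrip("/").partition("/")
    if not worker:
        raise ValueError("path does not name a worker")
    return prefix + worker, "/" + rest


print(resolve_route("/gpu_worker/runsync"))  # ('live-gpu_worker', '/runsync')
print(resolve_route("/lb_worker/process"))   # ('live-lb_worker', '/process')
```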

## Auto-provisioning

By default, each endpoint is provisioned lazily, on the first call to its `@Endpoint` function. Use `--auto-provision` to provision all endpoints at server startup instead:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash dev --auto-provision
```

### How it works

1. **Discovery**: Scans your app for `@Endpoint` decorated functions.
2. **Deployment**: Deploys resources concurrently (up to 3 at a time).
3. **Confirmation**: Asks for confirmation if deploying more than 5 endpoints.
4. **Caching**: Stores deployed resources in `.flash/resources.pkl` for reuse.
5. **Updates**: Recognizes existing endpoints and updates them if their configuration has changed.

### Benefits

* **Zero cold start**: All endpoints are ready before you test them.
* **Faster development**: No waiting for deployment on first HTTP call.
* **Resource reuse**: Cached endpoints are reused across server restarts.
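
Resource reuse depends on recognizing whether a cached endpoint still matches its configuration. The sketch below shows one way such a check could work: store a fingerprint of each endpoint's configuration in a pickle file and compare it on the next start. The cache layout, the fingerprinting scheme, and all function names are assumptions for illustration; the actual contents of `.flash/resources.pkl` are not documented here.

```python
import hashlib
import json
import pickle
from pathlib import Path


def config_fingerprint(config: dict) -> str:
    """Stable hash of a configuration dict (key order does not matter)."""
    encoded = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def load_cache(path: Path) -> dict:
    """Load the name -> fingerprint mapping, or an empty one if absent."""
    if not path.exists():
        return {}
    with path.open("rb") as fh:
        return pickle.load(fh)


def save_cache(path: Path, cache: dict) -> None:
    """Write the mapping, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(cache, fh)


def plan_action(cache: dict, name: str, config: dict) -> str:
    """Return 'create', 'reuse', or 'update' for one endpoint."""
    if name not in cache:
        return "create"
    return "reuse" if cache[name] == config_fingerprint(config) else "update"
```

Because pickle files can execute code when loaded, only load a cache file that your own tooling wrote.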

### When to use

* Local development with multiple endpoints.
* Testing workflows that call multiple remote functions.
* Debugging sessions in which you want deployment problems separated from handler logic.

## Provisioning modes

| Mode               | When endpoints are deployed        |
| ------------------ | ---------------------------------- |
| Default (lazy)     | On first `@Endpoint` function call |
| `--auto-provision` | At server startup                  |
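
The difference between the two modes comes down to when the provisioning call happens. The following sketch models both with a small registry class; `EndpointRegistry` and its `provision` callback are hypothetical and stand in for whatever Flash does internally.

```python
class EndpointRegistry:
    """Provisions endpoints either up front or on first use."""

    def __init__(self, names, provision, auto_provision=False):
        self._provision = provision
        self._ready = {}
        if auto_provision:
            # Eager mode: provision everything at startup
            for name in names:
                self._ensure(name)

    def _ensure(self, name):
        # Lazy mode reaches this on the first call; later calls hit the cache
        if name not in self._ready:
            self._ready[name] = self._provision(name)
        return self._ready[name]

    def call(self, name, payload):
        endpoint = self._ensure(name)
        return {"endpoint": endpoint, "payload": payload}


provisioned = []


def record_provision(name):
    provisioned.append(name)
    return f"live-{name}"


lazy = EndpointRegistry(["gpu_worker", "cpu_worker"], record_provision)
print(provisioned)  # []
lazy.call("gpu_worker", {"message": "hi"})
print(provisioned)  # ['gpu_worker']
```

In lazy mode the first request to each endpoint pays the provisioning cost; in eager mode that cost moves to startup.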

## Testing your API

Once the server is running, test your endpoints:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Health check
curl http://localhost:8888/

# Call a queue-based GPU endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"input_data": {"message": "Hello from the GPU"}}}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"message": "Hello from Flash"}}'
```

<Note>
  Queue-based endpoints require the `{"input": {...}}` wrapper format to match deployed endpoint behavior. Load-balanced endpoints accept direct JSON payloads.
</Note>
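
The same requests can be built from Python. This sketch uses only the standard library and focuses on the payload difference between the two endpoint types; the helper names are hypothetical, and the paths match the `curl` examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # default flash dev address


def queue_payload(arguments: dict) -> dict:
    """Queue-based endpoints expect the arguments under an 'input' key."""
    return {"input": arguments}


def load_balanced_payload(arguments: dict) -> dict:
    """Load-balanced endpoints take the arguments as the JSON body itself."""
    return arguments


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the development server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


args = {"input_data": {"message": "Hello from Flash"}}
print(json.dumps(queue_payload(args)))
print(json.dumps(load_balanced_payload(args)))
```

With the server running, `send(build_request("/gpu_worker/runsync", queue_payload(args)))` would issue the queue-based call, and `send(build_request("/lb_worker/process", load_balanced_payload(args)))` the load-balanced one.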

Open [http://localhost:8888/docs](http://localhost:8888/docs) for the interactive API explorer.

## Requirements

* `RUNPOD_API_KEY` must be set in your `.env` file or environment.
* A valid Flash project structure (created by `flash init` or manually).
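
Before starting the server, you can check the first requirement yourself. The sketch below looks for `RUNPOD_API_KEY` in the environment and then in a `.env` file. Its parser handles only simple `KEY=VALUE` lines and is not the loader Flash uses; `read_dotenv` and `has_api_key` are hypothetical helpers.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_api_key(project_dir: Path, environ=os.environ) -> bool:
    """True if RUNPOD_API_KEY is non-empty in the environment or in .env."""
    if environ.get("RUNPOD_API_KEY"):
        return True
    return bool(read_dotenv(project_dir / ".env").get("RUNPOD_API_KEY"))


if __name__ == "__main__":
    print("RUNPOD_API_KEY found:", has_api_key(Path.cwd()))
```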

## flash dev vs flash deploy

| Aspect                       | `flash dev`                                          | `flash deploy`    |
| ---------------------------- | ---------------------------------------------------- | ----------------- |
| Local development server     | Yes ([http://localhost:8888](http://localhost:8888)) | No                |
| `@Endpoint` functions run on | Runpod Serverless                                    | Runpod Serverless |
| Endpoint persistence         | Temporary (`live-` prefix)                           | Persistent        |
| Code updates                 | Automatic reload                                     | Manual redeploy   |
| Use case                     | Development                                          | Production        |

## Related commands

* [`flash init`](/flash/cli/init) - Create a new project
* [`flash deploy`](/flash/cli/deploy) - Deploy to production
* [`flash undeploy`](/flash/cli/undeploy) - Remove endpoints