# Get started with Flash

> Run your first GPU workload with Flash in less than 5 minutes.

This quickstart gets you running GPU workloads on Runpod in minutes. You'll execute a function on a remote GPU and see the results immediately.

## Requirements

* [Runpod account](/get-started/manage-accounts) with a verified email address.
* [An API key](/get-started/api-keys) with **All** access permissions to your Runpod account.
* [Python 3.10, 3.11, 3.12, or 3.13](https://www.python.org/downloads/) installed.
* [uv](https://docs.astral.sh/uv/) installed.

## Step 1: Install Flash

<Note>
  Flash runs natively on macOS and Linux. Windows users can run Flash through [WSL2](/flash/windows-wsl2).
</Note>

Create a virtual environment and install Flash using [uv](https://docs.astral.sh/uv/):

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
uv venv
source .venv/bin/activate
uv pip install runpod-flash
```

### Optional: Install coding agent integration

If you're using an AI coding agent like Claude Code, Cline, or Cursor, you can install the Flash skill package to give your agent detailed context about the Flash SDK:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
npx skills add runpod/skills
```

This enables your coding agent to provide more accurate Flash code suggestions and troubleshooting help.

## Step 2: Authenticate with Runpod

Log in to your Runpod account:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash login

# If using uv:
uv run flash login
```

This opens your browser to authorize Flash. After you approve, your credentials are saved, allowing you to run Flash commands and scripts.

<Tip>
  Alternatively, you can set the `RUNPOD_API_KEY` environment variable or add it to a `.env` file. See [`flash login`](/flash/cli/login) for details.
</Tip>
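If you rely on the environment variable, a quick pre-flight check can confirm it is visible to Python before you run a script. This is a minimal sketch: `api_key_configured` is a hypothetical helper, not part of Flash. It only inspects the environment, so it cannot see credentials saved by `flash login` or values in a `.env` file that has not been loaded.

```python
import os


def api_key_configured(env=None):
    """Return True if RUNPOD_API_KEY is set to a non-empty value.

    Hypothetical helper for illustration; Flash does not provide it.
    """
    env = os.environ if env is None else env
    return bool(env.get("RUNPOD_API_KEY", "").strip())


if __name__ == "__main__":
    if api_key_configured():
        print("RUNPOD_API_KEY is set.")
    else:
        print("RUNPOD_API_KEY is not set; run `flash login` or export the variable.")
```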

## Step 3: Copy this code

Create a file called `gpu_demo.py` and paste this code into it:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio
from runpod_flash import Endpoint, GpuGroup

@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY, # Use any available GPU
    workers=3,
    idle_timeout=300, # Keep worker running for 5 minutes
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    # IMPORTANT: Import packages INSIDE the function
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }

# Call the function
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)

    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
```

<Warning>
  Make sure you activate your virtual environment in the same directory where you created the `gpu_demo.py` file. If you open a new terminal, run `source .venv/bin/activate` before executing the script.
</Warning>

## Step 4: Run it

Execute the script:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
python gpu_demo.py
```

You'll see Flash provision a GPU worker and execute your function:

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Running matrix multiplication on Runpod GPU...
Creating endpoint: flash-quickstart
Provisioning Serverless endpoint...
Endpoint ready
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received

✓ Matrix size: 1000x1000
✓ Result mean: 249.8286
✓ GPU used: NVIDIA RTX A5000
```

The first run takes 30-60 seconds while Runpod provisions the endpoint, installs dependencies, and starts a worker. Subsequent runs take 2-3 seconds because the worker is already running.
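To compare cold and warm runs yourself, you could wrap the call in a small timer. The sketch below is an illustration, not part of Flash: `timed` is a hypothetical helper, and `fake_remote_call` is a local stand-in so the example runs without a Runpod account. In `gpu_demo.py` you would pass `gpu_matrix_multiply` instead.

```python
import asyncio
import time


async def timed(coro_fn, *args):
    """Await coro_fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await coro_fn(*args)
    return result, time.perf_counter() - start


# Hypothetical stand-in for an endpoint function so the sketch runs locally.
async def fake_remote_call(size):
    await asyncio.sleep(0.05)  # pretend network and execution latency
    return {"matrix_size": size}


async def main():
    for attempt in (1, 2):
        result, elapsed = await timed(fake_remote_call, 1000)
        print(f"Run {attempt}: {elapsed:.2f}s, size={result['matrix_size']}")


if __name__ == "__main__":
    asyncio.run(main())
```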

<Tip>
  If you're having authorization issues, you can set your API key directly in your terminal:

  ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  export RUNPOD_API_KEY="your_key"
  ```

  Replace `your_key` with your actual API key from the [Runpod console](https://www.runpod.io/console/user/settings).
</Tip>

## Step 5: Update and run again

With your endpoint running, make a change and run the script again:

1. Open `gpu_demo.py` and change the matrix size from `1000` to `2000`:

   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   result = await gpu_matrix_multiply(2000)
   ```

2. Run the script again:

   ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
   python gpu_demo.py
   ```

This time, the result should appear in 2-3 seconds instead of 30-60 seconds. Flash injects the code into the running worker, so code changes take effect immediately without reprovisioning.

This instant iteration is one of Flash's key features. You can develop and test GPU code as quickly as local development, even though it runs on remote hardware.

## Step 6: Understand what you just did

Let's break down the code you just ran:

### Imports and setup

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio
from runpod_flash import Endpoint, GpuGroup
```

* **`asyncio`**: Python's standard library for asynchronous code. Endpoint functions run asynchronously, so you `await` them inside an event loop.
* **`Endpoint`**: The class that marks functions for remote execution.
* **`GpuGroup`**: Enum for selecting GPU types or groups of GPUs.

Flash automatically loads your credentials from `flash login` or the `RUNPOD_API_KEY` environment variable.

### The `@Endpoint` decorator

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,
    workers=3,
    idle_timeout=300,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }
```

The `@Endpoint` decorator configures the endpoint in one place, and the function it wraps defines what runs remotely:

* **`name`**: Identifies your endpoint in the [Runpod console](https://www.runpod.io/console/serverless).
* **`gpu`**: Which GPU to use (`GpuGroup.ANY` accepts any available GPU for faster provisioning).
* **`workers`**: Maximum parallel workers (allows 3 concurrent executions).
* **`idle_timeout`**: Seconds a worker stays active after completing a request before scaling down. Setting this to 300 (5 minutes) gives you more time to iterate on your code while the worker remains warm.
* **`dependencies`**: Python packages to install on the worker.
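* **Function body**: The code runs on the remote GPU worker, not your local machine. In this example, NumPy performs the multiplication on the worker's CPU, and `torch` is used only to report the GPU name.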
* **Return value**: The result is returned to your local machine as a Python dictionary.

See [GPU types](/flash/configuration/gpu-types) for available GPUs or [endpoint functions](/flash/create-endpoints) for all configuration options.

<Warning>
  You must import packages **inside the function body**, not at the top of your file. These imports need to happen on the remote worker.
</Warning>
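The reason this works is that Python resolves an `import` statement only when that statement executes. An import inside a function body is therefore deferred until the function is called, in whichever process runs the body. The sketch below demonstrates this locally with a deliberately missing, hypothetical module name. It does not use Flash.

```python
def uses_missing_package():
    # This import runs only when the function is called, not when it is defined.
    import package_that_is_not_installed_here  # hypothetical module name
    return package_that_is_not_installed_here.__name__


def uses_stdlib_package():
    import json  # resolved at call time, in the process that runs the body
    return json.dumps({"ok": True})


if __name__ == "__main__":
    # Defining both functions succeeded even though one import cannot resolve.
    print(uses_stdlib_package())
    try:
        uses_missing_package()
    except ModuleNotFoundError as exc:
        print(f"Fails only when called: {exc}")
```

With Flash, the body runs on the worker, where the packages listed in `dependencies` are installed, so the import succeeds there even if the package is absent on your machine.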

### Calling the function

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)

    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
```

Here's what happens when you call an `@Endpoint` decorated function:

1. Flash checks if the endpoint specified in your decorator already exists.
   * If yes: It updates the endpoint if the configuration has changed.
   * If no: It creates a new endpoint, initializes a worker, and installs your dependencies.
2. Flash sends your code to the GPU worker.
3. The GPU worker executes the function with the provided inputs.
4. The result is returned to your local machine as a Python dictionary, where it's printed in your terminal.

Everything outside the `@Endpoint` function (all the `print` statements, etc.) runs **locally on your machine**. Only the decorated function runs remotely.
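To see why a function defined with plain `def` must be awaited once decorated, it can help to look at a toy decorator with the same shape. This is a conceptual sketch only: `toy_endpoint` is hypothetical and is not how Flash is implemented. A local thread stands in for the remote worker.

```python
import asyncio
import functools


def toy_endpoint(name):
    """Hypothetical stand-in for a remote-execution decorator (not Flash's code)."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            print(f"[local] dispatching {func.__name__} to '{name}'")
            # A real system would serialize the call and send it to a worker.
            # Here a thread stands in for the remote machine.
            result = await asyncio.to_thread(func, *args, **kwargs)
            print(f"[local] received result from '{name}'")
            return result
        return wrapper
    return decorator


@toy_endpoint(name="toy-quickstart")
def square_sum(n):
    # Synchronous body; the decorator makes the call awaitable.
    return sum(i * i for i in range(n))


async def main():
    print(await square_sum(4))


if __name__ == "__main__":
    asyncio.run(main())
```

The decorated name now refers to the async wrapper, so calling it returns a coroutine that completes only when the result comes back.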

## Step 7: Run multiple operations in parallel

Flash makes it easy to run multiple GPU operations concurrently. Replace your `main()` function with the code below:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
async def main():
    print("Running 3 matrix operations in parallel...")

    # Run all three operations at once
    results = await asyncio.gather(
        gpu_matrix_multiply(500),
        gpu_matrix_multiply(1000),
        gpu_matrix_multiply(2000)
    )

    # Print results
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Size: {result['matrix_size']}x{result['matrix_size']}")
        print(f"   Mean: {result['result_mean']:.4f}")
        print(f"   GPU: {result['gpu']}")
```
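The loop above can pair each result with its input size because `asyncio.gather` returns results in the order the awaitables were passed, not the order in which they finish. The sketch below shows this with `fake_job`, a hypothetical local stand-in for a remote call, so it runs without Flash.

```python
import asyncio


async def fake_job(size, delay):
    """Hypothetical stand-in for a remote call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return size


async def main():
    # The first job has the longest delay, so it finishes last,
    # yet its result still comes first in the returned list.
    results = await asyncio.gather(
        fake_job(500, 0.03),
        fake_job(1000, 0.02),
        fake_job(2000, 0.01),
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))  # [500, 1000, 2000]
```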

Run the script again:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
python gpu_demo.py
```

All three operations execute simultaneously:

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Running 3 matrix operations in parallel...
Initial job status: IN_QUEUE
Initial job status: IN_QUEUE
Initial job status: IN_QUEUE
Job completed, output received
Job completed, output received
Job completed, output received

1. Size: 500x500
   Mean: 125.3097
   GPU: NVIDIA RTX A5000

2. Size: 1000x1000
   Mean: 249.9442
   GPU: NVIDIA RTX A5000

3. Size: 2000x2000
   Mean: 500.1321
   GPU: NVIDIA RTX A5000
```
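The means in this output scale with the matrix size, which gives you a quick sanity check. Each entry of `C` is a sum of `size` products of two independent uniform values in [0, 1), and each product has an expected value of 0.25, so the mean should be close to `size / 4` (125, 250, and 500 here). The sketch below checks this on small matrices. It is an illustration that uses the standard library's `random` module instead of NumPy so it runs anywhere, and `mean_of_product` is a hypothetical helper.

```python
import random


def mean_of_product(size, seed=0):
    """Mean entry of A @ B for size x size matrices of uniform [0, 1) values."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(size)] for _ in range(size)]
    b = [[rng.random() for _ in range(size)] for _ in range(size)]
    total = 0.0
    for i in range(size):
        for j in range(size):
            # Entry (i, j) of the product is the dot product of row i and column j.
            total += sum(a[i][k] * b[k][j] for k in range(size))
    return total / (size * size)


if __name__ == "__main__":
    for size in (20, 40):
        mean = mean_of_product(size)
        print(f"size={size}: mean={mean:.2f}, expected about {size / 4:.2f}")
```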

## Clean up

When you're done testing, clean up the endpoints:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# List all endpoints
flash undeploy list

# Remove the quickstart endpoint
flash undeploy flash-quickstart

# Or remove all endpoints
flash undeploy --all

# If using uv:
uv run flash undeploy list
uv run flash undeploy flash-quickstart
uv run flash undeploy --all
```

## Next steps

You've successfully run GPU code on Runpod! Now you're ready to learn more about Flash:

<CardGroup cols={2}>
  <Card title="Generate images with Flash" href="/tutorials/flash/image-generation-with-sdxl" icon="image" horizontal>
    Use Stable Diffusion XL to generate images from text prompts.
  </Card>

  <Card title="Endpoint functions" href="/flash/create-endpoints" icon="function" horizontal>
    Learn how to configure and optimize endpoint functions.
  </Card>

  <Card title="Build Flash apps" href="/flash/apps/overview" icon="code" horizontal>
    Deploy production APIs.
  </Card>

  <Card title="Explore Flash examples" href="https://github.com/runpod/flash-examples" icon="github" horizontal>
    Browse example Flash scripts and apps on GitHub.
  </Card>
</CardGroup>

## Troubleshooting

### Authentication error

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Error: API key is not set
```

**Solution**: Run `flash login` to authenticate with your Runpod account:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash login

# If using uv:
uv run flash login
```

Alternatively, set the `RUNPOD_API_KEY` environment variable:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
export RUNPOD_API_KEY="your_key"
```

### Template name conflict

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Error: endpoint template names must be unique
```

**Solution**: Each endpoint needs a unique `name`. If you've deployed an endpoint before with the same name, either:

* Use a different name for your new endpoint
* Undeploy the existing endpoint with `flash undeploy <name> --force`

### Job stuck in queue

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Initial job status: IN_QUEUE
[Stays in queue for >60 seconds]
```

**Solution**: A job that stays in the queue usually means no GPUs of the requested type are available. Use `GpuGroup.ANY` to accept any available GPU:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    ...
```

Or add multiple specific GPU types for fallback:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from runpod_flash import Endpoint, GpuType  # GpuType is not imported in the earlier examples

@Endpoint(
    name="flash-quickstart",
    gpu=[
        GpuType.NVIDIA_GEFORCE_RTX_4090,
        GpuType.NVIDIA_RTX_A5000,
        GpuType.NVIDIA_RTX_A6000
    ],
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    ...
```

You can also check [GPU availability](https://www.runpod.io/console/serverless) in the console.
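While you investigate, you may not want a script to wait indefinitely. One option is to bound the wait locally with `asyncio.wait_for`. This is a minimal sketch under assumptions: `call_with_timeout` is a hypothetical helper, and `stuck_job` and `quick_job` are local stand-ins for endpoint calls. It only stops the local wait. Whether the remote job is also cancelled depends on Flash and is not covered here.

```python
import asyncio


async def call_with_timeout(coro, seconds):
    """Await `coro`, returning None if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None


# Hypothetical stand-in for a job that never leaves the queue.
async def stuck_job():
    await asyncio.sleep(10)
    return "done"


# Hypothetical stand-in for a job that completes promptly.
async def quick_job():
    await asyncio.sleep(0.01)
    return "done"


async def main():
    print(await call_with_timeout(quick_job(), 1.0))   # done
    print(await call_with_timeout(stuck_job(), 0.05))  # None


if __name__ == "__main__":
    asyncio.run(main())
```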

### Import errors

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
ModuleNotFoundError: No module named 'numpy'
```

**Solution**: Move imports inside the `@Endpoint` function:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
@Endpoint(name="compute", gpu=GpuGroup.ANY, dependencies=["numpy"])
def my_function():
    import numpy as np  # Import here, not at top of file
    # ...
```

See the [execution model](/flash/execution-model#common-execution-issues) for more troubleshooting.
