# Manage Pods with dstack on Runpod

> Use dstack to automate Pod orchestration for AI and ML workloads on Runpod.

[dstack](https://dstack.ai/) is an open-source tool that automates Pod orchestration for AI and ML workloads. It lets you define your application and resource requirements in YAML files, then handles provisioning and managing cloud resources on Runpod so you can focus on your application instead of infrastructure.

This guide shows you how to set up dstack with Runpod and deploy [vLLM](https://github.com/vllm-project/vllm) to serve the `meta-llama/Llama-3.1-8B-Instruct` model from Hugging Face.

## Requirements

You'll need:

* [A Runpod account with an API key](/get-started/api-keys).
* Python 3.8 or higher installed on your local machine.
* `pip` (or `pip3` on macOS).
* Basic utilities like `curl`.

These instructions work on macOS, Linux, and Windows.

<Info>
  **Windows users**

  Use [WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install) or [Git Bash](https://gitforwindows.org/) to follow along with the Unix-like commands in this guide. Alternatively, use PowerShell or Command Prompt and adjust commands as needed.
</Info>

## Set up dstack

### Install and configure the server

<Steps>
  <Step title="Prepare your workspace">
    Open a terminal and create a new directory:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    mkdir runpod-dstack-tutorial
    cd runpod-dstack-tutorial
    ```
  </Step>

  <Step title="Set up a Python virtual environment">
    <Tabs>
      <Tab title="macOS">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        python3 -m venv .venv
        source .venv/bin/activate
        ```
      </Tab>

      <Tab title="Linux">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        python3 -m venv .venv
        source .venv/bin/activate
        ```
      </Tab>

      <Tab title="Windows">
        **Command Prompt:**

        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        python -m venv .venv
        .venv\Scripts\activate
        ```

        **PowerShell:**

        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        python -m venv .venv
        .venv\Scripts\Activate.ps1
        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="Install dstack">
    Install dstack using `pip`:

    <Tabs>
      <Tab title="macOS">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        pip3 install -U "dstack[all]"
        ```
      </Tab>

      <Tab title="Linux">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        pip install -U "dstack[all]"
        ```
      </Tab>

      <Tab title="Windows">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        pip install -U "dstack[all]"
        ```
      </Tab>
    </Tabs>
  </Step>
</Steps>

### Configure dstack for Runpod

<Steps>
  <Step title="Create the global configuration file">
    Create a `config.yml` file in the dstack configuration directory. This file stores your Runpod credentials for all dstack deployments.

    * **Create the configuration directory:**

          <Tabs>
            <Tab title="macOS">
              ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
              mkdir -p ~/.dstack/server
              ```
            </Tab>

            <Tab title="Linux">
              ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
              mkdir -p ~/.dstack/server
              ```
            </Tab>

            <Tab title="Windows">
              ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
              mkdir %USERPROFILE%\.dstack\server
              ```
            </Tab>
          </Tabs>

    * **Navigate to the configuration directory:**

          <Tabs>
            <Tab title="macOS">
              ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
              cd ~/.dstack/server
              ```
            </Tab>

            <Tab title="Linux">
              ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
              cd ~/.dstack/server
              ```
            </Tab>

            <Tab title="Windows">
              ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
              cd %USERPROFILE%\.dstack\server
              ```
            </Tab>
          </Tabs>

    Create a file named `config.yml` with the following content:

    ```yml theme={"theme":{"light":"github-light","dark":"github-dark"}}
    projects:
      - name: main
        backends:
          - type: runpod
            creds:
              type: api_key
              api_key: YOUR_RUNPOD_API_KEY
    ```

    Replace `YOUR_RUNPOD_API_KEY` with your actual Runpod API key.
  </Step>

  <Step title="Start the dstack server">
    Start the dstack server:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    dstack server
    ```

    You'll see output like this:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    [INFO] Applying ~/.dstack/server/config.yml...
    [INFO] The admin token is ADMIN-TOKEN
    [INFO] The dstack server is running at http://127.0.0.1:3000
    ```

    <Info>
      Save the `ADMIN-TOKEN` to access the dstack web UI.
    </Info>
  </Step>

  <Step title="Access the dstack web UI">
    Open your browser and go to `http://127.0.0.1:3000`. Enter the `ADMIN-TOKEN` from the server output to access the web UI where you can monitor and manage deployments.

    <Frame>
      <img src="https://mintcdn.com/runpod-b18f5ded/45mQOiVf5AVJdF5-/images/e96e09d1-dstack_webui-fab1ac1123c8d4ce9a0458569bd97260.png?fit=max&auto=format&n=45mQOiVf5AVJdF5-&q=85&s=88b0c653e1e736d3196a893c11bfb49a" width="3370" height="1886" data-path="images/e96e09d1-dstack_webui-fab1ac1123c8d4ce9a0458569bd97260.png" />
    </Frame>
  </Step>
</Steps>
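If you'd rather script the `config.yml` step above, for example when setting up several machines, the following minimal Python sketch writes the same file using only the standard library. The function name `write_dstack_config` is hypothetical, and the sketch assumes you supply the key yourself. `Path.home()` resolves to `%USERPROFILE%` on Windows and `$HOME` elsewhere, so the same code covers all three platforms.

```python
from pathlib import Path
from typing import Optional

# Same structure as the config.yml shown above; the key is filled in at runtime.
CONFIG_TEMPLATE = """\
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: {api_key}
"""


def write_dstack_config(api_key: str, config_dir: Optional[Path] = None) -> Path:
    """Write config.yml into config_dir (default ~/.dstack/server) and return its path."""
    if not api_key:
        raise ValueError("Runpod API key is empty")
    # Default to the same directory the manual steps create.
    directory = config_dir or Path.home() / ".dstack" / "server"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "config.yml"
    path.write_text(CONFIG_TEMPLATE.format(api_key=api_key), encoding="utf-8")
    # Owner read/write only on POSIX; on Windows this just keeps the file writable.
    path.chmod(0o600)
    return path
```

For example, export a variable such as `RUNPOD_API_KEY` (an assumed name), call `write_dstack_config(os.environ["RUNPOD_API_KEY"])`, then restart `dstack server` so it re-reads the file.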

## Deploy vLLM

### Configure the deployment

<Steps>
  <Step title="Prepare for deployment">
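    Leave the server running, open a new terminal, and navigate to the `runpod-dstack-tutorial` directory you created earlier (adjust the path if you didn't create it in your home directory):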

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    cd runpod-dstack-tutorial
    ```

    Activate the Python virtual environment:

    <Tabs>
      <Tab title="macOS">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        source .venv/bin/activate
        ```
      </Tab>

      <Tab title="Linux">
        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        source .venv/bin/activate
        ```
      </Tab>

      <Tab title="Windows">
        **Command Prompt:**

        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        .venv\Scripts\activate
        ```

        **PowerShell:**

        ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
        .venv\Scripts\Activate.ps1
        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="Create a directory for the task">
    Create a new directory for the deployment:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    mkdir task-vllm-llama
    cd task-vllm-llama
    ```
  </Step>

  <Step title="Create the dstack configuration file">
    Create a file named `.dstack.yml` with the following content:

    ```yml theme={"theme":{"light":"github-light","dark":"github-dark"}}
    type: task
    name: vllm-llama-3.1-8b-instruct
    python: "3.10"
    env:
      - HUGGING_FACE_HUB_TOKEN=YOUR_HUGGING_FACE_HUB_TOKEN
      - MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
      - MAX_MODEL_LEN=8192
    commands:
      - pip install vllm
      - vllm serve $MODEL_NAME --port 8000 --max-model-len $MAX_MODEL_LEN
    ports:
      - 8000
    spot_policy: on-demand
    resources:
      gpu:
        name: "RTX4090"
        memory: "24GB"
      cpu: 16..
    ```

    <Info>
      Replace `YOUR_HUGGING_FACE_HUB_TOKEN` with your [Hugging Face access token](https://huggingface.co/settings/tokens). The model is gated: request access on its Hugging Face model page first, and use a token from an account that has been granted access.
    </Info>
  </Step>
</Steps>
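The `resources` and `MAX_MODEL_LEN` values fit together roughly as follows. This back-of-the-envelope sketch assumes Llama 3.1 8B's published architecture (about 8.03 billion parameters, 32 layers, 8 key/value heads of dimension 128) with bf16 weights and KV cache. It ignores activations and runtime overhead, so treat the results as lower bounds rather than exact requirements.

```python
# Back-of-the-envelope GPU memory estimate for Llama 3.1 8B served in bf16.
# Architecture values below are assumptions taken from the model's published config.
NUM_PARAMS = 8.03e9   # total parameters (approximate)
NUM_LAYERS = 32       # transformer blocks
NUM_KV_HEADS = 8      # grouped-query attention uses 8 key/value heads
HEAD_DIM = 128        # 4096 hidden size / 32 attention heads
BYTES_PER_VALUE = 2   # bf16 weights and KV cache
GIB = 1024 ** 3


def weight_bytes() -> float:
    return NUM_PARAMS * BYTES_PER_VALUE


def kv_cache_bytes_per_token() -> int:
    # Each layer stores one key and one value vector per KV head for every token.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def min_bytes_needed(max_model_len: int) -> float:
    # Weights plus KV cache for one sequence of max_model_len tokens;
    # activations, CUDA graphs, and other runtime overhead are ignored.
    return weight_bytes() + max_model_len * kv_cache_bytes_per_token()


for context in (8192, 131072):
    print(f"max_model_len={context}: at least {min_bytes_needed(context) / GIB:.1f} GiB")
```

Under these assumptions the weights take about 15 GiB and each token of context adds 128 KiB of KV cache. An 8,192-token context therefore needs about 16 GiB in total and fits on a 24 GB RTX 4090, while the model's full 131,072-token context would need about 31 GiB. That is one plausible reason to cap `--max-model-len`. vLLM also reserves GPU memory up to its `--gpu-memory-utilization` fraction (0.9 by default) and uses what remains after the weights for KV cache.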

### Initialize and deploy

<Steps>
  <Step title="Initialize dstack">
    In the directory with your `.dstack.yml` file, run:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    dstack init
    ```
  </Step>

  <Step title="Apply the configuration">
    Deploy the task:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    dstack apply
    ```

    You'll see the deployment configuration and available instances. When prompted:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Submit the run vllm-llama-3.1-8b-instruct? [y/n]:
    ```

    Type `y` and press Enter.

    Because the task lists port `8000` under `ports`, dstack forwards that port on the Pod to `localhost:8000` on your machine for as long as `dstack apply` stays attached.
  </Step>

  <Step title="Monitor the deployment">
    dstack will provision the Pod, download the Docker image, install packages, download the model, and start the vLLM server. You'll see progress logs in the terminal.

    To view logs at any time, run:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    dstack logs vllm-llama-3.1-8b-instruct
    ```

    Wait until you see logs indicating the server is ready:

    ```
    INFO: Started server process [1]
    INFO: Waiting for application startup.
    INFO: Application startup complete.
    INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
    ```
  </Step>
</Steps>
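If you'd rather script the wait than watch the logs, you can poll the server until it responds. This minimal sketch assumes vLLM's OpenAI-compatible server exposes a `GET /health` route that returns HTTP 200 once the model is loaded, and that `dstack apply` is still attached and forwarding port 8000. The helper name `wait_for_vllm` is hypothetical.

```python
import http.client
import time
import urllib.request


def wait_for_vllm(base_url: str = "http://localhost:8000",
                  timeout_s: float = 900.0,
                  interval_s: float = 5.0) -> bool:
    """Poll GET {base_url}/health until it answers 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused connections, HTTP error statuses (URLError/HTTPError), and socket
            # timeouts are all OSError subclasses: port not forwarded yet, or model loading.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Call `wait_for_vllm()` before sending your first request; it returns `False` if the server isn't reachable within the timeout. The 15-minute default is an arbitrary allowance for the image and model downloads.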

### Test the deployment

The vLLM server is now accessible at `http://localhost:8000`.

Test it with `curl`:

<Tabs>
  <Tab title="macOS">
    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    curl -X POST http://localhost:8000/v1/chat/completions \
         -H "Content-Type: application/json" \
         -d '{
              "model": "meta-llama/Llama-3.1-8B-Instruct",
              "messages": [
                 {"role": "system", "content": "You are Poddy, a helpful assistant."},
                 {"role": "user", "content": "What is your name?"}
              ],
              "temperature": 0,
              "max_tokens": 150
            }'
    ```
  </Tab>

  <Tab title="Linux">
    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    curl -X POST http://localhost:8000/v1/chat/completions \
         -H "Content-Type: application/json" \
         -d '{
              "model": "meta-llama/Llama-3.1-8B-Instruct",
              "messages": [
                 {"role": "system", "content": "You are Poddy, a helpful assistant."},
                 {"role": "user", "content": "What is your name?"}
              ],
              "temperature": 0,
              "max_tokens": 150
            }'
    ```
  </Tab>

  <Tab title="Windows">
    **Command Prompt:**

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{ \"model\": \"meta-llama/Llama-3.1-8B-Instruct\", \"messages\": [ {\"role\": \"system\", \"content\": \"You are Poddy, a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What is your name?\"} ], \"temperature\": 0, \"max_tokens\": 150 }"
    ```

    **PowerShell:**

    ```powershell theme={"theme":{"light":"github-light","dark":"github-dark"}}
    Invoke-RestMethod -Method Post -Uri http://localhost:8000/v1/chat/completions `
      -ContentType "application/json" `
      -Body '{ "model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [ {"role": "system", "content": "You are Poddy, a helpful assistant."}, {"role": "user", "content": "What is your name?"} ], "temperature": 0, "max_tokens": 150 }' |
      ConvertTo-Json -Depth 10
    ```
  </Tab>
</Tabs>
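You can also send the same request from Python, which is handy for scripted tests. This minimal sketch uses only the standard library and mirrors the `curl` payload above. `chat_completion` is a hypothetical helper; any OpenAI-compatible client library would work equally well.

```python
import json
import urllib.request


def chat_completion(messages, model="meta-llama/Llama-3.1-8B-Instruct",
                    base_url="http://localhost:8000", temperature=0.0, max_tokens=150):
    """POST to the OpenAI-compatible /v1/chat/completions route and return the decoded JSON."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first request can be slow while the server warms up.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

For example, `chat_completion([{"role": "user", "content": "What is your name?"}])["choices"][0]["message"]["content"]` returns the assistant's reply text.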

You'll receive a JSON response similar to this one (IDs, timestamps, and token counts vary):

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "id": "chat-f0566a5143244d34a0c64c968f03f80c",
  "object": "chat.completion",
  "created": 1727902323,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "My name is Poddy, and I'm here to assist you with any questions or information you may need.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 49,
    "total_tokens": 199,
    "completion_tokens": 150
  },
  "prompt_logprobs": null
}
```
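In this response, `choices[0].message.content` holds the reply, `choices[0].finish_reason` is `stop` when the model ended on its own or `length` when it hit `max_tokens`, and `usage` reports token counts. A minimal sketch that pulls those fields out of the decoded JSON; `summarize_completion` and `CompletionSummary` are hypothetical names:

```python
from typing import Any, Dict, NamedTuple


class CompletionSummary(NamedTuple):
    text: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def truncated(self) -> bool:
        # "length" means generation stopped because it reached max_tokens.
        return self.finish_reason == "length"


def summarize_completion(response: Dict[str, Any]) -> CompletionSummary:
    """Extract the first choice and the token usage from a chat.completion object."""
    choice = response["choices"][0]
    usage = response["usage"]
    return CompletionSummary(
        text=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
    )
```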

### Clean up

Stop the task when you're done to avoid further charges.

Press `Ctrl + C` in the terminal where you ran `dstack apply`. When prompted:

```
Stop the run vllm-llama-3.1-8b-instruct before detaching? [y/n]:
```

Type `y` and press Enter.

The instance will terminate automatically. If you detached without stopping the run, or want to stop it from another terminal, run:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
dstack stop vllm-llama-3.1-8b-instruct
```

Verify termination in your Runpod dashboard or the dstack web UI.

## Use volumes for persistent storage

Volumes let you store data between runs and cache models to reduce startup times.

### Create a volume

Create a file named `volume.dstack.yml`:

```yml theme={"theme":{"light":"github-light","dark":"github-dark"}}
type: volume
name: llama31-volume

backend: runpod
region: EUR-IS-1

# Required size
size: 100GB
```

<Info>
  A Runpod volume exists in a single region, so any Pod that mounts it must run in that same region. Choose a region that offers the GPU type your task requests.
</Info>

Apply the volume configuration:

```sh theme={"theme":{"light":"github-light","dark":"github-dark"}}
dstack apply -f volume.dstack.yml
```

### Use the volume in your task

Add a `volumes` section to your task's `.dstack.yml` file:

```yml theme={"theme":{"light":"github-light","dark":"github-dark"}}
volumes:
  - name: llama31-volume
    path: /data
```

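This mounts the volume at `/data` inside the container. Only files written under `/data` persist between runs, so to cache the model there, point the Hugging Face cache at the mount, for example by adding `HF_HOME=/data/huggingface` to the task's `env` list. Later runs can then load the model from the volume instead of downloading it again, which matters most for large models.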
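Whether the volume actually saves the download depends on where the model files land. The sketch below is a simplified model of how `huggingface_hub` picks its cache directory (checking `HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, then `~/.cache/huggingface`). It assumes a Linux container whose home directory is `/root` and ignores legacy variables; the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Mapping


def hf_model_cache_dir(repo_id: str, env: Mapping[str, str], home: str = "/root") -> PurePosixPath:
    """Simplified guess at where huggingface_hub caches a model repo in a Linux container."""
    if env.get("HF_HUB_CACHE"):
        hub = PurePosixPath(env["HF_HUB_CACHE"])
    else:
        if env.get("HF_HOME"):
            hf_home = PurePosixPath(env["HF_HOME"])
        else:
            # Fall back to the XDG cache directory, then ~/.cache.
            xdg_cache = env.get("XDG_CACHE_HOME") or f"{home}/.cache"
            hf_home = PurePosixPath(xdg_cache) / "huggingface"
        hub = hf_home / "hub"
    # Model repos are stored in directories named models--{org}--{name}.
    return hub / ("models--" + repo_id.replace("/", "--"))


def persists_on_volume(path: PurePosixPath, mount: str = "/data") -> bool:
    """True if path is the mount point itself or somewhere beneath it."""
    mount_path = PurePosixPath(mount)
    return path == mount_path or mount_path in path.parents
```

With no cache variables set, the model lands under `/root/.cache/huggingface` and is not saved on the volume; with `HF_HOME=/data/huggingface`, it lands under `/data`.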

For more information, see the [dstack blog on volumes](https://dstack.ai/blog/volumes-on-runpod/).
