# Manage Pods with dstack on Runpod

> Use dstack to automate Pod orchestration for AI and ML workloads on Runpod.

[dstack](https://dstack.ai/) is an open-source tool that automates Pod orchestration for AI and ML workloads. You define your application and its resource requirements in YAML files, and dstack provisions and manages the cloud resources on Runpod, so you can focus on your application instead of infrastructure.

This guide shows you how to set up dstack with Runpod and deploy [vLLM](https://github.com/vllm-project/vllm) to serve the `meta-llama/Llama-3.1-8B-Instruct` model from Hugging Face.

## Requirements

You'll need:

* [A Runpod account with an API key](/get-started/api-keys).
* A [Hugging Face access token](https://huggingface.co/settings/tokens) with access to the gated `meta-llama/Llama-3.1-8B-Instruct` model.
* Python 3.8 or higher installed on your local machine.
* `pip` (or `pip3` on macOS).
* Basic utilities like `curl`.

These instructions work on macOS, Linux, and Windows.

**Windows users:** Use [WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install) or [Git Bash](https://gitforwindows.org/) to follow along with the Unix-like commands in this guide. Alternatively, use PowerShell or Command Prompt and adjust commands as needed.
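The requirements listed above can be checked with a short script before you start. This is a minimal sketch, not part of dstack or Runpod tooling; the function names `python_ok` and `missing_tools` are made up for illustration.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements


def python_ok(version=None):
    """Return True if the given (or running) Python version is 3.8 or higher."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=("curl",)):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python 3.8+:", "ok" if python_ok() else "too old")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter you plan to use for the virtual environment, since the version check applies to whichever Python executes the script.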
## Set up dstack

### Install and configure the server

Open a terminal and create a new directory:

```bash
mkdir runpod-dstack-tutorial
cd runpod-dstack-tutorial
```

Create and activate a Python virtual environment:

**macOS/Linux:**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Command Prompt:**

```bash
python -m venv .venv
.venv\Scripts\activate
```

**PowerShell:**

```bash
python -m venv .venv
.venv\Scripts\Activate.ps1
```

Install dstack using `pip`:

**macOS:**

```bash
pip3 install -U "dstack[all]"
```

**Linux/Windows:**

```bash
pip install -U "dstack[all]"
```

### Configure dstack for Runpod

Create a `config.yml` file in the dstack configuration directory. This file stores your Runpod credentials for all dstack deployments.
* **Create the configuration directory:**

  **macOS/Linux:**

  ```bash
  mkdir -p ~/.dstack/server
  ```

  **Command Prompt:**

  ```bash
  mkdir %USERPROFILE%\.dstack\server
  ```

* **Navigate to the configuration directory:**

  **macOS/Linux:**

  ```bash
  cd ~/.dstack/server
  ```

  **Command Prompt:**

  ```bash
  cd %USERPROFILE%\.dstack\server
  ```

Create a file named `config.yml` with the following content:

```yml
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: YOUR_RUNPOD_API_KEY
```

Replace `YOUR_RUNPOD_API_KEY` with your actual Runpod API key.

Start the dstack server:

```bash
dstack server
```

You'll see output like this:

```bash
[INFO] Applying ~/.dstack/server/config.yml...
[INFO] The admin token is ADMIN-TOKEN
[INFO] The dstack server is running at http://127.0.0.1:3000
```

Save the `ADMIN-TOKEN` value from the server output. To access the dstack web UI, open `http://127.0.0.1:3000` in your browser and enter the token. From the web UI you can monitor and manage deployments.
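If you would rather not paste the API key into the file by hand, the `config.yml` from this section can also be generated from an environment variable. The following is a sketch under assumptions: the variable name `RUNPOD_API_KEY` and the helper names `render_config` and `write_config` are hypothetical, and the script only reproduces the file layout shown in this guide.

```python
import os
from pathlib import Path

# Same layout as the config.yml shown in this guide.
CONFIG_TEMPLATE = """\
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: {api_key}
"""


def render_config(api_key):
    """Return the config.yml text for the given Runpod API key."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty")
    return CONFIG_TEMPLATE.format(api_key=api_key.strip())


def write_config(api_key, server_dir=None):
    """Write config.yml into the dstack server directory and return its path."""
    if server_dir is None:
        server_dir = Path.home() / ".dstack" / "server"
    server_dir = Path(server_dir)
    server_dir.mkdir(parents=True, exist_ok=True)
    path = server_dir / "config.yml"
    path.write_text(render_config(api_key), encoding="utf-8")
    return path


if __name__ == "__main__":
    # RUNPOD_API_KEY is an assumed variable name; dstack itself does not read it.
    key = os.environ.get("RUNPOD_API_KEY", "")
    if key:
        print("Wrote", write_config(key))
    else:
        print("Set RUNPOD_API_KEY before running this script")
```

Because `write_config` builds the path with `Path.home()`, the same script works for both `~/.dstack/server` and `%USERPROFILE%\.dstack\server`.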

## Deploy vLLM

### Configure the deployment

Open a new terminal and navigate to your tutorial directory:

```bash
cd runpod-dstack-tutorial
```

Activate the Python virtual environment:

**macOS/Linux:**

```bash
source .venv/bin/activate
```

**Command Prompt:**

```bash
.venv\Scripts\activate
```

**PowerShell:**

```bash
.venv\Scripts\Activate.ps1
```

Create a new directory for the deployment:

```bash
mkdir task-vllm-llama
cd task-vllm-llama
```

Create a file named `.dstack.yml` with the following content:

```yml
type: task
name: vllm-llama-3.1-8b-instruct
python: "3.10"
env:
  - HUGGING_FACE_HUB_TOKEN=YOUR_HUGGING_FACE_HUB_TOKEN
  - MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
  - MAX_MODEL_LEN=8192
commands:
  - pip install vllm
  - vllm serve $MODEL_NAME --port 8000 --max-model-len $MAX_MODEL_LEN
ports:
  - 8000
spot_policy: on-demand
resources:
  gpu:
    name: "RTX4090"
    memory: "24GB"
  cpu: 16..  # 16 or more CPU cores
```

Replace `YOUR_HUGGING_FACE_HUB_TOKEN` with your [Hugging Face access token](https://huggingface.co/settings/tokens). The model is gated and requires authentication to download.

### Initialize and deploy

In the directory with your `.dstack.yml` file, run:

```bash
dstack init
```

Deploy the task:

```bash
dstack apply
```

You'll see the deployment configuration and available instances. When prompted:

```bash
Submit the run vllm-llama-3.1-8b-instruct? [y/n]:
```

Type `y` and press Enter. The `ports` configuration forwards the deployed Pod's port to `localhost:8000` on your machine.

dstack then provisions the Pod, downloads the Docker image, installs packages, downloads the model, and starts the vLLM server. Progress logs appear in the terminal. To view logs at any time, run:

```bash
dstack logs vllm-llama-3.1-8b-instruct
```

Wait until you see logs indicating the server is ready:

```
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

### Test the deployment

The vLLM server is now accessible at `http://localhost:8000`. Test it with `curl`:

**macOS/Linux:**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "system", "content": "You are Poddy, a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
    ],
    "temperature": 0,
    "max_tokens": 150
  }'
```

**Command Prompt:**

```bash
curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{ \"model\": \"meta-llama/Llama-3.1-8B-Instruct\", \"messages\": [ {\"role\": \"system\", \"content\": \"You are Poddy, a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What is your name?\"} ], \"temperature\": 0, \"max_tokens\": 150 }"
```

**PowerShell:**

```bash
Invoke-RestMethod -Method Post -Uri http://localhost:8000/v1/chat/completions `
  -ContentType "application/json" `
  -Body '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "system", "content": "You are Poddy, a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
    ],
    "temperature": 0,
    "max_tokens": 150
  }' | ConvertTo-Json -Depth 10
```

You'll receive a JSON response:

```json
{
  "id": "chat-f0566a5143244d34a0c64c968f03f80c",
  "object": "chat.completion",
  "created": 1727902323,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "My name is Poddy, and I'm here to assist you with any questions or information you may need.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 49,
    "total_tokens": 199,
    "completion_tokens": 150
  },
  "prompt_logprobs": null
}
```

### Clean up

Stop the task when you're done to avoid charges. Press `Ctrl + C` in the terminal where you ran `dstack apply`. When prompted:

```
Stop the run vllm-llama-3.1-8b-instruct before detaching? [y/n]:
```

Type `y` and press Enter. The instance will terminate automatically. To ensure immediate termination, run:

```bash
dstack stop vllm-llama-3.1-8b-instruct
```

Verify termination in your Runpod dashboard or the dstack web UI.

## Use volumes for persistent storage

Volumes let you store data between runs and cache models to reduce startup times.
### Create a volume

Create a file named `volume.dstack.yml`:

```yml
type: volume
name: llama31-volume
backend: runpod
region: EUR-IS-1
# Required size
size: 100GB
```

The `region` ties your volume to a specific region, which also ties your Pod to that region.

Apply the volume configuration:

```sh
dstack apply -f volume.dstack.yml
```

### Use the volume in your task

Modify your `.dstack.yml` file to include the volume:

```yml
volumes:
  - name: llama31-volume
    path: /data
```

This mounts the volume to the `/data` directory inside your container, letting you store models and data persistently. This is useful for large models that take time to download.

For more information, see the [dstack blog on volumes](https://dstack.ai/blog/volumes-on-runpod/).
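Mounting the volume does not by itself change where model files are downloaded. One possible approach, which is an assumption and not something this guide configures, is to add `HF_HOME=/data` to the task's `env` list so the Hugging Face cache lives on the volume. Under that assumption, the sketch below computes where the model would be cached and checks whether it is already there; the helper names `hub_cache_dir` and `is_cached` are hypothetical.

```python
from pathlib import Path


def hub_cache_dir(hf_home, model_id):
    """Return the Hugging Face Hub cache directory for a model repository."""
    # Hub cache layout: <HF_HOME>/hub/models--<org>--<name>
    folder = "models--" + model_id.replace("/", "--")
    return Path(hf_home) / "hub" / folder


def is_cached(hf_home, model_id):
    """Return True if a cache directory for the model already exists."""
    return hub_cache_dir(hf_home, model_id).is_dir()


if __name__ == "__main__":
    # /data is the mount path from the volume configuration in this guide;
    # using it as HF_HOME is an assumption.
    model = "meta-llama/Llama-3.1-8B-Instruct"
    print(hub_cache_dir("/data", model))
    print("cached:", is_cached("/data", model))
```

Running this inside the container on a second run would show whether the earlier download is still on the volume.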