📄️ Overview
Use the runpod/worker-vllm:latest image to deploy a vLLM Worker.
📄️ Get started
RunPod provides a simple way to run large language models (LLMs) as a Serverless Endpoint.
📄️ OpenAI compatibility
The vLLM Worker implements OpenAI's API, so existing OpenAI client code works against your endpoint unchanged.
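As a minimal sketch of what that compatibility means in practice: the request URL and body below mirror OpenAI's chat completions API, with only the base URL swapped for your RunPod endpoint. The endpoint ID, API key, and model name are placeholders, and the `https://api.runpod.ai/v2/<endpoint_id>/openai/v1` path is assumed here; check your endpoint's details for the exact URL. The sketch builds the request with the standard library and stops short of sending it, so it runs without a live endpoint.

```python
import json
import os
import urllib.request

# Placeholder endpoint ID and API key; substitute your own values.
endpoint_id = os.environ.get("RUNPOD_ENDPOINT_ID", "your-endpoint-id")
api_key = os.environ.get("RUNPOD_API_KEY", "your-api-key")

# Assumed OpenAI-compatible route under the Serverless Endpoint.
url = f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1/chat/completions"

# The body is a standard OpenAI chat completions payload.
body = json.dumps({
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # example model name
    "messages": [{"role": "user", "content": "Hello!"}],
}).encode()

request = urllib.request.Request(
    url,
    data=body,
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
)

# urllib.request.urlopen(request) would send it; omitted so this
# sketch runs without a deployed Worker.
print(request.full_url)
```

Because the route and payload match OpenAI's API, you can also point the official `openai` client at the same base URL instead of building requests by hand.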
📄️ Environment variables
Environment variables configure your vLLM Worker, controlling model selection, access credentials, and the operational parameters the Worker needs to run well.
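A small illustrative sketch of such a configuration; the variable names below (`MODEL_NAME`, `HF_TOKEN`, `MAX_MODEL_LEN`) are assumptions drawn from common worker-vllm usage, so confirm them against the environment-variables reference before deploying.

```shell
# Example vLLM Worker environment (illustrative values):
MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2  # which model to load
HF_TOKEN=hf_...                                # Hugging Face token for gated models
MAX_MODEL_LEN=8192                             # maximum context length
```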
📄️ Configurable Endpoints
Configurable Endpoints let you focus on selecting your desired model and customizing the template parameters, while vLLM handles the low-level details of model loading, hardware configuration, and execution.