📄️ Overview
Discover RunPod's vLLM workers: optimized Serverless containers for deploying Hugging Face LLMs. vLLM workers offer OpenAI API compatibility, auto-scaling, and cost-effective performance.
📄️ Deploy a vLLM worker
Learn how to deploy a vLLM worker on RunPod Serverless to create a scalable endpoint for your large language model (LLM) applications.
📄️ Send vLLM requests
Learn how to send requests to vLLM workers on RunPod Serverless, including code examples and best practices for RunPod's native API format.
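As a quick orientation before that page: in RunPod's native API format, a request is a POST to the endpoint's `/run` or `/runsync` route with all job parameters wrapped under an `input` key. The sketch below builds such a request; the endpoint ID, API key, and `sampling_params` values are placeholders, not real credentials.

```python
import json

# Placeholder values -- substitute your own endpoint ID and API key.
ENDPOINT_ID = "your_endpoint_id"
API_KEY = "your_api_key"

# RunPod's native format wraps all job parameters under an "input" key.
# /runsync blocks until the job finishes; /run returns a job ID to poll.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "input": {
        "prompt": "Explain serverless GPUs in one sentence.",
        "sampling_params": {"max_tokens": 100, "temperature": 0.7},
    }
}

# Send with any HTTP client, e.g.:
# import requests
# response = requests.post(url, headers=headers, json=payload, timeout=120)
# print(response.json())
print(json.dumps(payload, indent=2))
```

The dedicated page covers the full set of input fields and best practices; this is only the minimal request shape.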
📄️ OpenAI API compatibility
Learn how RunPod's vLLM workers provide OpenAI API compatibility, enabling you to use standard OpenAI clients and tools with models deployed on RunPod.
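For a taste of what that compatibility looks like in practice, the sketch below builds a standard chat-completions request against a worker's OpenAI-compatible route using only the standard library. The endpoint ID, API key, and model name are placeholders (the model should match the Hugging Face model your worker serves); with the `openai` package installed, the equivalent is passing the same base URL and key to `OpenAI(base_url=..., api_key=...)`.

```python
import json
import urllib.request

# Placeholder values -- substitute your own endpoint ID and API key.
ENDPOINT_ID = "your_endpoint_id"
API_KEY = "your_api_key"

# The vLLM worker exposes an OpenAI-compatible route, so the request
# body follows the standard chat-completions schema.
base_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1"
body = {
    # Hypothetical model name -- use the model your worker is deployed with.
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50,
}
request = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(body).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment to send (requires a live endpoint):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Because the request shape is the standard OpenAI one, existing OpenAI clients and tools only need the base URL and API key swapped to target the RunPod endpoint.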