Integrate with Serverless
Runpod Serverless endpoints are REST APIs that accept HTTP requests, execute your code, and return the result in the HTTP response. Each endpoint provides a unique URL and abstracts away the complexity of managing individual GPUs and CPUs. To integrate with Serverless:
- Create a handler function with the code for your application (see the sketch after this list).
- Create a Dockerfile to package your handler function and all its dependencies.
- Package your worker into a Docker image and push it to a Docker registry.
- Deploy a Serverless endpoint using the Runpod console or REST API.
- Start sending requests to the endpoint.
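For example, a minimal handler built with the Runpod Python SDK might look like the sketch below; the prompt field is just an illustrative input.

```python
# handler.py -- a minimal Serverless handler sketch.
import runpod

def handler(job):
    # job["input"] holds the JSON payload sent in the request's "input" field.
    prompt = job["input"].get("prompt", "")
    # Replace this with your application's real work (inference, processing, etc.).
    return {"echo": prompt}

# Register the handler and start the Serverless worker loop.
runpod.serverless.start({"handler": handler})
```

Once the endpoint is deployed, your application calls it over HTTP. A synchronous request with the requests library might look like this (the endpoint ID and API key are placeholders):

```python
import requests

ENDPOINT_ID = "your_endpoint_id"        # placeholder
RUNPOD_API_KEY = "your_runpod_api_key"  # placeholder

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json={"input": {"prompt": "Hello from my application"}},
    timeout=60,
)
print(response.json())
```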
Integrate with Pods
Pods are self-contained compute environments, providing instant access to powerful GPU and CPU resources. They’re ideal for applications that require a consistent, predictable environment, such as web applications or backend services with a constant workload. There are two primary methods for integrating a Pod with your application: the HTTP proxy and direct TCP.
HTTP proxy
For web-based APIs or UIs, Runpod provides an automated HTTP proxy. Any port you expose as an HTTP port in your template or Pod configuration is accessible via a unique URL, which follows this format:
HTTP proxy URL format
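The pattern below reflects Runpod's standard proxy scheme, which combines the Pod ID and the exposed internal port into a proxy.runpod.net subdomain:

```
https://[POD_ID]-[INTERNAL_PORT].proxy.runpod.net
```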
For example, if your Pod's ID is abc123xyz and you exposed port 8000, your application would send requests to:
HTTP proxy URL example
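Following that pattern, the URL would be:

```
https://abc123xyz-8000.proxy.runpod.net
```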
Direct TCP
For protocols that require persistent connections or fall outside of standard HTTP, use direct TCP ports. When you expose a TCP port, Runpod assigns a public IP address and a mapped external port. You can find these details using the GET /pods/POD_ID endpoint or the Pod connection menu in the Runpod console.
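As an illustration, a sketch of looking up a Pod's connection details through the REST API is shown below; the Pod ID is a placeholder, and the exact shape of the returned JSON (including where the port mappings live) should be confirmed against the Runpod API reference.

```python
import os
import requests

POD_ID = "abc123xyz"  # placeholder Pod ID
API_KEY = os.environ["RUNPOD_API_KEY"]

# Fetch the Pod's details; the response includes its public IP and external port mappings.
response = requests.get(
    f"https://rest.runpod.io/v1/pods/{POD_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
pod = response.json()
print(pod)  # inspect this object for the public IP and the mapped external TCP port
```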
Integrate with Public Endpoints
Public Endpoints are pre-deployed AI models that you can use without setting up your own Serverless endpoint. They require no infrastructure configuration, and you can start using them immediately by pointing your application at the Public Endpoint URL. The easiest way to get started is to use the Public Endpoint playground to configure your request parameters, then click the API tab to copy the code into your application.
Integrate external tools with OpenAI-compatible endpoints
Many external tools and agentic frameworks support OpenAI-compatible endpoints with little to no configuration required: any library or framework that accepts a custom base URL for API calls will work with Runpod without specialized adapters or connectors. This means you can integrate Runpod with tools like n8n, CrewAI, LangChain, and many others by simply pointing them to your Runpod endpoint URL and providing your Runpod API key for authentication:
Base URL format
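For a Serverless endpoint running an OpenAI-compatible worker (such as a vLLM worker), the base URL follows this pattern, with ENDPOINT_ID standing in for your own endpoint ID:

```
https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1
```

As an illustration, any tool built on the official OpenAI Python client can be pointed at that base URL; the endpoint ID and API key below are placeholders, and the model name should match whatever your endpoint serves:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # replace ENDPOINT_ID
    api_key="RUNPOD_API_KEY",                                   # your Runpod API key
)

response = client.chat.completions.create(
    model="Qwen/qwen3-32b-awq",  # the model served by your endpoint
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```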
Public Endpoints
Public Endpoints are pre-deployed AI models that are vLLM-compatible and return OpenAI-compatible responses, so you can get started quickly without deploying anything yourself. The following Public Endpoint URLs are available for OpenAI-compatible models:
Public Endpoint base URLs
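The specific model slugs are listed in the Runpod console. As an assumption for illustration, a Public Endpoint's OpenAI-compatible base URL follows the same pattern as above, with a model slug (hypothetical placeholder below) in place of a Serverless endpoint ID:

```
https://api.runpod.ai/v2/MODEL_SLUG/openai/v1
```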
vLLM endpoints
Serverless vLLM workers are optimized for running large language models and return OpenAI-compatible responses, making them ideal for tools that expect OpenAI’s API format. When you deploy a vLLM worker, you can access it using the OpenAI-compatible API at this base URL:
vLLM endpoint base URL
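This is the same pattern shown earlier:

```
https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1
```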
Here, ENDPOINT_ID is your Serverless endpoint ID.
You may need to set a custom chat template or tool call parser to ensure your model returns responses in the format your integration expects. For example, you can configure the Qwen/qwen3-32b-awq model for OpenAI compatibility by adding these environment variables to your vLLM endpoint settings:
Qwen3 32B AWQ vLLM environment variables
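As an illustrative sketch only: the variable names and values below are assumptions drawn from vLLM's tool-calling options (auto tool choice plus the Hermes-style parser that Qwen models use), and should be checked against the vLLM worker documentation before use.

```
# Assumed environment variables -- verify the exact names and values
# for your vLLM worker version before relying on them.
ENABLE_AUTO_TOOL_CHOICE=true
TOOL_CALL_PARSER=hermes
```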