Deploy custom direct-access REST APIs with load balancing Serverless endpoints.
Load balancing endpoints route incoming traffic directly to available workers, bypassing the queueing system. Unlike queue-based endpoints, which process requests sequentially, load balancing endpoints distribute requests across your worker pool for lower latency. You can create custom REST endpoints accessible via a unique URL:
With queue-based endpoints, requests are placed in a queue and processed in order. They use the standard handler pattern (def handler(job)) and are accessed through fixed endpoints like /run and /runsync. These endpoints are better suited for tasks that can be processed asynchronously, and they guarantee request processing, similar to how TCP guarantees packet delivery in networking.
Load balancing endpoints send requests directly to workers without queuing. You can use any HTTP framework, such as FastAPI or Flask, and define custom URL paths and API contracts to suit your specific needs. These endpoints are ideal for real-time applications and streaming, but provide no queuing mechanism for request backlog, similar to UDP's behavior in networking.
Workers must expose a /ping endpoint on the PORT_HEALTH port. The load balancer periodically checks this endpoint:
| Response code | Status |
| --- | --- |
| 200 | Healthy |
| 204 | Initializing |
| Other | Unhealthy |
Unhealthy workers are automatically removed from the routing pool.
When calculating endpoint metrics, Runpod measures the cold start time for a load balancing worker as the interval between /ping first returning 204 and /ping first returning 200.
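That cold start definition can be sketched as a polling loop. The measure_cold_start helper below is a hypothetical illustration of the 204-to-200 interval, not Runpod's internal code; probe stands for any zero-argument callable that returns the current /ping status code:

```python
import time


def measure_cold_start(probe, poll_interval=0.01, timeout=30.0):
    """Return seconds between /ping first returning 204 and first returning 200.

    `probe` is any zero-argument callable that returns an HTTP status code.
    Illustrative sketch only, not Runpod's actual metrics implementation.
    """
    first_204 = None
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = probe()
        if code == 204 and first_204 is None:
            first_204 = time.monotonic()          # worker began initializing
        elif code == 200:
            if first_204 is None:
                return 0.0                        # healthy before any 204 was seen
            return time.monotonic() - first_204   # cold start complete
        time.sleep(poll_interval)
    raise TimeoutError("worker never reported healthy")
```

For example, probing a worker that answers 204 twice and then 200 yields the elapsed time between the first 204 and the 200.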