Workers are containerized environments that run your code on Runpod Serverless.
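The code a worker runs is a handler function: it receives a job whose `input` key carries the request payload and returns a JSON-serializable result. A minimal sketch of that pattern (the `prompt` field and the uppercase "processing" are placeholders for your own logic; the `runpod.serverless.start` call is shown as a comment so the sketch stays self-contained):

```python
# Minimal handler sketch: a worker receives a "job" dict whose "input" key
# holds the request payload, and returns a JSON-serializable result.

def handler(job):
    """Process one request. `job["input"]` is the payload sent to the endpoint."""
    prompt = job["input"].get("prompt", "")
    # Replace this with your real inference/processing logic.
    return {"output": prompt.upper()}

# In a real worker you would hand the function to the Runpod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})

if __name__ == "__main__":
    # Local smoke test with a fake job.
    print(handler({"input": {"prompt": "hello"}}))  # → {'output': 'HELLO'}
```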

Deployment workflow

After creating your handler function, package it into a Docker image and deploy it to an endpoint:

1. Create a Dockerfile: Package your handler function and all its dependencies into a Docker image.
2. Deploy to an endpoint: Push your image to a container registry, then create a Serverless endpoint that runs it.
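As a sketch, a Dockerfile for step 1 might look like the following. The base image, the `handler.py` filename, and the `requirements.txt` layout are assumptions for illustration, not Runpod requirements:

```dockerfile
# Sketch of a worker image; handler.py is assumed to contain your handler.
FROM python:3.11-slim

WORKDIR /app

# Install the Runpod SDK plus your own dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY handler.py .

# Start the worker; -u keeps Python output unbuffered so logs show up promptly.
CMD ["python", "-u", "handler.py"]
```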

Model deployment

To deploy workers with AI/ML models, follow this order of preference:
  1. Use cached models: For models on Hugging Face (public or gated), this is the recommended approach. Cached models provide the fastest cold starts and persist across worker restarts.
  2. Bake the model into your Docker image: For private models not on Hugging Face, embed them directly in your container image. This ensures the model is always available but increases image size.
  3. Use network volumes: For development workflows or very large models (500GB+), store models on a network volume. This is slower than cached or baked models but offers flexibility for iteration.
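For option 2 above, a common sketch is copying the weights into the image at build time so every worker starts with them present. The `models/my-model/` path is a placeholder for wherever your private weights live locally:

```dockerfile
# Sketch: bake private model weights into the image at build time.
# "models/my-model/" is a placeholder for your local weights directory.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the weights into the image so the model is always available.
# This grows the image, which is the size/cold-start tradeoff noted above.
COPY models/my-model/ /app/models/my-model/

COPY handler.py .
CMD ["python", "-u", "handler.py"]
```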

Worker types

Workers can run in two modes depending on your latency and cost requirements:
  • Active workers run continuously (24/7) and are always ready to process requests instantly. They eliminate cold starts entirely and receive a discounted rate, making them ideal for latency-sensitive or high-traffic applications.
  • Flex workers scale dynamically based on demand, spinning down to zero when idle. They incur cold starts when scaling up but cost nothing when not in use, making them ideal for variable or sporadic workloads.
During traffic spikes, the system may also spin up extra workers beyond your maximum (2 by default) on hosts where your Docker image is already cached.
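The active-versus-flex tradeoff above is essentially arithmetic: a flex worker bills only while busy at the full rate, while an active worker bills around the clock at a discount. A rough break-even sketch, where the per-second rate and discount are made-up numbers, not Runpod pricing:

```python
# Rough break-even sketch between an always-on (active) worker and an
# on-demand (flex) worker. All rates here are hypothetical, not Runpod pricing.

FLEX_RATE = 0.00040     # $ per second while processing (hypothetical)
ACTIVE_DISCOUNT = 0.30  # active workers assumed 30% cheaper (hypothetical)
ACTIVE_RATE = FLEX_RATE * (1 - ACTIVE_DISCOUNT)

def daily_cost(busy_seconds_per_day, active):
    """Cost of one worker for a day at the hypothetical rates above."""
    if active:
        return ACTIVE_RATE * 24 * 3600       # active bills 24/7
    return FLEX_RATE * busy_seconds_per_day  # flex bills only while busy

def flex_is_cheaper(busy_seconds_per_day):
    return daily_cost(busy_seconds_per_day, active=False) < daily_cost(
        busy_seconds_per_day, active=True
    )

# With a 30% discount, active wins once the worker is busy more than
# 70% of the day; below that utilization, flex is cheaper.
```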

Worker states

| State | Description | Billed |
| --- | --- | --- |
| Initializing | Downloading image, loading code | Yes |
| Idle | Ready, waiting for requests | No |
| Running | Processing requests | Yes |
| Throttled | Ready but host constrained | No |
| Outdated | Marked for replacement after update | Yes (while processing) |
| Unhealthy | Crashed; auto-retries for up to 7 days | No |
View worker states in the Workers tab of your endpoint in the Runpod console.
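The billing column of the state table can be summarized in code for quick reference; a sketch taken directly from the table (note that Outdated bills only while the worker is still processing):

```python
# Which worker states accrue charges, per the state table above.
# "Outdated" is billed only while the worker is still processing requests.
BILLED_STATES = {
    "Initializing": True,
    "Idle": False,
    "Running": True,
    "Throttled": False,
    "Outdated": True,   # only while processing
    "Unhealthy": False,
}

def is_billed(state):
    """Return whether a worker in `state` accrues charges."""
    return BILLED_STATES[state]
```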

Max worker limits

Account balance determines your maximum workers (flex + active combined):
| Balance | Max workers |
| --- | --- |
| Default | 5 |
| $100+ | 10 |
| $200+ | 20 |
| $300+ | 30 |
| $500+ | 40 |
| $700+ | 50 |
| $900+ | 60 |
Need more capacity? Contact support.
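The tiers above form a simple step function of balance. A sketch, with thresholds taken from the table:

```python
# Max workers (flex + active combined) as a step function of account balance,
# using the tier thresholds from the table above.
TIERS = [(900, 60), (700, 50), (500, 40), (300, 30), (200, 20), (100, 10)]

def max_workers(balance):
    """Return the max-worker limit for an account balance in dollars."""
    for threshold, limit in TIERS:
        if balance >= threshold:
            return limit
    return 5  # default tier
```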

Best practices

| Practice | Benefit |
| --- | --- |
| Optimize image size | Faster downloads, reduced cold starts |
| Use model caching | Fastest cold starts |
| Test locally first | Catch issues before deployment |
| Use logs and SSH | Debug and optimize effectively |