Deployment issues

Worker fails to start

If your worker fails to start or initialize:
  1. Check logs: View endpoint logs in the Runpod console for error messages.
  2. Verify local testing: Ensure your handler works in local testing before deploying.
  3. Check dependencies: Verify all dependencies are installed in your Docker image.
  4. GPU compatibility: Ensure your Docker image is compatible with the selected GPU type.
  5. Input format: Verify your input format matches what your handler expects.

Worker initializes but fails on requests

  • Input validation errors: Add input validation in your handler and check logs for the expected format.
  • Missing dependencies: Verify all required packages are in your Dockerfile.
  • Model loading failures: Check GPU memory requirements and the model path.
  • Permission errors: Ensure files are readable and directories are writable.
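A minimal sketch of the input-validation pattern above, returning a structured error instead of raising. The field name `prompt` is illustrative; substitute whatever fields your handler expects:

```python
# Hypothetical sketch: validate job input before processing so malformed
# requests produce a structured error in the job status instead of a crash.
def validate_input(job_input):
    """Return (cleaned_input, error). The 'prompt' field is illustrative."""
    if not isinstance(job_input, dict):
        return None, "Input must be a JSON object"
    prompt = job_input.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return None, "Missing required string field: 'prompt'"
    return {"prompt": prompt.strip()}, None

def handler(job):
    cleaned, error = validate_input(job.get("input", {}))
    if error:
        return {"error": error}  # structured error, visible in the job status
    return {"output": f"processed: {cleaned['prompt']}"}
```

Returning `{"error": ...}` rather than raising keeps the failure visible in the status response and in your logs.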

Job issues

Jobs stuck in queue

If jobs remain IN_QUEUE for extended periods:
  • No workers available: Check if max_workers is set appropriately.
  • Workers throttled: Your endpoint may be hitting rate limits. Check the Workers tab for throttled workers.
  • Cold start delays: First requests after idle periods require worker initialization. Consider increasing min_workers or enabling FlashBoot.
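When diagnosing queued jobs, it helps to poll the job's status with a deadline rather than assuming it will start promptly. A sketch, with the HTTP call to the endpoint's `/status` route stubbed out as `check_status` (the status strings match Runpod's job states):

```python
import time

def wait_for_job(check_status, timeout=300, poll_interval=2.0):
    """Poll a job's status until it leaves the queue or the deadline passes.

    `check_status` stands in for a GET to /status/{job_id}; it should return
    a status string such as "IN_QUEUE", "IN_PROGRESS", "COMPLETED", or "FAILED".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status()
        if status not in ("IN_QUEUE", "IN_PROGRESS"):
            return status  # terminal state reached
        time.sleep(poll_interval)
    return "TIMED_OUT"  # client-side deadline, distinct from Runpod's own timeout
```

If jobs routinely hit your client-side deadline while still `IN_QUEUE`, that points at worker availability rather than slow processing.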

Jobs timing out

  • Processing takes too long: Increase executionTimeout in your job policy.
  • Model loading too slow: Use model caching or bake models into your image.
  • TTL too short: Set ttl to cover both queue time and execution time.
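For reference, a sketch of a `/run` request body that sets both policy fields. Both values are in milliseconds; the specific numbers here are illustrative:

```python
# Illustrative /run request body with an explicit job policy.
# executionTimeout and ttl are both expressed in milliseconds.
payload = {
    "input": {"prompt": "hello"},
    "policy": {
        "executionTimeout": 600_000,  # allow up to 10 minutes of execution
        "ttl": 3_600_000,             # job expires 1 hour after submission
    },
}
```

Make sure `ttl` exceeds the worst-case queue wait plus `executionTimeout`, or jobs can expire before they finish.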

Jobs failing

Check the job status response for error details. Common causes:
  • Handler exceptions: Unhandled exceptions in your handler code. Add try/catch blocks and return structured errors.
  • OOM (Out of Memory): Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
  • Timeout: Job exceeded execution timeout. Increase timeout or optimize processing.
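The try/catch advice above can be sketched as a handler that converts any unhandled exception into a structured error. `risky_process` is a hypothetical stand-in for your actual processing step:

```python
def risky_process(job_input):
    """Hypothetical processing step that may raise on bad input."""
    if "text" not in job_input:
        raise KeyError("text")
    return job_input["text"].upper()

def handler(job):
    try:
        result = risky_process(job["input"])
        return {"output": result}
    except Exception as exc:
        # Return a structured error instead of crashing the worker; the
        # exception type and message end up in the job status response.
        return {"error": f"{type(exc).__name__}: {exc}"}
```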

Cold start issues

Slow cold starts

Cold start time includes container startup, model loading, and initialization. To reduce cold starts:
  1. Use model caching: Store models on network volumes instead of downloading on each start.
  2. Enable FlashBoot: Use FlashBoot for faster container initialization.
  3. Optimize image size: Use smaller base images and remove unnecessary dependencies.
  4. Initialize outside handler: Load models at module level, not inside the handler function.
```python
# Good: Load model once at startup
model = load_model()

def handler(job):
    return model.predict(job["input"])

# Bad: Load model on every request
def handler(job):
    model = load_model()  # Slow!
    return model.predict(job["input"])
```

Too many cold starts

If you’re seeing frequent cold starts:
  • Increase idle timeout: Set a longer idle_timeout to keep workers warm between requests.
  • Set minimum workers: Configure min_workers > 0 to maintain warm workers.
  • Check traffic patterns: Sporadic traffic causes more cold starts than steady traffic.
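The first two fixes above boil down to two scaling settings. A sketch of how they fit together; the names mirror the console fields, and the exact values depend on your traffic:

```python
# Illustrative endpoint scaling settings (names mirror the console fields;
# values here are examples, not recommendations).
endpoint_config = {
    "min_workers": 1,     # keep at least one worker warm at all times
    "max_workers": 3,     # cap concurrent workers (and cost)
    "idle_timeout": 120,  # seconds to keep an idle worker alive after a request
}
```

Note that `min_workers > 0` trades continuous billing for zero cold starts, while a longer `idle_timeout` only helps when requests arrive within that window of each other.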

Logging issues

Missing logs

If logs aren’t appearing in the console:
  1. Check throttling: Excessive logging triggers throttling. Reduce log verbosity.
  2. Verify output streams: Ensure you’re writing to stdout/stderr, not just files.
  3. Check worker status: Logs only appear for successfully initialized workers.
  4. Retention period: Logs older than 90 days are automatically removed.

Log throttling

To avoid log throttling:
  • Reduce log verbosity in production.
  • Use structured logging for efficiency.
  • Store detailed logs on network volumes instead of console output.
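A minimal structured-logging sketch: one compact JSON object per line, written to stdout so it reaches the console log stream. The field names are illustrative:

```python
import json
import sys
import time

def log_event(level, message, **fields):
    """Emit one compact JSON log line to stdout and return it.

    Single-line structured records are cheaper than verbose multi-line
    output and much easier to filter or parse later.
    """
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    line = json.dumps(record, separators=(",", ":"))
    print(line, file=sys.stdout)
    return line
```

Usage: `log_event("info", "job finished", job_id="abc123", duration_s=1.2)` produces a single parseable line instead of several free-form ones.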

vLLM-specific issues

OOM errors

If your vLLM worker runs out of memory:
  • Lower GPU_MEMORY_UTILIZATION from 0.90 to 0.85.
  • Reduce MAX_MODEL_LEN to limit context window.
  • Use a GPU with more VRAM.
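The first two adjustments are environment variables on the vLLM worker. A sketch with conservative values (the numbers are illustrative starting points, not prescriptions):

```python
# Illustrative environment variables for a vLLM worker with tighter
# memory headroom; tune to your model and GPU.
vllm_env = {
    "GPU_MEMORY_UTILIZATION": "0.85",  # down from the 0.90 default
    "MAX_MODEL_LEN": "4096",           # cap the context window
}
```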

Model not loading

  • Model not found: Verify MODEL_NAME matches the Hugging Face model ID exactly.
  • Gated model access denied: Set HF_TOKEN with a token that has access to the model.
  • Incompatible model: Check the vLLM supported models list.

OpenAI API errors

  • 401 Unauthorized (invalid API key): Verify RUNPOD_API_KEY is correct.
  • 404 Not Found (wrong endpoint URL): Use the format https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1.
  • Connection refused (endpoint not ready): Wait for workers to initialize.
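A small helper that builds the base URL in the format above, which helps avoid the 404 case; the endpoint ID shown in the test is a placeholder:

```python
def openai_base_url(endpoint_id: str) -> str:
    """Build the OpenAI-compatible base URL for a Runpod endpoint.

    Pass the result as the base URL to any OpenAI-compatible client,
    with your Runpod API key as the API key.
    """
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"
```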

Load balancing endpoint issues

"No workers available" error

This means workers didn’t initialize in time. Common causes:
  • First request: Workers need time to start. Retry the request. (See Handling cold starts for more information.)
  • All workers busy: Increase max_workers to handle more concurrent requests.
  • Workers crashing: Check logs for initialization errors.
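The retry advice for the first-request case can be sketched as exponential backoff. The HTTP call is stubbed as `send_request`, and raising `RuntimeError` for the no-workers case is an assumption for illustration:

```python
import time

def send_with_retry(send_request, max_attempts=5, base_delay=1.0):
    """Retry a request with exponential backoff while workers spin up.

    `send_request` stands in for the actual HTTP call; here it is assumed
    to raise RuntimeError when no worker is available yet.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Backoff matters here: hammering an endpoint whose workers are still initializing only adds load without making them start faster.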

Requests not reaching workers

Verify your HTTP server is:
  • Listening on port 8000 (or the port specified in your configuration).
  • Binding to 0.0.0.0, not 127.0.0.1.
  • Returning proper HTTP responses.
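A minimal stdlib server satisfying all three checks: it binds to 0.0.0.0 on port 8000 and returns a well-formed JSON response. The `/ping` route and response body are illustrative; your framework of choice (FastAPI, Flask, etc.) works the same way as long as the host and port match:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HOST = "0.0.0.0"  # bind on all interfaces, not just loopback (127.0.0.1)
PORT = 8000       # the port the load balancer routes requests to

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer((HOST, PORT), Handler).serve_forever()
```

The most common mistake is binding to `127.0.0.1`, which makes the server reachable from inside the container but invisible to the load balancer.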

Getting help

If you’re still experiencing issues:
  1. Check endpoint logs for detailed error messages.
  2. SSH into workers using SSH access to debug in real-time.
  3. Review metrics in the Metrics tab to identify patterns.
  4. Contact support at help@runpod.io with your endpoint ID and error details.