## Deployment issues

### Worker fails to start

If your worker fails to start or initialize:

- Check logs: View endpoint logs in the Runpod console for error messages.
- Verify local testing: Ensure your handler works in local testing before deploying.
- Check dependencies: Verify all dependencies are installed in your Docker image.
- GPU compatibility: Ensure your Docker image is compatible with the selected GPU type.
- Input format: Verify your input format matches what your handler expects.
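Local testing can be as simple as calling your handler directly with a sample job before building the image. Below is a minimal sketch following the Runpod serverless handler contract (user data under the `"input"` key, an `"error"` key marking failure); the `"prompt"` field is illustrative:

```python
# Minimal handler sketch to verify locally before deploying.
# The job dict carries user data under the "input" key, and a
# returned dict with an "error" key marks the job as failed.

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt")  # "prompt" is an illustrative field
    if prompt is None:
        return {"error": "Missing required field: prompt"}
    # Replace with your real inference logic.
    return {"output": prompt.upper()}

# Exercise the handler directly before building the image:
print(handler({"input": {"prompt": "hello"}}))

# When deployed, hand the function to the Runpod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

If this call works locally but the worker still fails to start, the problem is more likely in the image (dependencies, GPU compatibility) than in the handler logic.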
### Worker initializes but fails on requests
| Issue | Solution |
|---|---|
| Input validation errors | Add input validation in your handler and check logs for the expected format |
| Missing dependencies | Verify all required packages are in your Dockerfile |
| Model loading failures | Check GPU memory requirements and model path |
| Permission errors | Ensure files are readable and directories are writable |
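For the first row of the table, explicit validation turns malformed requests into clear error messages instead of unhandled exceptions. A sketch, with illustrative field names (`"prompt"`, `"max_tokens"`):

```python
# Sketch of explicit input validation inside a handler. Field names
# and types here are illustrative; adapt them to your handler's schema.

REQUIRED_FIELDS = {"prompt": str, "max_tokens": int}

def validate_input(job_input):
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in job_input:
            problems.append(f"missing field '{field}'")
        elif not isinstance(job_input[field], expected_type):
            problems.append(f"field '{field}' must be {expected_type.__name__}")
    return problems

def handler(job):
    job_input = job.get("input", {})
    problems = validate_input(job_input)
    if problems:
        # The message that reaches the caller names every problem at once.
        return {"error": "Invalid input: " + "; ".join(problems)}
    # Placeholder logic standing in for real inference.
    return {"output": job_input["prompt"][: job_input["max_tokens"]]}
```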
## Job issues

### Jobs stuck in queue

If jobs remain `IN_QUEUE` for extended periods:

- No workers available: Check if `max_workers` is set appropriately.
- Workers throttled: Your endpoint may be hitting rate limits. Check the Workers tab for throttled workers.
- Cold start delays: First requests after idle periods require worker initialization. Consider increasing `min_workers` or enabling FlashBoot.
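You can confirm a job is stuck by checking its status over the serverless API. A sketch using only the standard library; the `/status/{job_id}` route and Bearer auth follow Runpod's serverless API, and the endpoint ID, job ID, and key are placeholders:

```python
# Sketch: checking a job's status to see whether it is stuck IN_QUEUE.
import json
import urllib.request

def build_status_request(endpoint_id, job_id, api_key):
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )

def fetch_status(endpoint_id, job_id, api_key):
    req = build_status_request(endpoint_id, job_id, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["status"]  # e.g. "IN_QUEUE"
```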
### Jobs timing out
| Cause | Solution |
|---|---|
| Processing takes too long | Increase `executionTimeout` in your job policy |
| Model loading too slow | Use model caching or bake models into your image |
| TTL too short | Set `ttl` to cover both queue time and execution time |
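Both `executionTimeout` and `ttl` go in the `policy` object of the request body. A sketch of a `/run` payload; in Runpod's execution policy these values are in milliseconds, and the numbers below are illustrative:

```python
# Sketch of a /run request body that sets an execution policy.
# executionTimeout and ttl are expressed in milliseconds.
import json

payload = {
    "input": {"prompt": "hello"},
    "policy": {
        "executionTimeout": 600_000,  # allow up to 10 minutes of processing
        "ttl": 3_600_000,             # job expires one hour after submission
    },
}

body = json.dumps(payload)
```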
### Jobs failing

Check the job status response for error details. Common causes:

- Handler exceptions: Unhandled exceptions in your handler code. Add try/except blocks and return structured errors.
- OOM (Out of Memory): Model or batch size exceeds GPU memory. Reduce the batch size or use a larger GPU.
- Timeout: Job exceeded the execution timeout. Increase the timeout or optimize processing.
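Wrapping the handler body in try/except turns crashes into structured errors that show up in the job status. A sketch, where `run_inference` stands in for your real model code:

```python
# Sketch: wrapping handler work in try/except so failures surface as
# structured errors in the job status instead of crashing the worker.
import traceback

def run_inference(job_input):
    # Placeholder for real model code; raises on bad input.
    return {"result": 1 / job_input["denominator"]}

def handler(job):
    try:
        return run_inference(job.get("input", {}))
    except Exception as exc:
        # Report the failure instead of letting the worker die; the
        # traceback in the payload makes log-free debugging possible.
        return {
            "error": f"{type(exc).__name__}: {exc}",
            "traceback": traceback.format_exc(),
        }
```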
## Cold start issues

### Slow cold starts

Cold start time includes container startup, model loading, and initialization. To reduce cold starts:

- Use model caching: Store models on network volumes instead of downloading on each start.
- Enable FlashBoot: Use FlashBoot for faster container initialization.
- Optimize image size: Use smaller base images and remove unnecessary dependencies.
- Initialize outside handler: Load models at module level, not inside the handler function.
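The last point looks like this in practice: the expensive load runs once at import time, during worker initialization, and every request reuses the result. `load_model` is a stand-in for your real loading code:

```python
# Sketch: load the model once at module level so it is reused across
# requests, instead of reloading inside the handler on every call.
import time

def load_model():
    time.sleep(0.1)  # stand-in for slow weight loading
    return lambda text: text[::-1]  # stand-in "model"

MODEL = load_model()  # runs once, during worker initialization

def handler(job):
    # The handler only runs inference; no per-request loading cost.
    return {"output": MODEL(job["input"]["text"])}
```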
### Too many cold starts

If you’re seeing frequent cold starts:

- Increase idle timeout: Set a longer `idle_timeout` to keep workers warm between requests.
- Set minimum workers: Configure `min_workers` > 0 to maintain warm workers.
- Check traffic patterns: Sporadic traffic causes more cold starts than steady traffic.
## Logging issues

### Missing logs

If logs aren’t appearing in the console:

- Check throttling: Excessive logging triggers throttling. Reduce log verbosity.
- Verify output streams: Ensure you’re writing to stdout/stderr, not just files.
- Check worker status: Logs only appear for successfully initialized workers.
- Retention period: Logs older than 90 days are automatically removed.
### Log throttling

To avoid log throttling:

- Reduce log verbosity in production.
- Use structured logging for efficiency.
- Store detailed logs on network volumes instead of console output.
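One sketch of structured logging to stdout: emitting one compact JSON object per event keeps log volume predictable and easy to filter downstream. The field names are illustrative:

```python
# Sketch: compact structured logging to stdout, one JSON object per
# event. flush=True ensures lines reach the console promptly.
import json
import sys
import time

def log_event(level, message, **fields):
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    line = json.dumps(record, separators=(",", ":"))
    print(line, file=sys.stdout, flush=True)
    return line  # returned so callers/tests can inspect the record

log_event("info", "job started", job_id="abc123")
```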
## vLLM-specific issues

### OOM errors

If your vLLM worker runs out of memory:

- Lower `GPU_MEMORY_UTILIZATION` from 0.90 to 0.85.
- Reduce `MAX_MODEL_LEN` to limit the context window.
- Use a GPU with more VRAM.
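As endpoint environment variables, the first two adjustments might look like this; the variable names come from the points above, and the values are illustrative starting points, not recommendations for every model:

```shell
# Illustrative environment settings for a vLLM worker endpoint.
GPU_MEMORY_UTILIZATION=0.85   # lowered from 0.90 to leave VRAM headroom
MAX_MODEL_LEN=4096            # cap the context window to bound KV cache size
```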
### Model not loading
| Issue | Solution |
|---|---|
| Model not found | Verify `MODEL_NAME` matches the Hugging Face model ID exactly |
| Gated model access denied | Set `HF_TOKEN` to a token that has access to the model |
| Incompatible model | Check the vLLM supported models list |
### OpenAI API errors
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Verify `RUNPOD_API_KEY` is correct |
| 404 Not Found | Wrong endpoint URL | Use the format `https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1` |
| Connection refused | Endpoint not ready | Wait for workers to initialize |
| Connection refused | Endpoint not ready | Wait for workers to initialize |
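A quick way to avoid the 404 case is to build the base URL from the endpoint ID with the format above. A sketch; `ENDPOINT_ID` and `MODEL_NAME` are placeholders, and the usage comment assumes the `openai` package:

```python
# Sketch: building the OpenAI-compatible base URL for a Runpod endpoint.

def openai_base_url(endpoint_id):
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"

# Usage with the `openai` package (assumed installed), passing your
# Runpod API key in place of an OpenAI key:
#   from openai import OpenAI
#   client = OpenAI(base_url=openai_base_url("ENDPOINT_ID"),
#                   api_key="RUNPOD_API_KEY")
#   resp = client.chat.completions.create(
#       model="MODEL_NAME",
#       messages=[{"role": "user", "content": "Hello"}],
#   )
```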
## Load balancing endpoint issues

### "No workers available" error

This means workers didn’t initialize in time. Common causes:

- First request: Workers need time to start. Retry the request. (See Handling cold starts for more information.)
- All workers busy: Increase `max_workers` to handle more concurrent requests.
- Workers crashing: Check logs for initialization errors.
### Requests not reaching workers

Verify your HTTP server is:

- Listening on port 8000 (or the port specified in your configuration).
- Binding to `0.0.0.0`, not `127.0.0.1`.
- Returning proper HTTP responses.
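A minimal standard-library server satisfying that checklist might look like this; the route and payload are illustrative:

```python
# Sketch: a minimal HTTP server that listens on port 8000, binds to
# 0.0.0.0 (not 127.0.0.1), and returns a proper HTTP response.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def health_body():
    return json.dumps({"status": "healthy"}).encode()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = health_body()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve():
    # 0.0.0.0 makes the server reachable from outside the container;
    # 127.0.0.1 would only accept connections from inside it.
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```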
## Getting help

If you’re still experiencing issues:

- Check endpoint logs for detailed error messages.
- SSH into workers using SSH access to debug in real-time.
- Review metrics in the Metrics tab to identify patterns.
- Contact support at help@runpod.io with your endpoint ID and error details.