> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runpod.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Troubleshooting

> Common issues and solutions for Serverless endpoints and workers.

## Deployment issues

### Worker fails to start

If your worker fails to start or initialize:

1. **Check logs**: View endpoint logs in the [Runpod console](https://www.console.runpod.io/serverless) for error messages.
2. **Verify local testing**: Ensure your handler works in [local testing](/serverless/development/local-testing) before deploying.
3. **Check dependencies**: Verify all dependencies are installed in your [Docker image](/serverless/workers/create-dockerfile).
4. **GPU compatibility**: Ensure your Docker image is compatible with the selected GPU type.
5. **Input format**: Verify your [input format](/serverless/endpoints/send-requests) matches what your handler expects.

### Worker initializes but fails on requests

| Issue                   | Solution                                                                    |
| ----------------------- | --------------------------------------------------------------------------- |
| Input validation errors | Add input validation in your handler and check logs for the expected format |
| Missing dependencies    | Verify all required packages are in your Dockerfile                         |
| Model loading failures  | Check GPU memory requirements and model path                                |
| Permission errors       | Ensure files are readable and directories are writable                      |

## Job issues

### Jobs stuck in queue

If jobs remain `IN_QUEUE` for extended periods:

* **No workers available**: Check if `max_workers` is set appropriately.
* **Workers throttled**: Your endpoint may be hitting rate limits. Check the Workers tab for throttled workers.
* **Cold start delays**: First requests after idle periods require worker initialization. Consider increasing `min_workers` or enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot).

### Jobs timing out

| Cause                     | Solution                                                                                                 |
| ------------------------- | -------------------------------------------------------------------------------------------------------- |
| Processing takes too long | Increase `executionTimeout` in your [job policy](/serverless/endpoints/send-requests#execution-policies) |
| Model loading too slow    | Use [model caching](/serverless/endpoints/model-caching) or bake models into your image                  |
| TTL too short             | Set `ttl` to cover both queue time and execution time                                                    |

### Jobs failing

Check the job status response for error details. Common causes:

* **Handler exceptions**: Unhandled exceptions in your handler code. Add try/catch blocks and return structured errors.
* **OOM (Out of Memory)**: Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
* **Timeout**: Job exceeded execution timeout. Increase timeout or optimize processing.

## Cold start issues

### Slow cold starts

Cold start time includes container startup, model loading, and initialization. To reduce cold starts:

1. **Use model caching**: Store models on [network volumes](/serverless/endpoints/model-caching) instead of downloading on each start.
2. **Enable FlashBoot**: Use [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot) for faster container initialization.
3. **Optimize image size**: Use smaller base images and remove unnecessary dependencies.
4. **Initialize outside handler**: Load models at module level, not inside the handler function.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Good: Load model once at startup
model = load_model()

def handler(job):
    return model.predict(job["input"])

# Bad: Load model on every request
def handler(job):
    model = load_model()  # Slow!
    return model.predict(job["input"])
```

### Too many cold starts

If you're seeing frequent cold starts:

* **Increase idle timeout**: Set a longer `idle_timeout` to keep workers warm between requests.
* **Set minimum workers**: Configure `min_workers` > 0 to maintain warm workers.
* **Check traffic patterns**: Sporadic traffic causes more cold starts than steady traffic.

## Logging issues

### Missing logs

If logs aren't appearing in the console:

1. **Check throttling**: Excessive logging triggers throttling. Reduce log verbosity.
2. **Verify output streams**: Ensure you're writing to stdout/stderr, not just files.
3. **Check worker status**: Logs only appear for successfully initialized workers.
4. **Retention period**: Logs older than 90 days are automatically removed.

### Log throttling

To avoid log throttling:

* Reduce log verbosity in production.
* Use structured logging for efficiency.
* Store detailed logs on [network volumes](/serverless/storage/overview) instead of console output.

## vLLM-specific issues

### OOM errors

If your vLLM worker runs out of memory:

* Lower `GPU_MEMORY_UTILIZATION` from 0.90 to 0.85.
* Reduce `MAX_MODEL_LEN` to limit context window.
* Use a GPU with more VRAM.

### Model not loading

| Issue                     | Solution                                                                                   |
| ------------------------- | ------------------------------------------------------------------------------------------ |
| Model not found           | Verify `MODEL_NAME` matches the Hugging Face model ID exactly                              |
| Gated model access denied | Set `HF_TOKEN` with a token that has access to the model                                   |
| Incompatible model        | Check [vLLM supported models](https://docs.vllm.ai/en/latest/models/supported_models.html) |

### OpenAI API errors

| Error              | Cause              | Solution                                                        |
| ------------------ | ------------------ | --------------------------------------------------------------- |
| 401 Unauthorized   | Invalid API key    | Verify `RUNPOD_API_KEY` is correct                              |
| 404 Not Found      | Wrong endpoint URL | Use the format `https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1` |
| Connection refused | Endpoint not ready | Wait for workers to initialize                                  |

## Load balancing endpoint issues

### "No workers available" error

This means workers didn't initialize in time. Common causes:

* **First request**: Workers need time to start. Retry the request. (See [Handling cold starts](/serverless/load-balancing/overview#handling-cold-starts) for more information.)
* **All workers busy**: Increase `max_workers` to handle more concurrent requests.
* **Workers crashing**: Check logs for initialization errors.

### Requests not reaching workers

Verify your HTTP server is:

* Listening on port 8000 (or the port specified in your configuration).
* Binding to `0.0.0.0`, not `127.0.0.1`.
* Returning proper HTTP responses.

## Getting help

If you're still experiencing issues:

1. **Check endpoint logs** for detailed error messages.
2. **SSH into workers** using [SSH access](/serverless/development/ssh-into-workers) to debug in real-time.
3. **Review metrics** in the Metrics tab to identify patterns.
4. **Contact support** at [help@runpod.io](mailto:help@runpod.io) with your endpoint ID and error details.
