# @Endpoint functions
## What runs where
The `@Endpoint` decorator marks functions for remote execution. Everything else runs locally.
| Code | Location |
|---|---|
| `@Endpoint` decorator | Your machine (marks the function) |
| Inside `process_on_gpu` | Runpod worker |
| Everything else | Your machine |
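As a hedged illustration of this split, the sketch below replaces the real Flash `@Endpoint` decorator with a no-op stand-in so it runs anywhere; the comments mark where each piece would execute with the real SDK:

```python
# Hypothetical stand-in for Flash's @Endpoint decorator, used only to
# illustrate the local/remote split; the real decorator registers the
# function for remote execution on Runpod.
def Endpoint(name=None, **config):
    def wrap(fn):
        fn._endpoint_name = name  # decoration itself happens on your machine
        return fn
    return wrap

@Endpoint(name="gpu-worker")
def process_on_gpu(data):
    # With the real SDK, this body runs on a Runpod worker.
    return [x * 2 for x in data]

def main():
    payload = [1, 2, 3]             # everything outside the endpoint runs locally
    return process_on_gpu(payload)  # the real SDK dispatches this call to a worker
```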
## Flash apps
When you build a Flash app:

**Development (`flash run`):**

- FastAPI server runs locally.
- `@Endpoint` functions run on Runpod workers.

**Production (`flash deploy`):**

- Each endpoint configuration becomes a separate Serverless endpoint.
- All endpoints run on Runpod.
## Execution flow

Here’s what happens when you call an `@Endpoint` function:
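The flow can be modeled as four steps: serialize the call, submit a job to the endpoint, let a worker execute it, and return the result. The sketch below is a simplified stand-in, not the real Flash transport:

```python
import json

# Simplified model of the remote call flow (not the real Flash internals):
# serialize args -> submit job -> worker executes -> result returns.

def submit_job(job_queue, fn, args):
    payload = json.dumps({"args": args})  # 1. serialize the call on your machine
    job_queue.append((fn, payload))       # 2. submit the job to the endpoint

def run_worker(job_queue):
    fn, payload = job_queue.pop(0)        # 3. a worker picks up the job
    args = json.loads(payload)["args"]
    return fn(*args)                      # 4. execute and return the result

queue = []
submit_job(queue, lambda a, b: a + b, [2, 3])
result = run_worker(queue)  # -> 5
```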
## Endpoint naming

Flash identifies endpoints by their `name` parameter:
- Same name, same config: Reuses the existing endpoint.
- Same name, different config: Updates the endpoint automatically.
- New name: Creates a new endpoint.
This means you can change settings such as `workers` without creating a new endpoint; Flash detects the change and updates it.
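The naming rules above can be sketched as a small registry model (a toy illustration, not Flash's actual implementation):

```python
# Toy registry modeling the name-based rules: create on new name,
# reuse on identical config, update on changed config.
endpoints = {}

def sync_endpoint(name, config):
    if name not in endpoints:
        endpoints[name] = config
        return "created"
    if endpoints[name] == config:
        return "reused"
    endpoints[name] = config
    return "updated"

first = sync_endpoint("embed", {"workers": (0, 3)})   # "created": new name
second = sync_endpoint("embed", {"workers": (0, 3)})  # "reused": same name, same config
third = sync_endpoint("embed", {"workers": (0, 5)})   # "updated": same name, different config
```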
## Worker lifecycle

Workers scale up and down based on demand and your configuration.

### Worker states

- **Initializing**: The worker is starting up and downloading dependencies.
- **Idle**: The worker is ready but not processing requests.
- **Running**: The worker actively processes requests.
- **Throttled**: The worker is temporarily unable to run due to host resource constraints.
- **Outdated**: The system marks the worker for replacement after endpoint updates. It continues processing current jobs during rolling updates (10% of max workers at a time).
- **Unhealthy**: The worker has crashed due to Docker image issues, incorrect start commands, or machine problems. The system automatically retries with exponential backoff for up to 7 days.

### Scaling behavior
- First job arrives → Scale to 1 worker (cold start).
- More jobs arrive while the worker is busy → Scale up to max workers.
- Jobs complete → Workers stay idle for `idle_timeout`.
- No new jobs → Scale down to min workers.
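These rules can be captured in a toy scaling function (a simplified model, not Runpod's actual scheduler; `idle_for` here is an assumed input meaning the seconds the pool has been without jobs):

```python
def scale(active_jobs, current_workers, min_workers, max_workers,
          idle_for, idle_timeout):
    """Toy model of the scaling rules above."""
    if active_jobs > current_workers:
        return min(active_jobs, max_workers)  # scale up, capped at max workers
    if active_jobs == 0 and idle_for >= idle_timeout:
        return min_workers                    # idle past timeout: drop to min
    return max(current_workers, min_workers)  # otherwise hold steady

first_job = scale(1, 0, 0, 3, 0, 60)  # 1: cold start to one worker
burst = scale(5, 1, 0, 3, 0, 60)      # 3: scale up to max workers
waiting = scale(0, 3, 0, 3, 30, 60)   # 3: idle, but within idle_timeout
drained = scale(0, 3, 0, 3, 60, 60)   # 0: scale down to min workers
```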
## Cold starts and warm starts

Understanding cold and warm starts helps you predict latency and set expectations.

### Cold start

A cold start occurs when no workers are available to handle your job:

- You’re calling an endpoint for the first time.
- All workers scaled down after being idle beyond `idle_timeout`.
- All active workers are busy and a new one must spin up.

During a cold start:

1. Runpod provisions a new worker with your configured GPU/CPU.
2. The worker image starts (dependencies are pre-installed during build).
3. Your function executes.
When using `flash build` or `flash deploy`, dependencies are pre-installed in the worker image, eliminating pip installation at request time. When running standalone scripts with `@Endpoint` functions outside of a Flash app, dependencies may be installed on the worker at request time.

### Warm start
A warm start occurs when a worker is already running and idle:

- The worker completed a previous job and is waiting for more work.
- The worker is within its `idle_timeout` period.

When a warm start happens:

1. The job is routed immediately to the idle worker.
2. Your function executes.
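The cold/warm distinction reduces to one question: is an idle worker, still within its `idle_timeout`, available right now? A toy classifier of the rules above:

```python
def start_type(idle_workers, busy_workers):
    # Toy model: warm only when an idle worker exists; busy workers
    # cannot take the job, so a new worker must spin up (cold).
    if idle_workers > 0:
        return "warm"
    return "cold"

first_call = start_type(0, 0)  # "cold": no workers at all
backlog = start_type(0, 2)     # "cold": all active workers are busy
ready = start_type(1, 1)       # "warm": routed to the idle worker
```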
## The relationship between configuration and starts

Your `workers` and `idle_timeout` settings directly affect cold start frequency:

- `workers=(0, n)`: Workers scale to zero when idle. Every request after the idle period triggers a cold start.
- `workers=(1, n)`: At least one worker stays ready. The first concurrent request is warm; additional requests may cold start.
- Higher `idle_timeout`: Workers stay idle longer before scaling down, reducing cold starts for sporadic traffic.
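To see how these settings interact, the toy simulation below counts cold starts for a sequence of request times, assuming a single worker slot and a fixed job duration; it illustrates the rules above and is not Runpod's real scheduler:

```python
def count_cold_starts(arrivals, min_workers, idle_timeout, job_duration=5):
    """Count cold starts for request arrival times (seconds), single worker slot."""
    cold = 0
    # With min_workers >= 1 a worker is always ready; otherwise none exists yet.
    ready_until = float("inf") if min_workers >= 1 else -1.0
    for t in arrivals:
        if t > ready_until:
            cold += 1  # no warm worker left: provision a new one
        # After the job, the worker stays warm for idle_timeout seconds.
        ready_until = max(ready_until, t + job_duration + idle_timeout)
    return cold

sporadic = [0, 30, 200]                            # request times in seconds
scale_to_zero = count_cold_starts(sporadic, 0, 60)   # 2 cold starts
longer_idle = count_cold_starts(sporadic, 0, 300)    # 1 cold start
always_warm = count_cold_starts(sporadic, 1, 60)     # 0 cold starts
```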