After you run `flash init`, you have a working project template with examples. This guide shows you how to customize the template to build your application.
## Understanding endpoint architecture
The relationship between endpoint configurations and deployed Serverless endpoints differs between load-balanced and queue-based endpoints. Understanding this mapping is critical for building Flash apps correctly.

### Key rules
Queue-based endpoints follow a strict 1:1:1 rule:

- 1 endpoint configuration : 1 `@Endpoint` function : 1 Serverless endpoint.
- Each function must have its own unique endpoint name.
- Each endpoint gets its own URL (e.g., `https://api.runpod.ai/v2/abc123xyz`).
- Each endpoint is called via the `/run` or `/runsync` routes.
Load-balanced endpoints follow a different pattern: 1 endpoint instance = multiple route decorators = 1 Serverless endpoint.

- Multiple routes can share the same endpoint configuration.
- All routes share one URL with different paths (e.g., `/generate`, `/health`).
- Each route is defined by a `.get()`, `.post()`, or other HTTP method decorator.
## Examples
The following sections demonstrate progressively complex scenarios.

### Scenario 1: A single queue-based endpoint
Your code: `gpu_worker.py`

What gets deployed:

- 1 Serverless endpoint: `https://api.runpod.ai/v2/abc123xyz`
- Named: `gpu-inference`
- Hardware: A100 80GB GPUs
- When you call the endpoint, a worker runs the `process_data` function using the input data you provide

Key takeaway: Each queue-based function must have its own unique endpoint name. Do not reuse the same name for multiple queue-based functions in Flash apps.
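To illustrate the 1:1:1 mapping, here is a minimal sketch of what `gpu_worker.py` might contain. The `Endpoint` decorator below is a hypothetical stand-in, not the real Flash API (its actual name and parameters may differ); only the structure matters: one uniquely named configuration attached to one function.

```python
# Hypothetical stand-in for Flash's endpoint decorator -- the real
# Flash API and parameter names may differ; only the shape matters.
def Endpoint(name, gpu=None):
    def wrap(fn):
        fn.endpoint_name = name  # one unique name per function
        fn.gpu = gpu
        return fn
    return wrap

# One configuration, one function, one Serverless endpoint.
@Endpoint(name="gpu-inference", gpu="A100-80GB")
def process_data(payload):
    # The GPU work runs here on a Runpod worker when the endpoint is called.
    return {"result": payload}
```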
### Scenario 2: Multiple queue-based endpoints
Your code: `gpu_worker.py`

What gets deployed:

- 2 Serverless endpoints:
  - `https://api.runpod.ai/v2/abc123xyz` (named `preprocess` in the console)
  - `https://api.runpod.ai/v2/def456xyz` (named `inference` in the console)

Key takeaway: Each queue-based function must have its own unique endpoint name. Do not reuse the same name for multiple queue-based functions in Flash apps.
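This scenario can be sketched the same way, again using a hypothetical `Endpoint` decorator in place of the real Flash API: two functions with two unique names produce two separate deployed endpoints.

```python
def Endpoint(name):  # hypothetical stand-in, not the real Flash API
    def wrap(fn):
        fn.endpoint_name = name
        return fn
    return wrap

@Endpoint(name="preprocess")
def preprocess(payload):
    return {"cleaned": payload}

@Endpoint(name="inference")
def inference(payload):
    return {"prediction": payload}

# Two unique names -> two separate Serverless endpoints.
```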
### Scenario 3: Load-balanced endpoint with multiple routes
Your code: `lb_worker.py`

What gets deployed:

- 1 Serverless endpoint: `https://abc123xyz.api.runpod.ai` (named `api-server`)
- 3 HTTP routes: `POST /generate`, `POST /translate`, `GET /health` (defined by the route decorators in `lb_worker.py`)

Key takeaway: Load-balanced endpoints can have multiple routes on a single Serverless endpoint. The route decorator determines each route.
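The route-decorator pattern in this scenario can be sketched as follows. The `LoadBalancedEndpoint` class and its `.get()`/`.post()` methods are an illustrative stand-in for the real Flash endpoint object; the point is that one endpoint instance collects many routes.

```python
# Hypothetical sketch of the route-decorator pattern -- the real Flash
# API may differ; the point is one endpoint object, many routes.
class LoadBalancedEndpoint:
    def __init__(self, name):
        self.name = name
        self.routes = {}  # (method, path) -> handler

    def _route(self, method, path):
        def wrap(fn):
            self.routes[(method, path)] = fn
            return fn
        return wrap

    def post(self, path):
        return self._route("POST", path)

    def get(self, path):
        return self._route("GET", path)

api = LoadBalancedEndpoint("api-server")

@api.post("/generate")
def generate(body):
    return {"text": "..."}

@api.post("/translate")
def translate(body):
    return {"translation": "..."}

@api.get("/health")
def health():
    return {"status": "ok"}

# One endpoint object, three routes, one Serverless endpoint.
```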
### Scenario 4: Mixing load-balanced and queue-based endpoints
Your code: `mixed_api_worker.py`

What gets deployed:

- 2 Serverless endpoints:
  - `https://abc123xyz.api.runpod.ai` (`public-api`, load-balanced)
  - `https://api.runpod.ai/v2/def456xyz` (`gpu-backend`, queue-based)
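Notice the two URL shapes in this scenario: load-balanced endpoints put the endpoint ID in the hostname, while queue-based endpoints use the shared `api.runpod.ai/v2` host with the ID in the path. A small sketch of the two patterns (helper names are ours, not part of any SDK):

```python
def load_balanced_url(endpoint_id: str) -> str:
    # Load-balanced: the endpoint ID is part of the hostname.
    return f"https://{endpoint_id}.api.runpod.ai"

def queue_based_url(endpoint_id: str, route: str = "run") -> str:
    # Queue-based: shared host, ID in the path, called via /run or /runsync.
    return f"https://api.runpod.ai/v2/{endpoint_id}/{route}"
```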
## Quick reference
| Endpoint type | Configuration rule | Result |
|---|---|---|
| Queue-based | 1 name : 1 function | 1 Serverless endpoint |
| Load-balanced | 1 endpoint : 1 or more routes | 1 Serverless endpoint with one or more paths |
| Mixed | Different names : different functions | Separate Serverless endpoints |
## Add load balancing routes

To add routes to an existing load-balanced endpoint, use the route decorator pattern in `lb_worker.py`. All routes deploy to the `lb_worker` Serverless endpoint, and each route is accessible at its defined path.
Key points:
- Multiple routes can share one endpoint configuration
- Each route has its own HTTP method and path
- All routes on the same endpoint deploy to one Serverless endpoint
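The points above can be sketched with the same hypothetical stand-in for Flash's endpoint object used earlier (the real decorator API may differ). Adding a route means decorating one more function on the existing endpoint instance; the `/summarize` route here is a made-up example.

```python
class LoadBalancedEndpoint:  # hypothetical stand-in, not the real Flash API
    def __init__(self, name):
        self.name = name
        self.routes = {}  # (method, path) -> handler

    def post(self, path):
        def wrap(fn):
            self.routes[("POST", path)] = fn
            return fn
        return wrap

api = LoadBalancedEndpoint("api-server")

# Adding a new route: decorate another function on the same endpoint
# object. No new Serverless endpoint is created.
@api.post("/summarize")  # hypothetical new route
def summarize(body):
    return {"summary": body}
```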
## Add queue-based endpoints
To add a new queue-based endpoint, create a new endpoint with a unique name in `gpu_worker.py`.
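Once deployed, a queue-based endpoint is called via its `/run` (asynchronous) or `/runsync` (blocking) route, with the job input wrapped in an `input` field. The helper below only composes the request; the endpoint ID and API key are placeholders, and the exact auth header should be checked against the Runpod API docs.

```python
def build_run_request(endpoint_id, payload, sync=False, api_key="YOUR_API_KEY"):
    # Queue-based endpoints accept jobs on /run (async) and /runsync (blocking).
    route = "runsync" if sync else "run"
    return {
        "url": f"https://api.runpod.ai/v2/{endpoint_id}/{route}",
        # Assumed auth scheme -- verify against the Runpod API reference.
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"input": payload},  # job input goes under an "input" field
    }
```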
## Modify endpoint configurations
Customize endpoint configurations for each worker function in your app. Each `@Endpoint` function can have its own GPU type, scaling parameters, and timeouts optimized for its specific workload.
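For example, a compute-heavy inference endpoint and a lightweight preprocessing endpoint can carry different settings. The field names below are illustrative only, not the exact Flash parameter names:

```python
# Illustrative per-endpoint configurations -- the field names are
# examples, not the exact Flash configuration parameters.
inference_config = {
    "gpu": "A100-80GB",        # large model needs a big GPU
    "max_workers": 3,          # cap scaling for the expensive hardware
    "execution_timeout_s": 600,
}

preprocess_config = {
    "gpu": None,               # CPU-only is enough for preprocessing
    "max_workers": 10,         # cheap workers can scale wider
    "execution_timeout_s": 60,
}
```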
## Test your customizations
After customizing your app, test locally with `flash run`. The local server provides:

- Interactive API documentation at `/docs`
- Auto-reload on code changes
- Real remote execution on Runpod workers

Use it to verify that:

- All HTTP routes work as expected
- Endpoint functions execute correctly
- Dependencies install properly
- Error handling works