Use `flash deploy` to build and deploy your application in a single command, or use `flash build` for more control over the build process.
## Deployment workflow
A typical deployment workflow looks like this:

- Create a new project: Use `flash init` to create a new project.
- Develop locally: Use `flash run` to test your application. Any functions decorated with `@Endpoint` will be run on Runpod Serverless workers.
- Preview (optional): Use `flash deploy --preview` to test locally with Docker.
- Deploy: Use `flash deploy` to push to Runpod Serverless.
- Manage: Use `flash env` and `flash app` to manage your deployments.
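Put together, the workflow above maps onto this command sequence (all commands are taken from the steps above; the subcommands available under `flash env` and `flash app` are listed in the CLI reference):

```shell
flash init                # 1. create a new project
flash run                 # 2. develop and test locally
flash deploy --preview    # 3. optional: preview in Docker
flash deploy              # 4. deploy to Runpod Serverless
flash env                 # 5. manage environments
flash app                 #    manage apps
```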
## Deploy your application
When you’re satisfied with your endpoint functions and ready to move to production, use `flash deploy` to build and deploy your Flash application:
- Build: Packages your code, dependencies, and manifest.
- Upload: Sends the artifact to Runpod’s storage.
- Provision: Creates or updates Serverless endpoints.
- Configure: Sets up environment variables and service discovery.
## Deployment architecture
Flash deploys your application as multiple independent Serverless endpoints. Each endpoint configuration in your worker files becomes a separate endpoint.

How Flash deployments work:

- One endpoint name = one endpoint: Each unique endpoint configuration (defined by its `name` parameter) creates a separate Serverless endpoint with its own URL.
- Call any endpoint: After deployment, you can call whichever endpoint you need, such as `lb_worker` for API requests, `gpu_worker` for GPU tasks, or `cpu_worker` for CPU tasks.
- Load balancing endpoints: Create HTTP APIs with custom routes using decorators such as `.get()` and `.post()`.
- Queue-based endpoints: Run compute tasks using the `/runsync` or `/run` routes.
- Inter-endpoint communication: Endpoints can call each other’s functions when needed, using the Runpod GraphQL service for discovery.
## Deploy to an environment

Flash organizes deployments using apps and environments. Deploy to a specific environment using the `--env` flag:
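For example, to deploy to an environment named `staging` (the environment name here is illustrative):

```shell
flash deploy --env staging
```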
## Post-deployment

After a successful deployment, Flash displays all deployed endpoints grouped by type.

### Understanding endpoint architecture
The relationship between endpoint configurations and deployed endpoints differs between load-balanced and queue-based endpoints.

#### Queue-based endpoints (one function per endpoint)

For queue-based endpoints, each `@Endpoint` function must have its own unique name, and each one deploys as its own Serverless endpoint:

- `https://api.runpod.ai/v2/abc123xyz` (`run-model`)
- `https://api.runpod.ai/v2/def456xyz` (`preprocess`)
#### Load-balanced endpoints (multiple routes per endpoint)

For load-balanced endpoints, you can define multiple HTTP routes on a single endpoint:

- One Serverless endpoint: `https://abc123xyz.api.runpod.ai` (named “api”)
- Three HTTP routes: `POST /generate`, `POST /translate`, `GET /health`
#### Key takeaway
- Queue-based: 1 endpoint name = 1 function = 1 Serverless endpoint
- Load-balanced: 1 endpoint instance = multiple routes = 1 Serverless endpoint
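The takeaway can be sketched with plain data structures (the names and routes are illustrative, echoing the examples above; this is not Flash code):

```python
# Queue-based: each endpoint name maps to exactly one function,
# and each entry becomes its own Serverless endpoint.
queue_based = {
    "run-model": "run_model",
    "preprocess": "preprocess",
}

# Load-balanced: one endpoint instance carries multiple HTTP routes
# but still deploys as a single Serverless endpoint.
load_balanced = {
    "api": ["POST /generate", "POST /translate", "GET /health"],
}

# Number of deployed Serverless endpoints = number of top-level entries.
total_endpoints = len(queue_based) + len(load_balanced)
print(total_endpoints)  # 3
```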
## Preview before deploying
Test your deployment locally with Docker before pushing to production by using the `--preview` flag. Running `flash deploy --preview`:

- Builds your project (creates the deployment artifact and manifest).
- Creates a Docker network for inter-container communication.
- Starts one container per endpoint configuration (`lb_worker`, `gpu_worker`, `cpu_worker`, etc.).
- Exposes all endpoints for local testing.

Use preview mode to:

- Validate your deployment configuration.
- Test cross-endpoint function calls.
- Debug resource provisioning issues.
- Verify the manifest structure.

Press Ctrl+C to stop the preview environment.
## Managing deployment size

Runpod Serverless has a 500MB deployment limit. Flash automatically excludes packages that are pre-installed in the base image: `torch`, `torchvision`, `torchaudio`, `numpy`, and `triton`.

Use the `--exclude` flag to skip additional packages:
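A hedged example (the package name is illustrative; see the CLI reference for the exact `--exclude` syntax):

```shell
flash deploy --exclude pandas
```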
### Base image packages

| Configuration type | Base image | Auto-excluded packages |
|---|---|---|
| GPU (`gpu=`) | PyTorch base | `torch`, `torchvision`, `torchaudio`, `numpy`, `triton` |
| CPU (`cpu=`) | Python slim | `torch`, `torchvision`, `torchaudio`, `numpy`, `triton` |
| Load-balanced | Same as GPU/CPU | Same as GPU/CPU |
## Build process

When you run `flash deploy` (or `flash build`), Flash:

- Discovers all `@Endpoint`-decorated functions.
- Groups functions by their endpoint name.
- Generates handler files for each endpoint.
- Creates a `flash_manifest.json` file for service discovery.
- Installs dependencies with Linux x86_64 compatibility.
- Packages everything into `.flash/artifact.tar.gz`.
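The grouping step can be sketched in plain Python (function and endpoint names are illustrative; this is not Flash’s actual implementation):

```python
from collections import defaultdict

# Hypothetical discovery result: (function name, endpoint name) pairs
# collected from @Endpoint-decorated functions.
discovered = [
    ("generate", "lb_worker"),
    ("health", "lb_worker"),
    ("run_model", "gpu_worker"),
    ("preprocess", "cpu_worker"),
]

# Group functions by endpoint name; each group gets one generated
# handler file and one entry in flash_manifest.json.
groups = defaultdict(list)
for func, endpoint in discovered:
    groups[endpoint].append(func)

print(dict(groups))
# {'lb_worker': ['generate', 'health'], 'gpu_worker': ['run_model'], 'cpu_worker': ['preprocess']}
```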
### Cross-platform builds

Flash automatically handles cross-platform builds. You can build on macOS, Windows, or Linux, and the resulting package will run correctly on Runpod’s Linux x86_64 infrastructure.

### Build artifacts

After building, these artifacts are created in the `.flash/` directory:
| Artifact | Description |
|---|---|
| `.flash/artifact.tar.gz` | Deployment package |
| `.flash/flash_manifest.json` | Service discovery configuration |
| `.flash/.build/` | Temporary build directory (removed by default) |
## What gets deployed to Runpod

When you deploy a Flash app, you’re deploying a build artifact (tarball) onto pre-built Flash Docker images. This architecture is similar to AWS Lambda layers: the base runtime is pre-built, and your code and dependencies are layered on top.

### The build artifact

The `.flash/artifact.tar.gz` file (max 500 MB) contains:
```
artifact.tar.gz
├── lb_worker.py
├── gpu_worker.py
├── cpu_worker.py
├── flash_manifest.json
├── requirements.txt
└── [installed dependencies]
    ├── torch
    ├── transformers
    └── ...
```
### The deployment manifest

The `flash_manifest.json` file is the brain of your deployment. It tells each endpoint:
- Which functions to execute.
- What Docker image to use.
- How to configure resources (GPUs, workers, scaling).
- How to route HTTP requests (for load balancer endpoints).
### What gets created on Runpod

For each endpoint configuration in the manifest, Flash creates an independent Serverless endpoint. Each endpoint runs as its own service with its own URL.

#### Load-balanced endpoints (load balancer)

- Purpose: HTTP-facing services for custom API routes
- Image: Pre-built `runpod/flash-lb-cpu:latest` or `runpod/flash-lb:latest`
- Use cases: REST APIs, webhooks, public-facing services
- Example: `lb_worker.py` with `@api.post("/process")`
- Routes: Custom HTTP endpoints defined in your route decorators
- Startup process:
  - Container extracts your tarball
  - Auto-generated handler imports your worker file (e.g., `lb_worker.py`)
  - Routes are registered from decorators
  - Uvicorn server starts on port 8000
- Service discovery: Queries the state manager for cross-endpoint calls
#### Queue-based endpoints

- Purpose: Background compute for intensive `@Endpoint` functions
- Image: Pre-built `runpod/flash:latest` (GPU) or `runpod/flash-cpu:latest` (CPU)
- Use cases: GPU inference, batch processing, heavy computation
- Example: `gpu_worker.py` with `@Endpoint(name="...", gpu=...)`
- Routes: Automatic `/runsync` endpoint for job submission
- Startup process:
  - Container extracts your tarball
  - Worker module is imported (e.g., `gpu_worker.py`)
  - Function registry maps function names to callables
  - Worker listens for jobs from the job queue
- Execution: Sequential job processing with automatic retry logic
- Service discovery: Queries the state manager for cross-endpoint calls
## Cross-endpoint communication

When one endpoint needs to call a function on another endpoint:

- Manifest lookup: The calling endpoint checks `flash_manifest.json` for the function-to-resource mapping.
- Service discovery: It queries the state manager (Runpod GraphQL API) for the target endpoint URL.
- Direct call: It makes an HTTP request directly to the target endpoint.
- Response: The target endpoint executes the function and returns the result.
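The lookup chain can be illustrated with plain dicts standing in for the manifest and the state manager (the endpoint name and URL are illustrative):

```python
# Function name -> endpoint name, as recorded in flash_manifest.json.
manifest = {"run_model": "gpu_worker"}

# Endpoint name -> live endpoint URL, as returned by the state manager.
state_manager = {"gpu_worker": "https://api.runpod.ai/v2/abc123xyz"}

def resolve(function_name: str) -> str:
    endpoint = manifest[function_name]    # 1. manifest lookup
    return state_manager[endpoint]        # 2. service discovery

# Steps 3-4: the caller makes an HTTP request to this URL and
# receives the function's result in the response.
print(resolve("run_model"))  # https://api.runpod.ai/v2/abc123xyz
```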
## Troubleshooting

### No @Endpoint functions found

If the build process can’t find your endpoint functions:

- Ensure functions are decorated with `@Endpoint(...)`.
- Check that Python files aren’t excluded by `.gitignore` or `.flashignore`.
- Verify decorator syntax is correct.
### Deployment size limit exceeded

Base image packages are auto-excluded. If your deployment still exceeds 500MB, use the `--exclude` flag (described under Managing deployment size) to skip additional packages.
### Authentication errors

Verify that your API key is set correctly. Add it to your `.env` file or export it:
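For example (the `RUNPOD_API_KEY` variable name is an assumption; use whichever key name your project’s `.env` file expects):

```shell
export RUNPOD_API_KEY="your-api-key"
```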
### Import errors in endpoint functions
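A minimal sketch of the fix in plain Python, where `json` stands in for a heavy dependency such as `torch`. Because the import sits inside the function body, the module always imports cleanly on the worker:

```python
def generate(prompt: str) -> str:
    # Heavy dependencies are imported here, inside the function body,
    # not at module top level. `json` stands in for a package like torch.
    import json
    return json.dumps({"prompt": prompt})

print(generate("hello"))  # {"prompt": "hello"}
```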
Import packages inside the endpoint function, not at the top of the file.

## Next steps
- Learn about apps and environments for managing deployments.
- View the CLI reference for all available commands.
- Configure hardware resources for your endpoints.
- Monitor and troubleshoot your deployments.