This tutorial shows you how to build an image generation script using Flash and Stable Diffusion XL (SDXL). You’ll learn how to load a pretrained diffusion model on a GPU worker and generate images from text prompts.
What you’ll learn
In this tutorial you’ll learn how to:
Use the Hugging Face diffusers library with Flash.
Load and run Stable Diffusion XL models on GPU workers.
Generate high-quality images from text prompts.
Save generated images to disk.
Configure generation parameters like guidance scale and steps.
Requirements
Before starting, you'll need:
A Runpod account with an API key.
Python installed locally, along with the uv package manager.
What you’ll build
By the end of this tutorial, you’ll have a working image generation application that:
Accepts text prompts as input.
Generates photorealistic images using Stable Diffusion XL.
Runs entirely on Runpod’s GPU infrastructure.
Saves generated images to your local machine.
Step 1: Set up your project
Create a new directory for your project and set up a Python virtual environment:
mkdir flash-image-generation
cd flash-image-generation
Install Flash using uv:
uv venv
source .venv/bin/activate
uv pip install runpod-flash python-dotenv
Create a .env file with your Runpod API key:
touch .env && echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
Replace YOUR_API_KEY with your actual API key from the Runpod console.
Step 2: Understand Stable Diffusion XL
Stable Diffusion XL (SDXL) is a state-of-the-art text-to-image model from Stability AI. It offers:
High-quality images: Generates photorealistic 1024x1024 images
Better prompt understanding: Improved text comprehension compared to SD 1.5
Fine details: Enhanced rendering of hands, faces, and text
Open source: Available for free on Hugging Face
SDXL requires significant GPU resources:
Model size: ~7GB of weights
VRAM requirement: Minimum 16GB (24GB recommended)
Generation time: 20-40 seconds per image on an RTX 4090
We’ll use the diffusers library from Hugging Face, which provides a clean Python API for Stable Diffusion models.
Step 3: Create your project file
Create a new file called image_generation.py:
touch image_generation.py
Open this file in your code editor. The following steps walk through building the image generation application.
Step 4: Add imports and configuration
Add the necessary imports and Flash configuration:
import asyncio
import base64
from pathlib import Path
from dotenv import load_dotenv
from runpod_flash import Endpoint, GpuGroup
# Load environment variables from .env file
load_dotenv()
Step 5: Define the image generation function
Add the endpoint function that will run on the GPU worker:
@Endpoint(
    name="image-generation",
    gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24],  # 24GB GPUs
    workers=2,
    idle_timeout=900,  # Keep workers active for 15 minutes
    dependencies=["diffusers", "torch", "transformers", "accelerate"]
)
def generate_image(prompt, negative_prompt="", num_steps=30, guidance_scale=7.5):
    """Generate an image using Stable Diffusion XL."""
    import torch
    from diffusers import StableDiffusionXLPipeline
    import base64
    from io import BytesIO

    # Load the SDXL model
    model_id = "stabilityai/stable-diffusion-xl-base-1.0"
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16"
    )

    # Move model to GPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = pipe.to(device)

    # Generate image
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_steps,
        guidance_scale=guidance_scale,
        height=1024,
        width=1024
    ).images[0]

    # Convert image to base64 for transmission
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()

    return {
        "image_base64": img_str,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_steps": num_steps,
        "guidance_scale": guidance_scale,
        "device": device,
        "resolution": "1024x1024"
    }
Configuration breakdown:
name="image-generation": Identifies your endpoint in the Runpod console.
gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24]: Uses RTX 4090 or L4/A5000 GPUs (both have 24GB VRAM, sufficient for SDXL).
workers=2: Allows up to 2 parallel workers.
idle_timeout=900: Keeps workers active for 15 minutes (SDXL models are large, so we want longer caching).
SDXL requires at least 16GB VRAM. Using 24GB GPUs provides comfortable headroom and faster generation.
This function:
Loads the SDXL model from Hugging Face.
Moves the model to the GPU.
Generates an image from the prompt.
Encodes the image as base64.
Returns the image as a base64 string (and other metadata).
Here's a full breakdown of each part:
Dependencies: The function requires four packages:
diffusers: Hugging Face library for diffusion models
torch: PyTorch for GPU computation
transformers: Text encoder dependencies
accelerate: Efficient model loading
Model loading:

pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
)

This downloads SDXL from Hugging Face. Key parameters:
torch_dtype=torch.float16: Use half-precision (saves VRAM, faster)
use_safetensors=True: Use the safetensors format
variant="fp16": Download the fp16 variant (~7GB instead of ~14GB)
GPU acceleration: pipe.to(device) moves the entire pipeline (text encoder, UNet, VAE) to the GPU.

Image generation:

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=num_steps,
    guidance_scale=guidance_scale,
    height=1024,
    width=1024
).images[0]
Parameters:
prompt: What you want to see in the image
negative_prompt: What you don’t want (e.g., “blurry, low quality”)
num_inference_steps: More steps = better quality but slower (20-50 typical)
guidance_scale: How closely to follow the prompt (7-10 recommended)
height/width: SDXL is trained for 1024x1024
Image encoding:

buffered = BytesIO()
image.save(buffered, format="PNG")
img_str = base64.b64encode(buffered.getvalue()).decode()

We encode the image as base64 to return it through Flash. This allows us to transmit the image data as a string.
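You can verify the round trip locally without a GPU. This standalone sketch encodes a stand-in byte string the same way the worker encodes the PNG, then decodes it back on the "client" side:

```python
import base64

# Stand-in for the PNG bytes produced by image.save() on the worker
png_bytes = b"\x89PNG\r\n\x1a\n" + b"fake image data"

# Worker side: bytes -> base64 text, safe to return in a JSON payload
encoded = base64.b64encode(png_bytes).decode()

# Client side: base64 text -> original bytes, ready to write to disk
decoded = base64.b64decode(encoded)

print(decoded == png_bytes)
```

Because the encoded value is plain ASCII text, it survives any transport that handles strings, at the cost of roughly 33% size overhead.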
Step 6: Add the main function and image saving
Create functions to call the generator and save images:
def save_image(base64_string, filename):
    """Save a base64-encoded image to disk."""
    import base64
    from PIL import Image
    from io import BytesIO

    # Decode base64 string
    img_data = base64.b64decode(base64_string)

    # Open and save image
    image = Image.open(BytesIO(img_data))
    image.save(filename)
    print(f"✓ Image saved to {filename}")

async def main():
    print("Generating image with Stable Diffusion XL on Runpod GPU...")
    print("This may take 1-2 minutes on first run (downloading model)...\n")

    # Define your prompt
    prompt = "A serene landscape with mountains, a lake, and sunset, highly detailed, photorealistic"
    negative_prompt = "blurry, low quality, distorted, ugly"

    # Generate image
    result = await generate_image(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_steps=30,
        guidance_scale=7.5
    )

    # Save the generated image
    output_dir = Path("generated_images")
    output_dir.mkdir(exist_ok=True)
    filename = output_dir / "sdxl_output.png"
    save_image(result["image_base64"], filename)

    # Display metadata
    print(f"\n{'=' * 60}")
    print("GENERATION DETAILS")
    print('=' * 60)
    print(f"Prompt: {result['prompt']}")
    print(f"Negative prompt: {result['negative_prompt']}")
    print(f"Steps: {result['num_steps']}")
    print(f"Guidance scale: {result['guidance_scale']}")
    print(f"Resolution: {result['resolution']}")
    print(f"Device: {result['device']}")
    print('=' * 60)

if __name__ == "__main__":
    asyncio.run(main())
This main function:
Calls the remote function with await.
Creates a generated_images directory if it doesn’t exist.
Decodes and saves the base64 image to disk.
Displays generation metadata.
Step 7: Run your first generation
Run the application:
python image_generation.py
First run output (takes 2-3 minutes):
Generating image with Stable Diffusion XL on Runpod GPU...
This may take 1-2 minutes on first run (downloading model)...
Creating endpoint: server_Endpoint_a1b2c3d4
Provisioning Serverless endpoint...
Endpoint ready
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Downloading model weights from Hugging Face...
Model loaded, generating image...
Job completed, output received
✓ Image saved to generated_images/sdxl_output.png
============================================================
GENERATION DETAILS
============================================================
Prompt: A serene landscape with mountains, a lake, and sunset, highly detailed, photorealistic
Negative prompt: blurry, low quality, distorted, ugly
Steps: 30
Guidance scale: 7.5
Resolution: 1024x1024
Device: cuda
============================================================
Subsequent runs (take 30-40 seconds):
Generating image with Stable Diffusion XL on Runpod GPU...
Resource Endpoint_a1b2c3d4 already exists, reusing.
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received
✓ Image saved to generated_images/sdxl_output.png
[Results appear]
Open generated_images/sdxl_output.png to see your generated image!
The first run downloads ~7GB of model weights, which takes 1-2 minutes. Subsequent runs reuse the cached model and complete in 30-40 seconds.
Step 8: Experiment with different prompts
Try various prompts to see SDXL’s capabilities:
async def main():
    # Create output directory
    output_dir = Path("generated_images")
    output_dir.mkdir(exist_ok=True)

    # Try different prompts
    prompts = [
        {
            "prompt": "A cyberpunk city at night with neon lights, flying cars, rain, cinematic",
            "negative": "blurry, low quality",
            "filename": "cyberpunk_city.png"
        },
        {
            "prompt": "A cute corgi puppy wearing a space suit, floating in space, highly detailed",
            "negative": "distorted, ugly, bad anatomy",
            "filename": "space_corgi.png"
        },
        {
            "prompt": "An ancient wizard's study filled with books, potions, magical artifacts, candlelight",
            "negative": "blurry, modern, plastic",
            "filename": "wizard_study.png"
        }
    ]

    for i, p in enumerate(prompts, 1):
        print(f"\n{'=' * 60}")
        print(f"Generating image {i}/{len(prompts)}")
        print(f"Prompt: {p['prompt'][:50]}...")
        print('=' * 60)

        result = await generate_image(
            prompt=p['prompt'],
            negative_prompt=p['negative'],
            num_steps=30,
            guidance_scale=7.5
        )

        filename = output_dir / p['filename']
        save_image(result["image_base64"], filename)
        print(f"✓ Saved to {filename}\n")

if __name__ == "__main__":
    asyncio.run(main())
Run it:
python image_generation.py
You’ll see three different images generated sequentially on the same GPU worker. Each generation takes about 30-40 seconds after the first one.
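Because the endpoint is configured with workers=2, you can also submit generations concurrently instead of awaiting them one at a time. The standard pattern is asyncio.gather; the sketch below substitutes a placeholder coroutine (fake_generate) for the remote generate_image call so it runs anywhere without a GPU:

```python
import asyncio

# Placeholder standing in for the remote generate_image call (illustration only)
async def fake_generate(prompt):
    await asyncio.sleep(0.01)  # stands in for GPU generation time
    return {"prompt": prompt, "image_base64": "..."}

async def run_all(prompts):
    # gather() schedules all coroutines concurrently and preserves input order
    return await asyncio.gather(*(fake_generate(p) for p in prompts))

results = asyncio.run(run_all(["a cyberpunk city", "a corgi in space"]))
print([r["prompt"] for r in results])
```

With real Flash calls, concurrent submissions can be picked up by separate GPU workers, so two images finish in roughly the time of one.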
Understanding generation parameters
Let’s explore how different parameters affect image quality:
Number of inference steps
# Fast but lower quality (15-20 steps)
result = await generate_image(prompt, num_steps=20)

# Balanced (30 steps) - recommended
result = await generate_image(prompt, num_steps=30)

# High quality but slower (50 steps)
result = await generate_image(prompt, num_steps=50)
Effects:
15-20 steps: Faster (15-20 seconds) but less refined details.
30 steps: Good balance of quality and speed (30-40 seconds) - recommended.
50+ steps: Diminishing returns, minimal quality improvement.
Guidance scale
# Low guidance - more creative, less faithful to prompt
result = await generate_image(prompt, guidance_scale=5.0)

# Medium guidance - balanced (recommended)
result = await generate_image(prompt, guidance_scale=7.5)

# High guidance - very faithful to prompt, may oversaturate
result = await generate_image(prompt, guidance_scale=12.0)
Effects:
3-5: More artistic freedom, less literal interpretation.
7-10: Balanced, follows prompt closely - recommended.
12+: Very literal, may produce oversaturated or exaggerated images.
Negative prompts
Negative prompts tell the model what to avoid:
# Good negative prompts for photorealistic images
negative_prompt = "blurry, low quality, distorted, ugly, bad anatomy, watermark"
# Good negative prompts for artistic images
negative_prompt = "realistic, photograph, blurry, low quality"
# Good negative prompts for portraits
negative_prompt = "distorted face, bad anatomy, extra limbs, low quality"
Use negative prompts to:
Remove common artifacts (“distorted”, “low quality”).
Avoid unwanted styles (“cartoon”, “3D render”).
Fix common issues (“bad anatomy”, “extra fingers”).
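If you reuse negative prompts often, you could assemble them from reusable term groups. The helper below, build_negative_prompt, is a hypothetical convenience function written for this tutorial, not part of Flash or diffusers:

```python
# Hypothetical term groups, drawn from the examples above
ARTIFACTS = ["blurry", "low quality", "distorted"]
ANATOMY = ["bad anatomy", "extra fingers", "extra limbs"]
STYLES = ["cartoon", "3D render"]

def build_negative_prompt(*groups):
    """Join term groups into one comma-separated negative prompt."""
    terms = []
    for group in groups:
        terms.extend(group)
    return ", ".join(terms)

negative = build_negative_prompt(ARTIFACTS, ANATOMY)
print(negative)  # blurry, low quality, distorted, bad anatomy, extra fingers, extra limbs
```

Mix in STYLES only for artistic generations, where you want to steer away from photorealism.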
Troubleshooting
Out of memory error
Issue: RuntimeError: CUDA out of memory.
Cause: SDXL requires significant VRAM (16GB minimum).
Solutions:
Verify you're using 24GB GPUs:
gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24]  # 24GB GPUs
Use half-precision (already in the example):
torch_dtype=torch.float16  # Half precision
If it still fails, use 48GB GPUs:
gpu=GpuGroup.AMPERE_48  # A40/A6000 with 48GB
Model download fails
Issue: Error: Failed to download model from Hugging Face.
Solutions:
Increase the execution timeout for the first run:
@Endpoint(
    name="image-generation",
    gpu=GpuGroup.ADA_24,
    execution_timeout_ms=600000  # 10 minutes for first download
)
Check Hugging Face Hub status at status.huggingface.co.
Try a smaller model first to test connectivity:
model_id = "runwayml/stable-diffusion-v1-5"  # Smaller SD 1.5
Image quality is poor
Issue: Generated images look blurry or low quality.
Solutions:
Increase inference steps:
num_steps=40  # More steps = better quality
Adjust the guidance scale:
guidance_scale=8.5  # Higher guidance
Improve your prompt:
prompt = "A detailed portrait, highly detailed, sharp focus, 8k, professional photography"
Add quality keywords to your prompt:
“highly detailed”
“sharp focus”
“8k”
“photorealistic”
“professional”
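A small helper can append these quality keywords automatically, skipping any that are already present. with_quality_tags is a hypothetical convenience function for illustration, not part of any library used here:

```python
# Quality keywords from the list above
QUALITY_TAGS = ["highly detailed", "sharp focus", "8k", "photorealistic"]

def with_quality_tags(prompt, tags=QUALITY_TAGS):
    """Append quality keywords that aren't already in the prompt."""
    extra = [t for t in tags if t.lower() not in prompt.lower()]
    return ", ".join([prompt] + extra)

print(with_quality_tags("A portrait of an astronaut"))
```

You would then pass the augmented string as the prompt argument to generate_image.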
Slow generation
Issue: Image generation takes >60 seconds per image.
Possible causes :
Worker scaled down (cold start).
Model not cached.
Too many inference steps.
Solutions:
Increase idle_timeout to keep workers active:
idle_timeout=1800  # Keep active for 30 minutes
Reduce inference steps:
num_steps=20  # Faster but slightly lower quality
Set workers=(1, 2) to always have a warm worker ready.
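Combining these suggestions, the decorator might look like the sketch below. This is a configuration fragment based on the options used earlier in this tutorial; workers=(1, 2) is assumed to mean a minimum of one always-on worker and a maximum of two:

```python
@Endpoint(
    name="image-generation",
    gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24],
    workers=(1, 2),     # keep 1 warm worker, scale up to 2
    idle_timeout=1800,  # keep workers active for 30 minutes
    dependencies=["diffusers", "torch", "transformers", "accelerate"]
)
def generate_image(prompt, ...):
    ...
```

The trade-off is cost: an always-on worker bills continuously, so reserve this for latency-sensitive workloads.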
Images look distorted or have artifacts
Issue: Generated images have weird artifacts or distortions.
Solutions:
Use negative prompts:
negative_prompt = "distorted, ugly, bad anatomy, extra limbs, disfigured"
Adjust the guidance scale (try the 7-9 range).
Increase inference steps for better refinement.
Next steps
Now that you’ve built an image generation script with Flash, you can:
Try other Stable Diffusion models
Explore different models from Hugging Face:
# SDXL Turbo - 4x faster, 1 step generation
model_id = "stabilityai/sdxl-turbo"
# Stable Diffusion 1.5 - smaller, faster
model_id = "runwayml/stable-diffusion-v1-5"
# Stable Diffusion 2.1 - better at artistic styles
model_id = "stabilityai/stable-diffusion-2-1"
Add image-to-image generation
Use an existing image as a starting point:
from diffusers import StableDiffusionXLImg2ImgPipeline

# Load img2img pipeline
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained( ... )

# Generate variations of an existing image
image = pipe(prompt, image=init_image, strength=0.75).images[0]
Build a Flash app
Convert your script to a production Flash app :
flash init image-generation-app
# Move your function to workers/gpu/endpoint.py
# Add FastAPI routes for HTTP API
flash deploy
Optimize with network volumes
Use network volumes to cache models across workers:
from runpod_flash import Endpoint, GpuGroup, NetworkVolume

vol = NetworkVolume(name="model-cache")  # Finds existing or creates new

@Endpoint(
    name="image-generation",
    gpu=GpuGroup.ADA_24,
    volume=vol,
    dependencies=["diffusers", "torch", "transformers", "accelerate"]
)
def generate_image(prompt, ...):
    # Models at /runpod-volume/ persist across workers
    ...
Explore advanced features
LoRA fine-tuning: Customize SDXL for specific styles.
ControlNet: Guide generation with edge maps, depth, or pose.
Inpainting: Edit specific parts of images.
Upscaling: Generate higher-resolution images.