
# Generate images with Flash and SDXL

> Learn how to use Flash with Stable Diffusion XL to generate high-quality images from text prompts.

This tutorial shows you how to build an image generation script using Flash and Stable Diffusion XL (SDXL). You'll learn how to load a pretrained diffusion model on a GPU worker and generate images from text prompts.

<Frame alt="Example image generated with Flash and Stable Diffusion XL">
  <img src="https://mintcdn.com/runpod-b18f5ded/1b_UCwPud6fQ8JaD/images/flash_sdxl_output.png?fit=max&auto=format&n=1b_UCwPud6fQ8JaD&q=85&s=67c6645ea9b00a305707bd5d88ad990f" width="1024" height="1024" data-path="images/flash_sdxl_output.png" />
</Frame>

## Requirements

* You've [created a Runpod account](/get-started/manage-accounts).
* You've [created a Runpod API key](/get-started/api-keys).
* You've installed [Python 3.10, 3.11, 3.12, or 3.13](https://www.python.org/downloads/).
* You've completed the [Flash quickstart](/flash/quickstart) or are familiar with Flash basics.

## What you'll build

By the end of this tutorial, you'll have a working image generation application that:

* Accepts text prompts as input.
* Generates photorealistic images using Stable Diffusion XL.
* Runs entirely on Runpod's GPU infrastructure.
* Saves generated images to your local machine.

## Step 1: Set up your project

Create a new directory for your project:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
mkdir flash-image-generation
cd flash-image-generation
```

Create and activate a virtual environment, then install Flash and `python-dotenv` using [uv](https://docs.astral.sh/uv/):

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
uv venv
source .venv/bin/activate
uv pip install runpod-flash python-dotenv
```

Create a `.env` file with your Runpod API key:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
echo "RUNPOD_API_KEY=YOUR_API_KEY" > .env
```

Replace `YOUR_API_KEY` with your actual API key from the [Runpod console](https://www.runpod.io/console/user/settings).

## Step 2: Understand Stable Diffusion XL

[Stable Diffusion XL (SDXL)](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is a state-of-the-art text-to-image model from Stability AI. It offers:

* **High-quality images**: Generates photorealistic 1024x1024 images
* **Better prompt understanding**: Improved text comprehension compared to SD 1.5
* **Fine details**: Improved (though still imperfect) rendering of hands, faces, and text compared to earlier Stable Diffusion models
* **Open source**: Available for free on Hugging Face

SDXL requires significant GPU resources:

* **Model size**: \~7GB of weights
* **VRAM requirement**: Minimum 16GB (24GB recommended)
* **Generation time**: 20-40 seconds per image on RTX 4090
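
The weight size follows from the parameter count multiplied by the bytes per parameter. As a rough sketch, assuming SDXL base has about 3.5 billion parameters in total (the figure Stability AI quotes for the base model, covering the UNet, both text encoders, and the VAE) and ignoring file-format overhead:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Rough estimate of SDXL base weight size from its parameter count.
# Assumption: about 3.5 billion parameters in total.
SDXL_BASE_PARAMS = 3.5e9

BYTES_PER_PARAM = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision, which this tutorial requests with torch.float16
}


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("fp16", "fp32"):
        print(f"{dtype}: about {weight_size_gb(SDXL_BASE_PARAMS, dtype):.1f} GB")
```

This lines up with the roughly 7GB (fp16) and 14GB (fp32) download sizes. Weights are only part of VRAM use: intermediate activations during denoising and the VAE decode at 1024x1024 need additional memory on top of them.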

We'll use the [diffusers](https://huggingface.co/docs/diffusers/index) library from Hugging Face, which provides a clean Python API for Stable Diffusion models.

## Step 3: Create your project file

Create a new file called `image_generation.py`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
touch image_generation.py
```

Open this file in your code editor. The following steps walk through building the image generation application.

## Step 4: Add imports and configuration

Add the necessary imports and Flash configuration:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio
import base64
from pathlib import Path
from dotenv import load_dotenv
from runpod_flash import Endpoint, GpuGroup

# Load environment variables from .env file
load_dotenv()
```
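
Optionally, you can fail fast if the API key didn't load. The sketch below is a hypothetical helper, not part of Flash: it only reads `os.environ`, which `load_dotenv()` populates from `.env` by default, and it also rejects the unedited `YOUR_API_KEY` placeholder.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import os


def require_api_key(var_name: str = "RUNPOD_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it's missing."""
    key = os.environ.get(var_name, "").strip()
    # Reject both a missing entry and the unedited placeholder value.
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"{var_name} is not set. Add it to your .env file and try again.")
    return key
```

Call `require_api_key()` right after `load_dotenv()` so a missing key produces a clear error before the script tries to reach Runpod.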

## Step 5: Define the image generation function

Add the endpoint function that will run on the GPU worker:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
@Endpoint(
    name="image-generation",
    gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24],  # 24GB GPUs
    workers=2,
    idle_timeout=900,  # Keep workers active for 15 minutes
    dependencies=["diffusers", "torch", "transformers", "accelerate"]
)
def generate_image(prompt, negative_prompt="", num_steps=30, guidance_scale=7.5):
    """Generate an image using Stable Diffusion XL."""
    import torch
    from diffusers import StableDiffusionXLPipeline
    import base64
    from io import BytesIO

    # Load the SDXL model
    model_id = "stabilityai/stable-diffusion-xl-base-1.0"
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16"
    )

    # Move model to GPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = pipe.to(device)

    # Generate image
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_steps,
        guidance_scale=guidance_scale,
        height=1024,
        width=1024
    ).images[0]

    # Convert image to base64 for transmission
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()

    return {
        "image_base64": img_str,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_steps": num_steps,
        "guidance_scale": guidance_scale,
        "device": device,
        "resolution": "1024x1024"
    }
```

**Configuration breakdown**:

* **`name="image-generation"`**: Identifies your endpoint in the Runpod console.
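* **`gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24]`**: Runs on an RTX 4090 (`ADA_24`) or an L4 or A5000 (`AMPERE_24`). All of these have 24GB of VRAM, which is enough for SDXL.
* **`workers=2`**: Allows up to 2 workers to run in parallel.
* **`idle_timeout=900`**: Keeps an idle worker running for 15 minutes (900 seconds) after its last request. SDXL weights are large, so a longer timeout gives follow-up requests a better chance of reaching a warm worker that has already downloaded them.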
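
As written, `generate_image` calls `from_pretrained` on every request, so even a warm worker reloads the pipeline from its local cache each time. A common alternative is to load once per process and reuse the object. The sketch below shows that pattern in plain Python with `functools.lru_cache`. `load_pipeline` and `_load_from_hub` are hypothetical names, the loader is a stand-in that only counts calls, and whether module-level state survives between Flash calls on the same worker is an assumption this tutorial doesn't confirm.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from functools import lru_cache

# Counts how many times the (expensive) loader actually runs.
load_calls = {"count": 0}


def _load_from_hub(model_id: str) -> dict:
    """Hypothetical stand-in for StableDiffusionXLPipeline.from_pretrained(...).to(device)."""
    load_calls["count"] += 1
    return {"model_id": model_id}


@lru_cache(maxsize=1)
def load_pipeline(model_id: str) -> dict:
    """Load the pipeline on the first call, then return the cached object."""
    return _load_from_hub(model_id)


def generate(prompt: str) -> str:
    # Every request in this process reuses the same pipeline object.
    pipe = load_pipeline("stabilityai/stable-diffusion-xl-base-1.0")
    return f"{pipe['model_id']}: {prompt}"


if __name__ == "__main__":
    generate("a lighthouse at dawn")
    generate("a snowy forest")
    print(f"Loader ran {load_calls['count']} time(s)")  # prints 1
```

In a real endpoint, `_load_from_hub` would hold the `from_pretrained(...)` and `.to(device)` calls from the function above.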

<Note>
  SDXL requires at least 16GB VRAM. Using 24GB GPUs provides comfortable headroom and faster generation.
</Note>

This function:

* Loads the SDXL model from Hugging Face.
* Moves the model to the GPU.
* Generates an image from the prompt.
* Encodes the image as base64.
* Returns the image as a base64 string (and other metadata).

Expand this section for a full breakdown:

<Accordion title="Code breakdown">
  **Dependencies**: The function requires four packages:

  * `diffusers`: Hugging Face library for diffusion models
  * `torch`: PyTorch for GPU computation
  * `transformers`: Text encoder dependencies
  * `accelerate`: Efficient model loading

  **Model loading**:

  ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  pipe = StableDiffusionXLPipeline.from_pretrained(
      model_id,
      torch_dtype=torch.float16,
      use_safetensors=True,
      variant="fp16"
  )
  ```

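  This downloads SDXL from Hugging Face on the first call and loads it from the local Hugging Face cache on later calls on the same worker. Key parameters:

  * `torch_dtype=torch.float16`: Use half-precision (saves VRAM, faster)
  * `use_safetensors=True`: Load weights from `.safetensors` files, which can't execute arbitrary code when loaded
  * `variant="fp16"`: Download the fp16 version (\~7GB instead of \~14GB)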

  **GPU acceleration**:

  ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  pipe = pipe.to(device)
  ```

  Moves the entire pipeline (both text encoders, the UNet, and the VAE) to the GPU.

  **Image generation**:

  ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  image = pipe(
      prompt=prompt,
      negative_prompt=negative_prompt,
      num_inference_steps=num_steps,
      guidance_scale=guidance_scale,
      height=1024,
      width=1024
  ).images[0]
  ```

  Parameters:

  * **`prompt`**: What you want to see in the image
  * **`negative_prompt`**: What you don't want (e.g., "blurry, low quality")
  * **`num_inference_steps`**: More steps = better quality but slower (20-50 typical)
  * **`guidance_scale`**: How closely to follow the prompt (7-10 recommended)
  * **`height/width`**: SDXL is trained for 1024x1024

  **Image encoding**:

  ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  buffered = BytesIO()
  image.save(buffered, format="PNG")
  img_str = base64.b64encode(buffered.getvalue()).decode()
  ```

  We encode the image as base64 to return it through Flash. This allows us to transmit the image data as a string.
</Accordion>
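
The base64 step is lossless but not free: base64 emits 4 characters for every 3 input bytes, so the returned string is about a third larger than the PNG itself. This standard-library sketch checks both properties. The byte string is a stand-in for `buffered.getvalue()`, and the helper names are hypothetical.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import base64
import math


def encode_png_bytes(png_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string (as the worker does)."""
    return base64.b64encode(png_bytes).decode()


def decode_png_bytes(encoded: str) -> bytes:
    """Decode the base64 string back to the original bytes (as the client does)."""
    return base64.b64decode(encoded)


if __name__ == "__main__":
    # Stand-in payload: the 8-byte PNG signature followed by 2 MB of zero bytes.
    payload = b"\x89PNG\r\n\x1a\n" + bytes(2_000_000)
    encoded = encode_png_bytes(payload)

    assert decode_png_bytes(encoded) == payload  # lossless round trip
    # 4 output characters per 3 input bytes, rounded up for padding.
    assert len(encoded) == 4 * math.ceil(len(payload) / 3)
    print(f"{len(payload):,} bytes -> {len(encoded):,} base64 characters")
```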

## Step 6: Add the main function and image saving

Create functions to call the generator and save images:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
def save_image(base64_string, filename):
    """Decode a base64-encoded PNG and write it to disk."""
    # The worker already encoded the image as PNG, so the decoded bytes can be
    # written directly; no local image library (such as Pillow) is needed.
    img_data = base64.b64decode(base64_string)
    Path(filename).write_bytes(img_data)
    print(f"✓ Image saved to {filename}")

async def main():
    print("Generating image with Stable Diffusion XL on Runpod GPU...")
    print("This may take 1-2 minutes on first run (downloading model)...\n")

    # Define your prompt
    prompt = "A serene landscape with mountains, a lake, and sunset, highly detailed, photorealistic"
    negative_prompt = "blurry, low quality, distorted, ugly"

    # Generate image
    result = await generate_image(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_steps=30,
        guidance_scale=7.5
    )

    # Save the generated image
    output_dir = Path("generated_images")
    output_dir.mkdir(exist_ok=True)

    filename = output_dir / "sdxl_output.png"
    save_image(result["image_base64"], filename)

    # Display metadata
    print(f"\n{'='*60}")
    print("GENERATION DETAILS")
    print('='*60)
    print(f"Prompt: {result['prompt']}")
    print(f"Negative prompt: {result['negative_prompt']}")
    print(f"Steps: {result['num_steps']}")
    print(f"Guidance scale: {result['guidance_scale']}")
    print(f"Resolution: {result['resolution']}")
    print(f"Device: {result['device']}")
    print('='*60)

if __name__ == "__main__":
    asyncio.run(main())
```

This main function:

* Calls the remote function with `await`.
* Creates a `generated_images` directory if it doesn't exist.
* Decodes and saves the base64 image to disk.
* Displays generation metadata.
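
As written, every run writes to `generated_images/sdxl_output.png` and overwrites the previous image. If you want to keep each result, one option is to derive the filename from the prompt plus a timestamp. `unique_output_path` is a hypothetical helper, not part of Flash; it assumes the output directory already exists, which `main()` takes care of.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import re
from datetime import datetime
from pathlib import Path


def unique_output_path(prompt: str, output_dir: Path = Path("generated_images"),
                       max_slug_len: int = 40) -> Path:
    """Return a path like generated_images/a-serene-landscape_20250101-120000.png."""
    # Lowercase the prompt and collapse every run of non-alphanumeric characters into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    # Truncate long prompts, drop any dash left by the cut, and fall back to "image".
    slug = slug[:max_slug_len].rstrip("-") or "image"
    # Second-level timestamp: two saves within the same second would still collide.
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return output_dir / f"{slug}_{timestamp}.png"
```

In `main()`, you could then use `filename = unique_output_path(prompt, output_dir)` in place of the fixed `sdxl_output.png` name.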

## Step 7: Run your first generation

Run the application:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
python image_generation.py
```

**First run output** (takes 2-3 minutes):

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Generating image with Stable Diffusion XL on Runpod GPU...
This may take 1-2 minutes on first run (downloading model)...

Creating endpoint: server_Endpoint_a1b2c3d4
Provisioning Serverless endpoint...
Endpoint ready
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Downloading model weights from Hugging Face...
Model loaded, generating image...
Job completed, output received
✓ Image saved to generated_images/sdxl_output.png

============================================================
GENERATION DETAILS
============================================================
Prompt: A serene landscape with mountains, a lake, and sunset, highly detailed, photorealistic
Negative prompt: blurry, low quality, distorted, ugly
Steps: 30
Guidance scale: 7.5
Resolution: 1024x1024
Device: cuda
============================================================
```

**Subsequent runs** (takes 30-40 seconds):

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
Generating image with Stable Diffusion XL on Runpod GPU...
This may take 1-2 minutes on first run (downloading model)...

Resource Endpoint_a1b2c3d4 already exists, reusing.
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received
✓ Image saved to generated_images/sdxl_output.png

[Results appear]
```

Open `generated_images/sdxl_output.png` to see your generated image!

<Tip>
  The first run downloads \~7GB of model weights, which takes 1-2 minutes. Subsequent runs that reach a warm worker (within the `idle_timeout` window) reuse the downloaded weights and complete in 30-40 seconds.
</Tip>

## Step 8: Experiment with different prompts

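To try several prompts in one run, replace the `main()` function in `image_generation.py` with this version (keep `generate_image` and `save_image` as they are):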

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
async def main():
    # Create output directory
    output_dir = Path("generated_images")
    output_dir.mkdir(exist_ok=True)

    # Try different prompts
    prompts = [
        {
            "prompt": "A cyberpunk city at night with neon lights, flying cars, rain, cinematic",
            "negative": "blurry, low quality",
            "filename": "cyberpunk_city.png"
        },
        {
            "prompt": "A cute corgi puppy wearing a space suit, floating in space, highly detailed",
            "negative": "distorted, ugly, bad anatomy",
            "filename": "space_corgi.png"
        },
        {
            "prompt": "An ancient wizard's study filled with books, potions, magical artifacts, candlelight",
            "negative": "blurry, modern, plastic",
            "filename": "wizard_study.png"
        }
    ]

    for i, p in enumerate(prompts, 1):
        print(f"\n{'='*60}")
        print(f"Generating image {i}/{len(prompts)}")
        print(f"Prompt: {p['prompt'][:50]}...")
        print('='*60)

        result = await generate_image(
            prompt=p['prompt'],
            negative_prompt=p['negative'],
            num_steps=30,
            guidance_scale=7.5
        )

        filename = output_dir / p['filename']
        save_image(result["image_base64"], filename)

if __name__ == "__main__":
    asyncio.run(main())
```

Run it:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
python image_generation.py
```

The three images are generated one after another, typically on the same warm worker, so each generation after the first takes about 30-40 seconds.
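
Because the endpoint allows up to `workers=2`, you could also submit requests concurrently instead of awaiting them one at a time. This is a sketch, assuming `generate_image` can be awaited concurrently like any other coroutine function (this tutorial only awaits it sequentially). `fake_generate_image` is a hypothetical stand-in so the snippet runs without Runpod; in your script, pass `generate_image` instead.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio


async def fake_generate_image(prompt, negative_prompt="", num_steps=30, guidance_scale=7.5):
    """Hypothetical stand-in for the Flash endpoint: waits briefly, then returns a result."""
    await asyncio.sleep(0.1)
    return {"image_base64": "", "prompt": prompt}


async def generate_all(generate, prompts, max_concurrency=2):
    """Run every prompt, keeping at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)  # mirrors workers=2

    async def run_one(p):
        async with semaphore:
            return await generate(prompt=p["prompt"], negative_prompt=p["negative"])

    # gather preserves input order, so results[i] belongs to prompts[i].
    return await asyncio.gather(*(run_one(p) for p in prompts))


if __name__ == "__main__":
    prompts = [
        {"prompt": "A cyberpunk city at night", "negative": "blurry"},
        {"prompt": "A corgi in a space suit", "negative": "bad anatomy"},
        {"prompt": "A wizard's study by candlelight", "negative": "modern"},
    ]
    results = asyncio.run(generate_all(fake_generate_image, prompts))
    print([r["prompt"] for r in results])
```

Since results come back in the same order as `prompts`, you can pair each one with the matching `filename` entry from the list above. The semaphore only limits how many requests your script has in flight; it doesn't change how the endpoint schedules them.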

## Understanding generation parameters

Let's explore how different parameters affect image quality:

### Number of inference steps

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Fast but lower quality (15-20 steps)
result = await generate_image(prompt, num_steps=20)

# Balanced (30 steps) - recommended
result = await generate_image(prompt, num_steps=30)

# High quality but slower (50 steps)
result = await generate_image(prompt, num_steps=50)
```

**Effects**:

* **15-20 steps**: Faster (15-20 seconds) but less refined details.
* **30 steps**: Good balance of quality and speed (30-40 seconds) - **recommended**.
* **50+ steps**: Diminishing returns, minimal quality improvement.
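
These timings are approximate and depend on the GPU and on whether a worker is already warm. To measure them on your own endpoint, you could time a small sweep. In this sketch, `time_step_sweep` is a hypothetical helper and `fake_generate_image` is a stand-in that lets the code run without Runpod; pass `generate_image` in your script.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio
import time


async def fake_generate_image(prompt, num_steps=30, **kwargs):
    """Hypothetical stand-in for the Flash endpoint; its delay grows with num_steps."""
    await asyncio.sleep(num_steps * 0.001)
    return {"num_steps": num_steps}


async def time_step_sweep(generate, prompt, step_counts=(20, 30, 50)):
    """Call `generate` once per step count and return {num_steps: elapsed_seconds}."""
    timings = {}
    for steps in step_counts:
        start = time.perf_counter()
        await generate(prompt, num_steps=steps)
        timings[steps] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    results = asyncio.run(time_step_sweep(fake_generate_image, "A serene mountain lake"))
    for steps, seconds in results.items():
        print(f"{steps} steps: {seconds:.2f}s")
```

Make one untimed warm-up call first; otherwise the first measurement includes cold-start and model-download time.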

### Guidance scale

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Low guidance - more creative, less faithful to prompt
result = await generate_image(prompt, guidance_scale=5.0)

# Medium guidance - balanced (recommended)
result = await generate_image(prompt, guidance_scale=7.5)

# High guidance - very faithful to prompt, may oversaturate
result = await generate_image(prompt, guidance_scale=12.0)
```

**Effects**:

* **3-5**: More artistic freedom, less literal interpretation.
* **7-10**: Balanced, follows prompt closely - **recommended**.
* **12+**: Very literal, may produce oversaturated or exaggerated images.
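
Guidance scale controls classifier-free guidance. At each denoising step, the diffusers Stable Diffusion pipelines predict the noise twice, once conditioned on your prompt and once on the negative prompt (or an unconditional embedding when none is given), then combine them as `uncond + guidance_scale * (cond - uncond)`. The function below is a simplified per-element sketch of that combination, using plain lists in place of tensors:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


if __name__ == "__main__":
    uncond = [0.10, -0.20, 0.05]  # toy prediction for the negative/unconditional branch
    cond = [0.30, -0.10, 0.00]    # toy prediction for your prompt
    for scale in (1.0, 7.5, 12.0):
        print(scale, [round(x, 3) for x in apply_guidance(uncond, cond, scale)])
```

At a scale of 1 the result equals the prompt-conditioned prediction; larger scales push further along the difference between the two branches, which is why very high values can exaggerate features and oversaturate colors. Because that difference is measured against the negative-prompt branch, it is also how negative prompts steer the image away from the terms they contain.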

### Negative prompts

Negative prompts tell the model what to avoid:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Good negative prompts for photorealistic images
negative_prompt = "blurry, low quality, distorted, ugly, bad anatomy, watermark"

# Good negative prompts for artistic images
negative_prompt = "realistic, photograph, blurry, low quality"

# Good negative prompts for portraits
negative_prompt = "distorted face, bad anatomy, extra limbs, low quality"
```

Use negative prompts to:

* Remove common artifacts ("distorted", "low quality").
* Avoid unwanted styles ("cartoon", "3D render").
* Fix common issues ("bad anatomy", "extra fingers").

## Troubleshooting

### Out of memory error

**Issue**: `RuntimeError: CUDA out of memory`.

**Cause**: SDXL requires significant VRAM (16GB minimum).

**Solutions**:

1. Verify you're using 24GB GPUs:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_24]  # 24GB GPUs
   ```

2. Use half-precision (already in the example):
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   torch_dtype=torch.float16  # Half precision
   ```

3. If still failing, use 48GB GPUs:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   gpu=GpuGroup.AMPERE_48  # A40/A6000 with 48GB
   ```

### Model download fails

**Issue**: `Error: Failed to download model from Hugging Face`.

**Solutions**:

1. Increase execution timeout for first run:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   @Endpoint(
       name="image-generation",
       gpu=GpuGroup.ADA_24,
       execution_timeout_ms=600000  # 10 minutes for first download
   )
   ```

2. Check Hugging Face Hub status at [status.huggingface.co](https://status.huggingface.co).

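3. Try a smaller model first to test connectivity. SD 1.5 is not an SDXL model, so load it with `StableDiffusionPipeline` rather than `StableDiffusionXLPipeline`:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # Smaller SD 1.5
   ```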
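
If failures are intermittent, for example during a brief Hugging Face outage, retrying from your script with a growing delay can help. A minimal sketch: `call_with_retries` is a hypothetical helper that retries on any exception, because this tutorial doesn't document which exception types Flash raises, and `flaky_generate` is a stand-in for `generate_image`.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio


async def call_with_retries(generate, *args, attempts=3, base_delay=5.0, **kwargs):
    """Await generate(*args, **kwargs), retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await generate(*args, **kwargs)
        except Exception as exc:  # Broad on purpose: specific Flash error types aren't documented here.
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            delay = base_delay * 2 ** (attempt - 1)  # 5s, 10s, 20s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            await asyncio.sleep(delay)


if __name__ == "__main__":
    failures = {"left": 2}

    async def flaky_generate(prompt):
        """Stand-in that fails twice before succeeding."""
        if failures["left"] > 0:
            failures["left"] -= 1
            raise RuntimeError("Failed to download model from Hugging Face")
        return {"prompt": prompt}

    print(asyncio.run(call_with_retries(flaky_generate, "test", base_delay=0.01)))
```

Keep `attempts` small. Each retry submits a new request to the endpoint, and a persistent problem (such as a misspelled model ID) will fail every time.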

### Image quality is poor

**Issue**: Generated images look blurry or low quality.

**Solutions**:

1. Increase inference steps:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   num_steps=40  # More steps = better quality
   ```

2. Adjust guidance scale:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   guidance_scale=8.5  # Higher guidance
   ```

3. Improve your prompt:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   prompt = "A detailed portrait, highly detailed, sharp focus, 8k, professional photography"
   ```

4. Add quality keywords to your prompt:
   * "highly detailed"
   * "sharp focus"
   * "8k"
   * "photorealistic"
   * "professional"
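
If you reuse the same quality keywords often, a small helper can append them to any prompt without repeating terms the prompt already contains. `add_quality_keywords` is a hypothetical helper, and its default keyword list simply mirrors the suggestions above.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
DEFAULT_QUALITY_KEYWORDS = ("highly detailed", "sharp focus", "8k", "photorealistic", "professional")


def add_quality_keywords(prompt: str, keywords=DEFAULT_QUALITY_KEYWORDS) -> str:
    """Append quality keywords that the prompt doesn't already contain."""
    existing = prompt.lower()
    # Plain substring check: simple, but "professional" would also match "unprofessional".
    extras = [kw for kw in keywords if kw.lower() not in existing]
    # Trim trailing commas and spaces so the joined prompt stays clean.
    base = prompt.strip().rstrip(",").strip()
    return ", ".join(part for part in [base, *extras] if part)


if __name__ == "__main__":
    print(add_quality_keywords("A detailed portrait, sharp focus"))
```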

### Slow generation

**Issue**: Image generation takes >60 seconds per image.

**Possible causes**:

1. Worker scaled down (cold start).
2. Model not cached.
3. Too many inference steps.

**Solutions**:

1. Increase `idle_timeout` to keep workers active:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   idle_timeout=1800  # Keep active for 30 minutes
   ```

2. Reduce inference steps:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   num_steps=20  # Faster but slightly lower quality
   ```

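3. Set `workers=(1, 2)` to keep at least one worker warm at all times, with up to two in total. Keep in mind that an always-on worker is billed even while idle.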

### Images look distorted or have artifacts

**Issue**: Generated images have weird artifacts or distortions.

**Solutions**:

1. Use negative prompts:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   negative_prompt="distorted, ugly, bad anatomy, extra limbs, disfigured"
   ```

2. Adjust guidance scale (try 7-9 range):
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   guidance_scale=8.0
   ```

3. Increase inference steps for better refinement:
   ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
   num_steps=35
   ```

## Next steps

Now that you've built an image generation script with Flash, you can:

### Try other Stable Diffusion models

Explore different models from Hugging Face:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
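# SDXL Turbo - distilled for 1-4 step generation (use guidance_scale=0.0; trained at 512x512)
model_id = "stabilityai/sdxl-turbo"

# Stable Diffusion 1.5 - smaller, faster (load with StableDiffusionPipeline, not the XL pipeline)
model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"

# Stable Diffusion 2.1 - 768x768 native resolution (load with StableDiffusionPipeline)
model_id = "stabilityai/stable-diffusion-2-1"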
```

### Add image-to-image generation

Use an existing image as a starting point:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from diffusers import StableDiffusionXLImg2ImgPipeline

# Load img2img pipeline
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(...)

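# Generate a variation of init_image (an existing image, for example a PIL.Image).
# strength (0-1) sets how far the result may move away from init_image.
image = pipe(prompt, image=init_image, strength=0.75).images[0]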
```

### Build a Flash app

Convert your script to a production [Flash app](/flash/apps/overview):

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash init image-generation-app
# Move your function to workers/gpu/endpoint.py
# Add FastAPI routes for HTTP API
flash deploy

# If using uv:
uv run flash init image-generation-app
uv run flash deploy
```

### Optimize with network volumes

Use [network volumes](/flash/configuration/storage) to cache models across workers:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from runpod_flash import Endpoint, GpuGroup, NetworkVolume

vol = NetworkVolume(name="model-cache")  # Finds existing or creates new

@Endpoint(
    name="image-generation",
    gpu=GpuGroup.ADA_24,
    volume=vol,
    dependencies=["diffusers", "torch", "transformers", "accelerate"]
)
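def generate_image(prompt, negative_prompt="", num_steps=30, guidance_scale=7.5):
    # Files under /runpod-volume/ persist across workers. Point the Hugging Face
    # cache there (for example, with the cache_dir argument of from_pretrained)
    # so the weights are downloaded once and reused.
    ...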
```

### Explore advanced features

* **LoRA fine-tuning**: Customize SDXL for specific styles.
* **ControlNet**: Guide generation with edge maps, depth, or pose.
* **Inpainting**: Edit specific parts of images.
* **Upscaling**: Generate higher resolution images.

## Related resources

* [Endpoint functions guide](/flash/create-endpoints).
* [Configuration reference](/flash/configuration/parameters).
* [Stable Diffusion XL model card](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
