Available models
The following models are currently available:| Model | Description | Endpoint URL | Type | Price |
|---|---|---|---|---|
| IBM Granite-4.0-H-Small | A 32B parameter long-context instruct model. | https://api.runpod.ai/v2/granite-4-0-h-small/ | Text | $0.01 per 1000 tokens |
| Qwen3 32B AWQ | The latest LLM in the Qwen series, offering advancements in reasoning, instruction-following, agent capabilities, and multilingual support. | https://api.runpod.ai/v2/qwen3-32b-awq/ | Text | $0.01 per 1000 tokens |
| Flux Dev | Offers exceptional prompt adherence, high visual fidelity, and rich image detail. | https://api.runpod.ai/v2/black-forest-labs-flux-1-dev/ | Image | $.02 per megapixel |
| Flux Schnell | Fastest and most lightweight FLUX model, ideal for local development, prototyping, and personal use. | https://api.runpod.ai/v2/black-forest-labs-flux-1-schnell/ | Image | $.0024 per megapixel |
| Flux Kontext Dev | A 12 billion parameter rectified flow transformer capable of editing images based on text instructions. | https://api.runpod.ai/v2/black-forest-labs-flux-1-kontext-dev/ | Image | $0.03 per megapixel |
| Qwen Image | Image generation foundation model with advanced text rendering. | https://api.runpod.ai/v2/qwen-image-t2i/ | Image | $0.02 per megapixel |
| Qwen Image LoRA | Image generation with LoRA support and advanced text rendering. | https://api.runpod.ai/v2/qwen-image-t2i-lora/ | Image | $0.02 per megapixel |
| Qwen Image Edit | Image editing with unique text rendering capabilities. | https://api.runpod.ai/v2/qwen-image-edit/ | Image | $0.02 per megapixel |
| Seedream 4.0 T2I | New-generation image creation with unified generation and editing architecture. | https://api.runpod.ai/v2/seedream-v4-t2i/ | Image | $0.027 per megapixel |
| Seedream 4.0 Edit | New-generation image editing with unified generation and editing architecture. | https://api.runpod.ai/v2/seedream-v4-edit/ | Image | $0.027 per megapixel |
| Seedream 3.0 | Native high-resolution bilingual image generation (Chinese-English). | https://api.runpod.ai/v2/seedream-3-0-t2i/ | Image | $0.03 per megapixel |
| Nano Banana Edit | Google’s state-of-the-art image editing model. | https://api.runpod.ai/v2/nano-banana-edit/ | Image | $0.027 per megapixel |
| InfiniteTalk | Audio-driven video generation model that creates talking or singing videos from a single image and audio input. | https://api.runpod.ai/v2/infinitetalk/ | Video | $0.25 per video generation |
| Kling v2.1 I2V Pro | Professional-grade image-to-video with enhanced visual fidelity. | https://api.runpod.ai/v2/kling-v2-1-i2v-pro/ | Video | $0.36 per 5 seconds of video |
| Seedance 1.0 Pro | High-performance video generation with multi-shot storytelling. | https://api.runpod.ai/v2/seedance-1-0-pro/ | Video | $0.62 per 5 seconds of video |
| SORA 2 I2V | OpenAI’s Sora 2 is a video and audio generation model. | https://api.runpod.ai/v2/sora-2-i2v/ | Video | $0.40 per video generation |
| SORA 2 Pro I2V | OpenAI’s Sora 2 Pro is a professional-grade video and audio generation model. | https://api.runpod.ai/v2/sora-2-pro-i2v/ | Video | $1.20 per video generation |
| WAN 2.5 | Image-to-video generation model. | https://api.runpod.ai/v2/wan-2-5/ | Video | $0.50 per 5 seconds of video |
| WAN 2.2 I2V 720p LoRA | Open-source video generation with LoRA support. | https://api.runpod.ai/v2/wan-2-2-t2v-720-lora/ | Video | $0.35 per 5 seconds of video |
| WAN 2.2 I2V 720p | Open-source AI video generation model that uses a diffusion transformer architecture for image-to-video generation. | https://api.runpod.ai/v2/wan-2-2-i2v-720/ | Video | $0.30 per 5 seconds of video |
| WAN 2.2 T2V 720p | Open-source AI video generation model that uses a diffusion transformer architecture for text-to-video generation. | https://api.runpod.ai/v2/wan-2-2-t2v-720/ | Video | $0.30 per 5 seconds of video |
| WAN 2.1 I2V 720p | Open-source AI video generation model that uses a diffusion transformer architecture for image-to-video generation. | https://api.runpod.ai/v2/wan-2-1-i2v-720/ | Video | $0.30 per 5 seconds of video |
| WAN 2.1 T2V 720p | Open-source AI video generation model that uses a diffusion transformer architecture for text-to-video generation. | https://api.runpod.ai/v2/wan-2-1-t2v-720/ | Video | $0.30 per 5 seconds of video |
| Kling v2.1 I2V Pro | Professional-grade image-to-video with enhanced visual fidelity. | https://api.runpod.ai/v2/kling-v2-1-i2v-pro/ | Video | $0.36 per 5 seconds of video |
| Whisper V3 Large | State-of-the-art automatic speech recognition. | https://api.runpod.ai/v2/whisper-v3-large/ | Audio | $0.05 per 1000 characters of audio transcribed |
| Minimax Speech 02 HD | High-definition text-to-speech model. | https://api.runpod.ai/v2/minimax-speech-02-hd/ | Audio | $0.05 per 1000 characters of audio generated |
Model-specific parameters
Each Public Endpoint accepts a different set of parameters to control the generation process.Flux Dev
Flux Dev is optimized for high-quality, detailed image generation. The model accepts several parameters to control the generation process:| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text description of the desired image. |
negative_prompt | string | No | - | - | Elements to exclude from the image. |
width | integer | No | 1024 | 256-1536 | Image width in pixels. Must be divisible by 64. |
height | integer | No | 1024 | 256-1536 | Image height in pixels. Must be divisible by 64. |
num_inference_steps | integer | No | 28 | 1-50 | Number of denoising steps. |
guidance | float | No | 7.5 | 0.0-10.0 | How closely to follow the prompt. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
image_format | string | No | ”jpeg" | "png” or “jpeg” | Output format. |
Flux Schnell
Flux Schnell is optimized for speed and real-time applications:| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text description of the desired image. |
negative_prompt | string | No | - | - | Elements to exclude from the image. |
width | integer | No | 1024 | 256-1536 | Image width in pixels. Must be divisible by 64. |
height | integer | No | 1024 | 256-1536 | Image height in pixels. Must be divisible by 64. |
num_inference_steps | integer | No | 4 | 1-8 | Number of denoising steps. |
guidance | float | No | 7.5 | 0.0-10.0 | How closely to follow the prompt. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
image_format | string | No | ”jpeg" | "png” or “jpeg” | Output format. |
Flux Schnell is optimized for speed and works best with lower step counts. Using higher values may not improve quality significantly.
IBM Granite-4.0-H-Small
IBM Granite-4.0-H-Small is a 32B parameter long-context instruct model.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
messages | array | Yes | - | - | Array of message objects with role and content. |
sampling_params.max_tokens | integer | No | 512 | - | Maximum number of tokens to generate. |
sampling_params.temperature | float | No | 0.7 | 0.0-1.0 | Controls randomness in generation. Lower values make output more deterministic. |
sampling_params.seed | integer | No | -1 | - | Seed for reproducible results. The default value (-1) will generate a random seed. |
sampling_params.top_k | integer | No | -1 | - | Restricts sampling to the top K most probable tokens. |
sampling_params.top_p | float | No | 1 | 0.0-1.0 | Nucleus sampling threshold. |
Qwen3 32B AWQ
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Prompt for text generation. |
max_tokens | integer | No | 512 | - | Maximum number of tokens to output. |
temperature | float | No | 0.7 | 0.0 - 1.0 | Randomness of the output. Lower temperature makes the output more predictable and deterministic. |
top_p | integer | No | - | Samples from the smallest set of words whose cumulative probability exceeds a given threshold (P). | |
top_k | integer | No | - | 1-8 | Restricts sampling to the top K most probable words. |
stop | string | No | - | - | Stops generation if the given string is encountered. |
OpenAI API request example
OpenAI API request example
OpenAI API streaming example
OpenAI API streaming example
You can stream responses from the OpenAI API using the
stream and stream_options parameters:stream_options={"include_usage": True} is required for streaming to work with vLLM Public Endpoints.Response format
Response format
Qwen Image
Qwen Image is an image generation foundation model with advanced text rendering capabilities.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired image. |
negative_prompt | string | No | - | Elements to exclude from the image. |
size | string | No | ”1024*1024” | Image dimensions. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Qwen Image LoRA
Qwen Image with LoRA support allows you to customize generation with fine-tuned LoRA models.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired image. |
loras | array | No | [] | Array of LoRA configurations to apply. |
loras[].path | string | Yes | - | URL or path to the LoRA model file. |
loras[].scale | number | Yes | - | Scale factor for the LoRA influence (typically 0-1). |
size | string | No | ”1024*1024” | Image dimensions. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Seedream 3.0
Seedream 3.0 is a native high-resolution bilingual image generation model supporting both Chinese and English prompts.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired image. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
guidance | number | No | 2 | Guidance scale for generation control. |
size | string | No | ”1024x1024” | Image dimensions. |
Seedream 4.0 T2I
Seedream 4.0 is a new-generation image creation model that integrates both generation and editing capabilities.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired image. |
negative_prompt | string | No | - | Elements to exclude from the image. |
size | string | No | ”1024*1024” | Image dimensions. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Nano Banana Edit
Google’s Nano Banana Edit is a state-of-the-art image editing model that combines multiple source images.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Editing instructions describing the desired transformation. |
images | array | Yes | - | Array of image URLs to edit or combine. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Qwen Image Edit
Qwen Image Edit extends the text rendering capabilities to image editing tasks, enabling precise text editing.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Editing instructions describing the desired changes. |
negative_prompt | string | No | - | Elements to exclude from the edited image. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
image | string | Yes | - | URL of the image to edit. |
output_format | string | No | ”jpeg” | Output format (“png” or “jpeg”). |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Seedream 4.0 Edit
Seedream 4.0 Edit provides advanced image editing capabilities with the same unified architecture as Seedream 4.0 T2I.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Editing instructions describing the desired transformation. |
images | array | Yes | - | Array of image URLs to edit or combine. |
size | string | No | ”1024*1024” | Output image dimensions. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
InfiniteTalk
InfiniteTalk is an audio-driven video generation model that creates talking or singing videos from a single image and audio input.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video. |
image | string | Yes | - | URL of the source image to animate. |
audio | string | Yes | - | URL of the audio file to drive the animation. |
size | enum | Yes | ”480p” | Output video resolution. Valid options are 480p and 720p. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Kling v2.1 I2V Pro
Kling 2.1 Pro generates videos from static images with additional control parameters.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video. |
image | string | Yes | - | URL of the source image to animate. |
negative_prompt | string | No | - | Elements to exclude from the video. |
guidance_scale | float | No | 0.5 | How closely to follow the prompt. |
duration | integer | No | 5 | Video duration in seconds. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Seedance 1.0 Pro
Seedance 1.0 Pro is a high-performance video generation model with multi-shot storytelling capabilities.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video scene. |
duration | integer | No | 5 | Video duration in seconds. |
fps | integer | No | 24 | Frames per second for the output video. |
size | string | No | ”1920x1080” | Video dimensions. |
image | string | No | "" | Optional source image URL for image-to-video generation. |
SORA 2 I2V
OpenAI’s Sora 2 is a video and audio generation model.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video, including action, ambient sound, and character dialogue. |
image | string | Yes | - | URL of the source image to animate. |
duration | integer | Yes | 4 | Video duration in seconds. Valid options: 4, 8, or 12. |
SORA 2 Pro I2V
OpenAI’s Sora 2 Pro is a professional-grade video and audio generation model.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video, including action, ambient sound, and character dialogue. |
image | string | Yes | - | URL of the source image to animate. |
size | string | No | ”720p” | Output video resolution. |
duration | integer | Yes | 4 | Video duration in seconds. Valid options: 4, 8, or 12. |
Whisper V3 Large
Whisper V3 Large is a state-of-the-art automatic speech recognition model that transcribes audio to text.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | No | "" | Optional context or prompt to guide transcription. |
audio | string | Yes | - | URL of the audio file to transcribe. |
Minimax Speech 02 HD
Minimax Speech 02 HD is a high-definition text-to-speech model with emotional control and voice customization.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text to convert to speech. |
voice_id | string | No | ”Wise_Woman” | Voice identifier for the desired voice. |
speed | number | No | 1 | Speech speed multiplier. |
volume | number | No | 1 | Volume level. |
pitch | number | No | 0 | Pitch adjustment. |
emotion | string | No | ”neutral” | Emotion to convey (e.g., “happy”, “sad”). |
english_normalization | boolean | No | false | Enable English text normalization. |
default_audio_url | string | No | "" | Fallback audio URL. |
Flux Kontext Dev
A 12 billion parameter model for editing images based on text instructions.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text instructions describing the desired edits to the image. |
negative_prompt | string | No | "" | - | Elements to exclude from the edited image. |
image | string | Yes | - | - | URL of the input image to edit. |
size | string | No | ”1024*1024” | - | Output image size in format “width*height”. |
num_inference_steps | integer | No | 28 | 1-50 | Number of denoising steps. |
guidance | float | No | 2 | 0.0-10.0 | How closely to follow the prompt. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
output_format | string | No | ”png" | "png” or “jpeg” | Output image format. |
enable_safety_checker | boolean | No | true | - | Whether to run safety checks on the output. |
WAN 2.5
WAN 2.5 generates videos from static images.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video. |
image | string | Yes | - | URL of the source image to animate. |
negative_prompt | string | No | - | Elements to exclude from the video. |
size | string | No | ”1280*720” | Video dimensions. |
duration | integer | No | 5 | Video duration in seconds. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
enable_prompt_expansion | boolean | No | false | Automatically expand and enhance the prompt. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Wan 2.2 I2V 720p LoRA
Wan 2.2 is an open-source video generation model with LoRA support for customized camera movements and effects.| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
prompt | string | Yes | - | Text description of the desired video motion. |
image | string | Yes | - | URL of the source image to animate. |
high_noise_loras | array | No | [] | LoRA configurations for high-noise stages. |
high_noise_loras[].path | string | Yes | - | URL or path to the LoRA model file. |
high_noise_loras[].scale | number | Yes | - | Scale factor for the LoRA influence. |
low_noise_loras | array | No | [] | LoRA configurations for low-noise stages. |
low_noise_loras[].path | string | Yes | - | URL or path to the LoRA model file. |
low_noise_loras[].scale | number | Yes | - | Scale factor for the LoRA influence. |
duration | integer | No | 5 | Video duration in seconds. |
seed | integer | No | -1 | Seed for reproducible results. -1 generates a random seed. |
enable_safety_checker | boolean | No | true | Enable content safety checking. |
Wan 2.2 I2V 720p
An open-source image-to-video generation model that creates 720p video content from static images.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text description of the desired video motion and content. |
image | string | Yes | - | - | URL of the input image to animate. |
negative_prompt | string | No | "" | - | Elements to exclude from the generated video. |
size | string | No | ”1280*720” | - | Video resolution in format “width*height”. |
num_inference_steps | integer | No | 30 | 1-50 | Number of denoising steps. |
guidance | float | No | 5 | 0.0-10.0 | How closely to follow the prompt. |
duration | integer | No | 5 | - | Video duration in seconds. |
flow_shift | integer | No | 5 | - | Controls the motion flow in the generated video. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
enable_prompt_optimization | boolean | No | false | - | Whether to automatically optimize the prompt. |
enable_safety_checker | boolean | No | true | - | Whether to run safety checks on the output. |
Wan 2.2 T2V 720p
Open-source model for generating 720p videos from text prompts.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text description of the desired video content. |
negative_prompt | string | No | "" | - | Elements to exclude from the generated video. |
size | string | No | ”1280*720” | - | Video resolution in format “width*height”. |
num_inference_steps | integer | No | 30 | 1-50 | Number of denoising steps. |
guidance | float | No | 5 | 0.0-10.0 | How closely to follow the prompt. |
duration | integer | No | 5 | - | Video duration in seconds. |
flow_shift | integer | No | 5 | - | Controls the motion flow in the generated video. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
enable_prompt_optimization | boolean | No | false | - | Whether to automatically optimize the prompt. |
enable_safety_checker | boolean | No | true | - | Whether to run safety checks on the output. |
Wan 2.1 I2V 720p
Open-source image-to-video generation model that converts static images into 720p videos.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text description of the desired video motion and content. |
image | string | Yes | - | - | URL of the input image to animate. |
negative_prompt | string | No | "" | - | Elements to exclude from the generated video. |
size | string | No | ”1280*720” | - | Video resolution in format “width*height”. |
num_inference_steps | integer | No | 30 | 1-50 | Number of denoising steps. |
guidance | float | No | 5 | 0.0-10.0 | How closely to follow the prompt. |
duration | integer | No | 5 | - | Video duration in seconds. |
flow_shift | integer | No | 5 | - | Controls the motion flow in the generated video. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
enable_prompt_optimization | boolean | No | false | - | Whether to automatically optimize the prompt. |
enable_safety_checker | boolean | No | true | - | Whether to run safety checks on the output. |
Wan 2.1 T2V 720p
An open-source video generation model for creating 720p videos from text prompts.| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
prompt | string | Yes | - | - | Text description of the desired video content. |
negative_prompt | string | No | "" | - | Elements to exclude from the generated video. |
size | string | No | ”1280*720” | - | Video resolution in format “width*height”. |
num_inference_steps | integer | No | 30 | 1-50 | Number of denoising steps. |
guidance | float | No | 5 | 0.0-10.0 | How closely to follow the prompt. |
duration | integer | No | 5 | - | Video duration in seconds. |
flow_shift | integer | No | 5 | - | Controls the motion flow in the generated video. |
seed | integer | No | -1 | - | Provide a seed for reproducible results. The default value (-1) will generate a random seed. |
enable_prompt_optimization | boolean | No | false | - | Whether to automatically optimize the prompt. |
enable_safety_checker | boolean | No | true | - | Whether to run safety checks on the output. |