Image models

Generate and edit images with text prompts or reference images.

| Model | Description | Price |
|---|---|---|
| Flux Dev | High-quality image generation with exceptional prompt adherence. | $0.02/megapixel |
| Flux Schnell | Fast, lightweight generation for prototyping. | $0.0024/megapixel |
| Flux Kontext Dev | Edit images based on text instructions. | $0.025/image |
| P-Image T2I | Ultra-fast text-to-image with automatic prompt enhancement. | $0.005/image |
| P-Image Edit | Premium image editing with complex compositions. | $0.01/image |
| Qwen Image | Image generation with advanced text rendering. | $0.02/image |
| Qwen Image LoRA | Image generation with LoRA customization. | $0.025/image |
| Qwen Image Edit | Image editing with text rendering capabilities. | $0.02/image |
| Qwen Image Edit 2511 | Enhanced image editing with improved consistency. | $0.02/image |
| Qwen Image Edit 2511 LoRA | Advanced editing with LoRA support. | $0.025/image |
| Seedream 4.0 T2I | New-generation text-to-image creation. | $0.027/image |
| Seedream 4.0 Edit | New-generation image editing. | $0.027/image |
| Seedream 3.0 | Bilingual image generation (Chinese-English). | $0.03/image |
| WAN 2.6 T2I | Open-source text-to-image at 1024x1024. | $0.03/image |
| Z-Image Turbo | Fast 6B parameter image generation. | $0.005/image |
| Nano Banana Edit | Google’s model for combining multiple images. | $0.038/image |
| Nano Banana Pro Edit | Advanced multi-image editing with resolution options. | $0.14–$0.24/image |
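
Per-megapixel pricing depends on the output resolution, so a small calculation helps when budgeting. The sketch below uses the Flux prices from the table; the function name `estimate_image_cost` is hypothetical, and it assumes 1 megapixel = 1,000,000 pixels with no rounding, which may differ from how billing is actually computed.

```python
# Per-megapixel prices copied from the table above (USD).
PRICE_PER_MEGAPIXEL = {
    "Flux Dev": 0.02,
    "Flux Schnell": 0.0024,
}


def estimate_image_cost(model: str, width: int, height: int, count: int = 1) -> float:
    """Rough cost estimate for `count` images of the given size.

    Assumption: 1 megapixel = 1,000,000 pixels and no rounding is applied.
    """
    megapixels = (width * height) / 1_000_000
    return PRICE_PER_MEGAPIXEL[model] * megapixels * count


# One 1024x1024 Flux Dev image is slightly more than one megapixel.
print(f"${estimate_image_cost('Flux Dev', 1024, 1024):.4f}")  # prints "$0.0210"
```

Models priced per image need no resolution term: the estimate is the listed price multiplied by the number of images.
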

Video models

Create videos from images or text prompts. Pricing varies by resolution and duration.

| Model | Description | Price |
|---|---|---|
| InfiniteTalk | Audio-driven talking/singing video generation. | $0.25 (480p), $0.50 (720p) |
| Kling v2.1 I2V Pro | Professional image-to-video with enhanced fidelity. | $0.45/5s, $0.90/10s |
| Kling v2.6 Motion Control | Motion transfer from reference videos. | $0.21/3s, $0.63/10s |
| Kling Video O1 R2V | Creative video with multi-reference images. | $0.112/second |
| Seedance 1.0 Pro | High-performance video with multi-shot storytelling. | From $0.12/5s |
| Seedance 1.5 Pro I2V | Cinematic image-to-video with expressive motion. | $0.024–$0.052/second |
| SORA 2 I2V | OpenAI’s video and audio generation. | $0.40 (4s), $0.80 (8s), $1.20 (12s) |
| SORA 2 Pro I2V | Professional-grade SORA video generation. | From $1.20 (720p/4s) |
| WAN 2.6 T2V | Text-to-video with resolution options. | $0.50/5s (480p), $2.25/10s (720p) |
| WAN 2.5 | Image-to-video with prompt expansion. | From $0.25/5s |
| WAN 2.2 I2V LoRA | Image-to-video with LoRA camera controls. | $0.35/5s, $0.56/8s |
| WAN 2.2 I2V | Open-source image-to-video at 720p. | $0.30/5s |
| WAN 2.2 T2V | Open-source text-to-video at 720p. | $0.30/5s |
| WAN 2.1 I2V | Image-to-video at 720p. | $0.30/5s |
| WAN 2.1 T2V | Text-to-video at 720p. | $0.30/5s |
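
The table mixes two pricing shapes: fixed prices for specific durations and a flat per-second rate. The sketch below shows one way to estimate cost for each, using a few rows from the table. The function name `estimate_video_cost` is hypothetical, and because the table does not say how other durations are billed, the sketch rejects any duration that is not listed for a fixed-tier model.

```python
# Prices copied from the table above (USD).
FIXED_TIERS = {
    "Kling v2.1 I2V Pro": {5: 0.45, 10: 0.90},
    "SORA 2 I2V": {4: 0.40, 8: 0.80, 12: 1.20},
}
PER_SECOND = {
    "Kling Video O1 R2V": 0.112,
}


def estimate_video_cost(model: str, seconds: int) -> float:
    """Rough cost estimate for one video of the given duration."""
    if model in PER_SECOND:
        return PER_SECOND[model] * seconds
    tiers = FIXED_TIERS[model]
    if seconds not in tiers:
        # Assumption: only the listed durations are priced.
        raise ValueError(f"{model} has no listed price for {seconds}s")
    return tiers[seconds]


print(estimate_video_cost("SORA 2 I2V", 8))  # prints 0.8
```
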

Text models

Generate text with large language models.

| Model | Description | Price |
|---|---|---|
| Cogito 671B v2.1 | 671B MoE model with FP8 dynamic quantization. | $0.50/1M tokens |
| GPT-OSS 120B | OpenAI’s open-weight 120B parameter model. | $10.00/1M tokens |
| IBM Granite 4.0 | 32B parameter long-context instruct model. | $10.00/1M tokens |
| Qwen3 32B AWQ | Advanced reasoning and multilingual support. OpenAI-compatible. | $10.00/1M tokens |
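
Token-based prices scale linearly with usage. As a minimal sketch, the hypothetical helper below assumes a single rate applied to the combined input and output token count; the table does not say whether input and output tokens are priced differently.

```python
# Prices per 1M tokens copied from the table above (USD).
PRICE_PER_MILLION_TOKENS = {
    "Cogito 671B v2.1": 0.50,
    "GPT-OSS 120B": 10.00,
    "IBM Granite 4.0": 10.00,
    "Qwen3 32B AWQ": 10.00,
}


def estimate_text_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost estimate; assumes one rate for input and output tokens."""
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_MILLION_TOKENS[model] * total_tokens / 1_000_000


# 1.5M input tokens plus 0.5M output tokens on Cogito 671B v2.1.
print(estimate_text_cost("Cogito 671B v2.1", 1_500_000, 500_000))  # prints 1.0
```
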

Audio models

Transcribe speech or generate audio from text.

| Model | Description | Price |
|---|---|---|
| Chatterbox Turbo | Fast TTS with 20 preset voices and voice cloning. | $0.001/second |
| Whisper V3 Large | State-of-the-art speech recognition. | $0.05/1K chars |
| Minimax Speech 02 HD | Text-to-speech with emotional control. | $0.05/1K chars |
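
Audio prices use two different units: seconds of audio and thousands of characters of text. The sketch below estimates both for the two text-to-speech models, using the listed rates. The helper names are hypothetical, and counting characters with `len()` is an assumption; the service may count characters differently.

```python
# Rates copied from the table above (USD).
CHATTERBOX_TURBO_PER_SECOND = 0.001
MINIMAX_SPEECH_PER_1K_CHARS = 0.05


def cost_per_second(rate: float, seconds: float) -> float:
    """Cost for audio billed by duration."""
    return rate * seconds


def cost_per_1k_chars(rate: float, text: str) -> float:
    """Cost for text billed per 1,000 characters (assumes len() matches billing)."""
    return rate * len(text) / 1000


script = "Hello there. " * 200  # 2,600 characters of sample text

# Ninety seconds of Chatterbox Turbo output.
print(f"${cost_per_second(CHATTERBOX_TURBO_PER_SECOND, 90):.3f}")  # prints "$0.090"
# The sample script through Minimax Speech 02 HD.
print(f"${cost_per_1k_chars(MINIMAX_SPEECH_PER_1K_CHARS, script):.3f}")  # prints "$0.130"
```
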

Next steps

- Quickstart: Get started with your first API request.
- Make API requests: Learn about request/response formats.
- Vercel AI SDK: Use the TypeScript SDK for easier integration.
- Build a text-to-video pipeline: Chain multiple endpoints in a Python application.