curl -X POST "https://api.runpod.ai/v2/qwen3-32b-awq/runsync" \ -H "Authorization: Bearer $RUNPOD_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "input": { "prompt": "Write a Python function that checks if a number is prime:", "max_tokens": 512, "temperature": 0.7 } }'
Copy
{ "delayTime": 25, "executionTime": 3153, "id": "sync-0f3288b5-58e8-46fd-ba73-53945f5e8982-u2", "output": [ { "choices": [ { "tokens": [ "def is_prime(n):\n if n <= 1:\n return False\n for i in range(2, int(n**0.5) + 1):\n if n % i == 0:\n return False\n return True" ] } ], "cost": 0.0001, "usage": { "input": 10, "output": 100 } } ], "status": "COMPLETED", "workerId": "pkej0t9bbyjrgy"}
Text models
Qwen3 32B AWQ
Latest generation LLM with advanced reasoning, instruction-following, and multilingual support.
Qwen3 32B AWQ is the latest large language model in the Qwen series, offering advancements in reasoning, instruction-following, agent capabilities, and multilingual support. It uses AWQ quantization for efficient inference while maintaining high quality.
Stops generation if the given string is encountered.
Copy
curl -X POST "https://api.runpod.ai/v2/qwen3-32b-awq/runsync" \ -H "Authorization: Bearer $RUNPOD_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "input": { "prompt": "Write a Python function that checks if a number is prime:", "max_tokens": 512, "temperature": 0.7 } }'
Qwen3 32B AWQ is fully compatible with the OpenAI API format. You can use the OpenAI Python client to interact with this endpoint.
Python (OpenAI SDK)
import os

from openai import OpenAI

# Point the OpenAI client at the endpoint's OpenAI-compatible base URL.
# The API key is read from the environment, matching the curl example.
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful coding assistant.",
        },
        {
            "role": "user",
            "content": "Write a Python function that checks if a number is prime.",
        },
    ],
    max_tokens=525,
)

print(response.choices[0].message.content)
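Because the endpoint follows the OpenAI response schema, token counts can be read from the same response object. A minimal sketch; it assumes the usage field is populated, as standard OpenAI-compatible servers do:

# Token accounting from the standard OpenAI usage object.
print(response.usage.prompt_tokens, response.usage.completion_tokens)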
For streaming responses, add stream=True:
Python (Streaming)
response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=525,
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
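For longer generations, a non-blocking flow may be preferable to /runsync. The sketch below assumes this endpoint also exposes RunPod's standard queue routes /run and /status/{id}, which are not shown above, so treat the route names and status values as assumptions to verify against the RunPod documentation.

Python (async polling)

import os
import time

import requests

# Assumption: the standard RunPod queue routes /run and /status/{id}
# are available alongside the /runsync route shown above.
base = "https://api.runpod.ai/v2/qwen3-32b-awq"
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
payload = {"input": {"prompt": "Write a haiku about GPUs.", "max_tokens": 128}}

# Submit the job without blocking; the response carries a job id.
job = requests.post(f"{base}/run", headers=headers, json=payload, timeout=30).json()

# Poll until the job reaches a terminal state.
while True:
    status = requests.get(
        f"{base}/status/{job['id']}", headers=headers, timeout=30
    ).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(1)

if status["status"] == "COMPLETED":
    # Same output shape as the /runsync response above.
    print(status["output"][0]["choices"][0]["tokens"][0])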