```shell
curl -X POST "https://api.runpod.ai/v2/gpt-oss-120b/runsync" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "Explain the concept of quantum entanglement in simple terms:",
      "max_tokens": 512,
      "temperature": 0.7
    }
  }'
```
```json
{
  "delayTime": 30,
  "executionTime": 4521,
  "id": "sync-a1b2c3d4-e5f6-7890-abcd-ef1234567890-u1",
  "output": [
    {
      "choices": [
        {
          "tokens": [
            "Quantum entanglement is a phenomenon where two particles become connected in such a way that measuring one particle instantly affects the other, no matter how far apart they are..."
          ]
        }
      ],
      "cost": 0.005,
      "usage": {
        "input": 15,
        "output": 485
      }
    }
  ],
  "status": "COMPLETED"
}
```
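Once the call returns, the generated text and token counts can be pulled out of this structure in Python. A minimal sketch, assuming the `output[0].choices[0].tokens` layout shown in the example response above:

```python
# Sketch: unpack a /runsync response, assuming the layout shown above
# (output -> choices -> tokens, plus a usage object with token counts).

def parse_runsync(response: dict) -> tuple[str, int, int]:
    result = response["output"][0]
    text = result["choices"][0]["tokens"][0]   # generated text
    usage = result["usage"]                    # token accounting
    return text, usage["input"], usage["output"]

# Abbreviated copy of the example response above, for illustration.
example = {
    "status": "COMPLETED",
    "output": [{
        "choices": [{"tokens": ["Quantum entanglement is a phenomenon..."]}],
        "cost": 0.005,
        "usage": {"input": 15, "output": 485},
    }],
}

text, tokens_in, tokens_out = parse_runsync(example)
print(tokens_in, tokens_out)  # 15 485
```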
GPT-OSS 120B
GPT-OSS 120B is OpenAI’s open-weight 120B parameter language model, offering powerful text generation capabilities with advanced reasoning and instruction-following abilities.
The `usage` object in the response reports token counts for the input (prompt) and output (completion).
GPT-OSS 120B is fully compatible with the OpenAI API format. You can use the OpenAI Python client to interact with this endpoint.
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    api_key=RUNPOD_API_KEY,
    base_url="https://api.runpod.ai/v2/gpt-oss-120b/openai/v1",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the concept of quantum entanglement in simple terms."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
```
For streaming responses, add `stream=True`:
Python (Streaming)
```python
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Write a short story about space exploration."}
    ],
    max_tokens=512,
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```