IBM Granite-4.0-H-Small is a 32B-parameter long-context instruct model. It is suited to general text generation, instruction following, and conversational AI tasks, and it supports extended context lengths.

Test IBM Granite 4.0 in the Runpod Hub playground.
Endpoint: https://api.runpod.ai/v2/granite-4-0-h-small/runsync
Pricing: $10.00 per 1M tokens
Type: Text generation

Request

All parameters are passed within the input object in the request body.
input.messages (array, required): Array of message objects, each with a role and content.
input.messages[].role (string, required): The role of the message author: system, user, or assistant.
input.messages[].content (string, required): The text of the message.
input.sampling_params.max_tokens (integer, default: 512): Maximum number of tokens to generate.
input.sampling_params.temperature (float, default: 0.7): Controls randomness in generation. Lower values make output more deterministic. Range: 0.0-1.0.
input.sampling_params.seed (integer, default: -1): Seed for reproducible results. Set to -1 to use a random seed.
input.sampling_params.top_k (integer, default: -1): Restricts sampling to the K most probable tokens. The default of -1 applies no top-k limit.
input.sampling_params.top_p (float, default: 1.0): Nucleus sampling threshold. Sampling is restricted to the smallest set of tokens whose cumulative probability reaches top_p; the default of 1.0 considers all tokens. Range: 0.0-1.0.

curl -X POST "https://api.runpod.ai/v2/granite-4-0-h-small/runsync" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant. Please ensure responses are professional, accurate, and safe."
        },
        {
          "role": "user",
          "content": "What is Runpod?"
        }
      ],
      "sampling_params": {
        "max_tokens": 512,
        "temperature": 0.7,
        "seed": -1,
        "top_k": -1,
        "top_p": 1
      }
    }
  }'
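Because the parameters above say a fixed seed gives reproducible results and lower temperature makes output more deterministic, a small wrapper can make repeatable requests easier to send. The sketch below is a minimal, hypothetical bash helper: the name build_payload and the example values (seed 42, temperature 0.2) are illustrative assumptions, not part of the API. It only prints the request body using the documented input.messages and input.sampling_params fields; omitted fields such as top_k and top_p are assumed to fall back to their listed defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Runpod API): prints a request body
# for the Granite endpoint using the documented messages/sampling_params fields.
# Arguments: user prompt, seed (default 42), temperature (default 0.2).
# The prompt is inserted verbatim, so it must not contain double quotes
# or backslashes; this sketch does no JSON escaping.
build_payload() {
  local prompt="$1" seed="${2:-42}" temperature="${3:-0.2}"
  cat <<EOF
{
  "input": {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "${prompt}"}
    ],
    "sampling_params": {
      "max_tokens": 512,
      "temperature": ${temperature},
      "seed": ${seed}
    }
  }
}
EOF
}

# Usage (commented out so the sketch makes no network call): pipe the body
# into curl, which reads the request data from stdin with -d @-.
# build_payload "What is Runpod?" 42 0.2 | curl -X POST "https://api.runpod.ai/v2/granite-4-0-h-small/runsync" \
#   -H "Authorization: Bearer $RUNPOD_API_KEY" -H "Content-Type: application/json" -d @-
```

Reusing the same seed and temperature across calls should return the same output for the same messages, per the seed description above.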

Response

id (string): Unique identifier for the request.
status (string): Request status. Returns COMPLETED on success and FAILED on error.
delayTime (integer): Time in milliseconds the request spent in the queue before processing began.
executionTime (integer): Time in milliseconds the model took to generate the response.
workerId (string): Identifier of the worker that processed the request.
output (object): The generation result, containing the generated text, cost, and token usage.
output.choices (array): Generated choices. Each choice has a tokens array that holds the generated text.
output.cost (float): Cost of the generation in USD.
output.usage (object): Token usage, with input and output token counts.

{
  "id": "sync-a1b2c3d4-e5f6-7890-abcd-ef1234567890-u1",
  "status": "COMPLETED",
  "delayTime": 15,
  "executionTime": 2345,
  "workerId": "oqk7ao1uomckye",
  "output": {
    "choices": [
      {
        "tokens": [
          "Runpod is a cloud computing platform that provides GPU resources for AI and machine learning workloads..."
        ]
      }
    ],
    "cost": 0.00185,
    "usage": {
      "input": 35,
      "output": 150
    }
  }
}

Cost calculation

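The IBM Granite 4.0 endpoint charges $10.00 per 1M tokens. In the example response above, input and output tokens are billed at the same rate: 35 input + 150 output = 185 tokens, and 185 × $10.00 / 1,000,000 = $0.00185, which matches output.cost. Example costs: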
Tokens          Cost
1,000           $0.01
10,000          $0.10
100,000         $1.00
1,000,000       $10.00
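To estimate a request's cost from the output.usage field, you can multiply the combined input and output token count by the documented rate. The bash sketch below does this with awk. The function name cost_usd is hypothetical, and billing input and output tokens at the same rate is an assumption inferred from the example response (185 total tokens cost $0.00185), not a stated pricing rule.

```shell
#!/usr/bin/env bash
# Estimate request cost at the documented rate of $10.00 per 1M tokens.
# Assumption: input and output tokens are billed at the same rate, which
# matches the example response (35 + 150 tokens -> cost 0.00185).
readonly PRICE_PER_MILLION_USD=10.00

# cost_usd INPUT_TOKENS [OUTPUT_TOKENS] -> prints the cost in USD (5 decimals)
cost_usd() {
  local input_tokens="$1" output_tokens="${2:-0}"
  awk -v i="$input_tokens" -v o="$output_tokens" -v p="$PRICE_PER_MILLION_USD" \
    'BEGIN { printf "%.5f\n", (i + o) * p / 1000000 }'
}

cost_usd 35 150    # usage from the example response; prints 0.00185
cost_usd 1000      # prints 0.01000
```

Five decimal places are used so that small requests, like the 185-token example, do not round down to $0.00.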