IBM Granite 4.0 - Runpod Documentation

IBM Granite-4.0-H-Small is a 32B-parameter, long-context instruct model. It is suited to general text generation, instruction following, and conversational AI tasks.

Try in playground

Test IBM Granite 4.0 in the Runpod Hub playground.


Endpoint	`https://api.runpod.ai/v2/granite-4-0-h-small/runsync`
Pricing	$10.00 per 1M tokens
Type	Text generation

Request

All parameters are passed within the input object in the request body.

input.messages

array

required

Array of message objects with role and content.

input.messages[].role

string

required

The role of the message author. Use system, user, or assistant.

input.messages[].content

string

required

The content of the message.

input.sampling_params.max_tokens

integer

default: 512

Maximum number of tokens to generate.

input.sampling_params.temperature

float

default: 0.7

Controls randomness in generation. Lower values make output more deterministic. Range: 0.0-1.0.

input.sampling_params.seed

integer

default: -1

Seed for reproducible results. Set to -1 for random.

input.sampling_params.top_k

integer

default: -1

Restricts sampling to the top K most probable tokens.

input.sampling_params.top_p

float

default: 1

Nucleus sampling threshold. Range: 0.0-1.0.

curl -X POST "https://api.runpod.ai/v2/granite-4-0-h-small/runsync" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant. Please ensure responses are professional, accurate, and safe."
        },
        {
          "role": "user",
          "content": "What is Runpod?"
        }
      ],
      "sampling_params": {
        "max_tokens": 512,
        "temperature": 0.7,
        "seed": -1,
        "top_k": -1,
        "top_p": 1
      }
    }
  }'
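
The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: `build_payload` and `run_sync` are hypothetical helper names, the non-empty `messages` check and the 120-second timeout are assumptions, and the range checks mirror the ranges documented above.

```python
import json
import urllib.request

ENDPOINT = "https://api.runpod.ai/v2/granite-4-0-h-small/runsync"
VALID_ROLES = {"system", "user", "assistant"}


def build_payload(messages, max_tokens=512, temperature=0.7,
                  seed=-1, top_k=-1, top_p=1.0):
    """Assemble the request body, checking the documented constraints first."""
    if not messages:
        # Assumption: a required array is treated here as "must not be empty".
        raise ValueError("input.messages is required")
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("each message needs a string content field")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0-1.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be within 0.0-1.0")
    # All parameters are nested under the top-level "input" object.
    return {
        "input": {
            "messages": list(messages),
            "sampling_params": {
                "max_tokens": max_tokens,
                "temperature": temperature,
                "seed": seed,
                "top_k": top_k,
                "top_p": top_p,
            },
        }
    }


def run_sync(payload, api_key, timeout=120):
    """POST the payload to the runsync endpoint and return the decoded JSON."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `run_sync(build_payload([{"role": "user", "content": "What is Runpod?"}]), api_key)` would send the request, with `api_key` read from the `RUNPOD_API_KEY` environment variable as in the curl example.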

Response

id

string

Unique identifier for the request.

status

string

Request status. Returns COMPLETED on success, FAILED on error.

delayTime

integer

Time in milliseconds the request spent in queue before processing began.

executionTime

integer

Time in milliseconds the model took to generate the response.

workerId

string

Identifier of the worker that processed the request.

output

object

The generation result containing the text and usage information.

output.choices

array

Array of choice objects. Each choice holds the generated text in its tokens array.

output.cost

float

Cost of the generation in USD.

output.usage

object

Token usage information, reported as input and output token counts.

{
  "id": "sync-a1b2c3d4-e5f6-7890-abcd-ef1234567890-u1",
  "status": "COMPLETED",
  "delayTime": 15,
  "executionTime": 2345,
  "workerId": "oqk7ao1uomckye",
  "output": {
    "choices": [
      {
        "tokens": [
          "Runpod is a cloud computing platform that provides GPU resources for AI and machine learning workloads..."
        ]
      }
    ],
    "cost": 0.00185,
    "usage": {
      "input": 35,
      "output": 150
    }
  }
}
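
Reading a response of this shape can be sketched as follows. The helper names `extract_text` and `total_tokens` are hypothetical, and joining the `tokens` entries into one string is an assumption based on the example above, where `tokens` holds a single text fragment.

```python
def extract_text(response):
    """Return the generated text from a runsync response, or raise on failure."""
    status = response.get("status")
    if status != "COMPLETED":
        raise RuntimeError(f"request did not complete: status={status!r}")
    choices = response.get("output", {}).get("choices", [])
    # Each choice carries a "tokens" list of text fragments; join them in order.
    return "".join(
        fragment for choice in choices for fragment in choice.get("tokens", [])
    )


def total_tokens(response):
    """Sum the input and output token counts reported in output.usage."""
    usage = response.get("output", {}).get("usage", {})
    return usage.get("input", 0) + usage.get("output", 0)


# Trimmed copy of the example response shown above.
sample = {
    "status": "COMPLETED",
    "output": {
        "choices": [{"tokens": ["Runpod is a cloud computing platform..."]}],
        "cost": 0.00185,
        "usage": {"input": 35, "output": 150},
    },
}

print(extract_text(sample))
print(total_tokens(sample))
```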

Cost calculation

IBM Granite 4.0 is billed at $10.00 per 1M tokens. Example costs:

Tokens	Cost
1,000 tokens	$0.01
10,000 tokens	$0.10
100,000 tokens	$1.00
1,000,000 tokens	$10.00
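
The table follows from a single multiplication, sketched below. `estimate_cost` is a hypothetical helper, and it assumes input and output tokens are billed at the same rate, which is consistent with the example response (35 + 150 = 185 tokens, cost 0.00185).

```python
PRICE_PER_MILLION_TOKENS_USD = 10.00


def estimate_cost(tokens):
    """Estimate the cost in USD for a given total token count."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    return tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


# Reproduce the rows of the table above.
for tokens in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{tokens:>9,} tokens  ${estimate_cost(tokens):.2f}")
```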
