
# IBM Granite 4.0

> A 32B parameter long-context instruct model for text generation.

IBM Granite-4.0-H-Small is a 32B-parameter, long-context instruct model. It is suited to general text generation, instruction following, and conversational tasks, including prompts with extended context lengths.

<Card title="Try in playground" icon="play" href="https://console.runpod.io/hub/playground/text/granite-4-0-h-small" horizontal>
  Test IBM Granite 4.0 in the Runpod Hub playground.
</Card>

|              |                                                        |
| ------------ | ------------------------------------------------------ |
| **Endpoint** | `https://api.runpod.ai/v2/granite-4-0-h-small/runsync` |
| **Pricing**  | \$10.00 per 1M tokens                                  |
| **Type**     | Text generation                                        |

## Request

All parameters are passed within the `input` object in the request body.

<ParamField body="input.messages" type="array" required>
  Array of message objects, each with a `role` and `content`.
</ParamField>

<ParamField body="input.messages[].role" type="string" required>
  The role of the message author. Use `system`, `user`, or `assistant`.
</ParamField>

<ParamField body="input.messages[].content" type="string" required>
  The content of the message.
</ParamField>

<ParamField body="input.sampling_params.max_tokens" type="integer" default="512">
  Maximum number of tokens to generate.
</ParamField>

<ParamField body="input.sampling_params.temperature" type="float" default="0.7">
  Controls randomness in generation. Lower values make output more deterministic. Range: 0.0-1.0.
</ParamField>

<ParamField body="input.sampling_params.seed" type="integer" default="-1">
  Seed for reproducible results. Set to -1 to use a random seed.
</ParamField>

<ParamField body="input.sampling_params.top_k" type="integer" default="-1">
  Restricts sampling to the top K most probable tokens.
</ParamField>

<ParamField body="input.sampling_params.top_p" type="float" default="1">
  Nucleus sampling threshold. Range: 0.0-1.0.
</ParamField>
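
The parameters above can be assembled and checked before sending a request. The following is a minimal sketch, not part of any Runpod SDK: `build_payload` and `VALID_ROLES` are hypothetical names, and the range checks simply mirror the defaults and ranges listed on this page.

```python
VALID_ROLES = {"system", "user", "assistant"}


def build_payload(messages, max_tokens=512, temperature=0.7,
                  seed=-1, top_k=-1, top_p=1.0):
    """Build the request body (hypothetical helper, defaults taken from this page)."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("message content must be a string")
    # Ranges as documented above: 0.0-1.0 for both temperature and top_p.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    # All parameters are nested under the top-level "input" object.
    return {
        "input": {
            "messages": list(messages),
            "sampling_params": {
                "max_tokens": max_tokens,
                "temperature": temperature,
                "seed": seed,
                "top_k": top_k,
                "top_p": top_p,
            },
        }
    }


payload = build_payload([{"role": "user", "content": "What is Runpod?"}])
```

The returned dictionary can be passed as the JSON body of the POST request. Validating locally is a design choice that surfaces malformed input before a request is sent.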

<RequestExample>
  ```bash cURL theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl -X POST "https://api.runpod.ai/v2/granite-4-0-h-small/runsync" \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": {
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant. Please ensure responses are professional, accurate, and safe."
          },
          {
            "role": "user",
            "content": "What is Runpod?"
          }
        ],
        "sampling_params": {
          "max_tokens": 512,
          "temperature": 0.7,
          "seed": -1,
          "top_k": -1,
          "top_p": 1
        }
      }
    }'
  ```

  ```python Python theme={"theme":{"light":"github-light","dark":"github-dark"}}
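  import os

  import requests

  # Read the API key from the environment, as the cURL example does.
  RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]
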
  response = requests.post(
      "https://api.runpod.ai/v2/granite-4-0-h-small/runsync",
      headers={
          "Authorization": f"Bearer {RUNPOD_API_KEY}",
          "Content-Type": "application/json",
      },
      json={
          "input": {
              "messages": [
                  {
                      "role": "system",
                      "content": "You are a helpful assistant. Please ensure responses are professional, accurate, and safe.",
                  },
                  {"role": "user", "content": "What is Runpod?"},
              ],
              "sampling_params": {
                  "max_tokens": 512,
                  "temperature": 0.7,
                  "seed": -1,
                  "top_k": -1,
                  "top_p": 1,
              },
          }
      },
  )

  result = response.json()
  print(result["output"])
  ```

  ```javascript JavaScript theme={"theme":{"light":"github-light","dark":"github-dark"}}
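  // Assumes Node.js, with the API key set in the environment.
  const RUNPOD_API_KEY = process.env.RUNPOD_API_KEY;

  const response = await fetch(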
    "https://api.runpod.ai/v2/granite-4-0-h-small/runsync",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${RUNPOD_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        input: {
          messages: [
            {
              role: "system",
              content: "You are a helpful assistant. Please ensure responses are professional, accurate, and safe.",
            },
            { role: "user", content: "What is Runpod?" },
          ],
          sampling_params: {
            max_tokens: 512,
            temperature: 0.7,
            seed: -1,
            top_k: -1,
            top_p: 1,
          },
        },
      }),
    }
  );

  const result = await response.json();
  console.log(result.output);
  ```
</RequestExample>

## Response

<ResponseField name="id" type="string">
  Unique identifier for the request.
</ResponseField>

<ResponseField name="status" type="string">
  Request status. Returns `COMPLETED` on success, `FAILED` on error.
</ResponseField>

<ResponseField name="delayTime" type="integer">
  Time in milliseconds the request spent in queue before processing began.
</ResponseField>

<ResponseField name="executionTime" type="integer">
  Time in milliseconds the model took to generate the response.
</ResponseField>

<ResponseField name="workerId" type="string">
  Identifier of the worker that processed the request.
</ResponseField>

<ResponseField name="output" type="object">
  The generation result containing the text and usage information.

  <ResponseField name="output.choices" type="array">
    Array of generated choices. Each choice has a `tokens` array holding the generated text.
  </ResponseField>

  <ResponseField name="output.cost" type="float">
    Cost of the generation in USD.
  </ResponseField>

  <ResponseField name="output.usage" type="object">
    Token usage, reported as `input` and `output` token counts.
  </ResponseField>
</ResponseField>

<ResponseExample>
  ```json 200 theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "id": "sync-a1b2c3d4-e5f6-7890-abcd-ef1234567890-u1",
    "status": "COMPLETED",
    "delayTime": 15,
    "executionTime": 2345,
    "workerId": "oqk7ao1uomckye",
    "output": {
      "choices": [
        {
          "tokens": [
            "Runpod is a cloud computing platform that provides GPU resources for AI and machine learning workloads..."
          ]
        }
      ],
      "cost": 0.00185,
      "usage": {
        "input": 35,
        "output": 150
      }
    }
  }
  ```

  ```json 400 theme={"theme":{"light":"github-light","dark":"github-dark"}}
  {
    "id": "sync-a1b2c3d4-e5f6-7890-abcd-ef1234567890-u1",
    "status": "FAILED",
    "error": "Invalid messages format"
  }
  ```
</ResponseExample>
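
To pull the generated text out of a response, a client can check `status` first and then read `output.choices`. The sketch below assumes the response shape shown in the examples on this page; `extract_text` is a hypothetical helper name, and reading only the first choice is an illustrative simplification.

```python
def extract_text(result: dict) -> str:
    """Return the generated text from a parsed response, or raise on failure."""
    if result.get("status") != "COMPLETED":
        # Failed responses carry an "error" message instead of "output".
        error = result.get("error", "unknown error")
        raise RuntimeError(f"Request {result.get('id')} failed: {error}")
    choices = result["output"]["choices"]
    if not choices:
        raise RuntimeError("Response contained no choices")
    # In the example response, each choice holds a "tokens" list of text fragments.
    return "".join(choices[0]["tokens"])


example = {
    "id": "sync-a1b2c3d4-e5f6-7890-abcd-ef1234567890-u1",
    "status": "COMPLETED",
    "output": {
        "choices": [{"tokens": ["Runpod is a cloud computing platform..."]}],
        "cost": 0.00185,
        "usage": {"input": 35, "output": 150},
    },
}
text = extract_text(example)
```

Calling `extract_text(response.json())` on a live response works the same way as with the `example` dictionary here.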

## Cost calculation

IBM Granite 4.0 charges \$10.00 per 1M tokens. Example costs:

| Tokens           | Cost    |
| ---------------- | ------- |
| 1,000 tokens     | \$0.01  |
| 10,000 tokens    | \$0.10  |
| 100,000 tokens   | \$1.00  |
| 1,000,000 tokens | \$10.00 |
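
The table values follow from a single multiplication. As a minimal sketch, the hypothetical helper `estimate_cost` below applies the \$10.00 per 1M rate and assumes input and output tokens are billed at the same rate, which matches the example response above (35 + 150 tokens, `cost` of 0.00185).

```python
from decimal import Decimal

PRICE_PER_MILLION_USD = Decimal("10.00")  # rate stated on this page


def estimate_cost(input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of a request from its token counts.

    Assumption: input and output tokens are billed at the same rate.
    """
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_MILLION_USD * total_tokens / Decimal(1_000_000)


# Token counts from the example response's "usage" object.
example_cost = estimate_cost(35, 150)
```

`Decimal` is used instead of `float` so that small currency amounts are not subject to binary rounding error.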
