This tutorial shows you how to run an Ollama server on a Runpod Serverless CPU endpoint for LLM inference. It focuses on CPU compute, but you can also select a GPU for faster performance.

Requirements

Before starting, you’ll need:
  • A Runpod account with credits.
  • (Optional) A network volume to store models.

Step 1: Deploy a Serverless endpoint

We recommend attaching a network volume to store downloaded models. Without a network volume, the worker downloads the model on every cold start, increasing latency. You can attach a network volume to your endpoint after it’s deployed.
  1. Log in to the Runpod console.
  2. Navigate to Serverless and select New Endpoint.
  3. Choose CPU and select a configuration (for example, 8 vCPUs and 16 GB RAM).
  4. Configure your worker settings as needed.
  5. In the Container Image field, enter: pooyaharatian/runpod-ollama:0.0.8
  6. In the Container Start Command field, enter the model name (for example, orca-mini or llama3.1). See the Ollama library for available models.
  7. Allocate at least 20 GB of container disk space.
  8. (Optional) Add an environment variable with key OLLAMA_MODELS and value /runpod-volume to store models on your attached network volume.
  9. Select Deploy.
Wait for the model to download and the worker to become ready.
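
You can also confirm readiness programmatically by polling the endpoint's health route. Below is a minimal Python sketch; the RUNPOD_ENDPOINT_ID and RUNPOD_API_KEY environment variable names are placeholders for your endpoint ID and API key, not names Runpod requires:

import os

import requests

# Placeholders: point these at your endpoint ID and Runpod API key.
ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
API_KEY = os.environ["RUNPOD_API_KEY"]

# Serverless endpoints expose a health route that reports worker and job counts.
resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
print(resp.json())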

Step 2: Send a request

Once your endpoint is deployed:
  1. Go to the Requests section in the Runpod console.
  2. Enter the following JSON in the input field:
    {
      "input": {
        "method_name": "generate",
        "input": {
          "prompt": "Why is the sky blue?"
        }
      }
    }
    
  3. Select Run.
You’ll receive a response like this:
{
  "delayTime": 153,
  "executionTime": 4343,
  "id": "c2cb6af5-c822-4950-bca9-5349288c001d-u1",
  "output": {
    "model": "orca-mini",
    "response": "The sky appears blue because of a process called scattering...",
    "done": true
  },
  "status": "COMPLETED"
}
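You can send the same request programmatically through the endpoint's runsync route, which blocks until the job finishes. A minimal Python sketch, again treating RUNPOD_ENDPOINT_ID and RUNPOD_API_KEY as placeholder environment variables:

import os

import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]  # placeholder

# The same payload you entered in the console's input field.
payload = {
    "input": {
        "method_name": "generate",
        "input": {"prompt": "Why is the sky blue?"},
    }
}

# runsync waits for the job to complete and returns the result directly.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["output"]["response"])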
Your Ollama endpoint is now ready to integrate into your applications using the Runpod API.
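For longer generations that might outlast a synchronous request, you can submit the job with the run route and poll its status instead. A sketch under the same placeholder assumptions:

import os
import time

import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]  # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

payload = {
    "input": {
        "method_name": "generate",
        "input": {"prompt": "Why is the sky blue?"},
    }
}

# run returns a job ID immediately instead of waiting for the result.
job = requests.post(f"{BASE_URL}/run", headers=HEADERS, json=payload).json()

# Poll the status route until the job reaches a terminal state.
while True:
    status = requests.get(f"{BASE_URL}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(2)

if status["status"] == "COMPLETED":
    print(status["output"]["response"])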

Next steps