# Run Ollama on Serverless (CPU)

> Learn how to run an Ollama server on Serverless CPU workers.

export const WorkersTooltip = () => {
  return <Tooltip headline="Worker" tip="A container that runs your application code and processes requests to your Serverless endpoint. Workers are automatically started and stopped by Runpod to handle traffic spikes and ensure optimal resource utilization." cta="Learn more about workers" href="/serverless/workers/overview">workers</Tooltip>;
};

export const ServerlessTooltip = () => {
  return <Tooltip headline="Serverless" tip="A cloud computing platform that allows you to deploy AI/ML applications without provisioning or managing servers." cta="Learn more about Serverless" href="/serverless/overview">Serverless</Tooltip>;
};

Run an Ollama server on <ServerlessTooltip /> CPU <WorkersTooltip /> for LLM inference. This tutorial focuses on CPU compute, but you can also select a GPU for faster performance.

## Requirements

Before starting, you'll need:

* A Runpod account with credits.
* (Optional) A [network volume](/storage/network-volumes) to store models.

## Step 1: Deploy a Serverless endpoint

<Tip>
  We recommend attaching a [network volume](/storage/network-volumes) to store downloaded models. Without a network volume, the worker downloads the model on every cold start, increasing latency. You can attach a network volume to your endpoint after it's deployed.
</Tip>

1. Log in to the [Runpod console](https://www.console.runpod.io/console/home).
2. Navigate to **Serverless** and select **New Endpoint**.
3. Choose **CPU** and select a configuration (for example, 8 vCPUs and 16 GB RAM).
4. Configure your worker settings as needed.
5. In the **Container Image** field, enter: `pooyaharatian/runpod-ollama:0.0.8`
6. In the **Container Start Command** field, enter the model name (for example, `orca-mini` or `llama3.1`). See the [Ollama library](https://ollama.com/library) for available models.
7. Allocate at least 20 GB of container disk space.
8. (Optional) Add an environment variable with key `OLLAMA_MODELS` and value `/runpod-volume` to store models on your attached network volume.
9. Select **Deploy**.

Wait for the model to download and the worker to become ready.
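If you'd rather check readiness from a script than watch the console, one option is to poll the endpoint's health route. The sketch below is a rough illustration, not an official client. It assumes the `https://api.runpod.ai/v2/<endpoint_id>/health` route and a response containing a `workers` object with `idle` and `running` counts. The function names, `endpoint_id`, and `api_key` are placeholders. An idle worker is only a rough signal and may not guarantee the model has finished downloading.

```python
import json
import urllib.request


def has_available_worker(health: dict) -> bool:
    """Return True if the health payload reports at least one idle or running worker.

    Assumes a payload shaped like {"workers": {"idle": 1, "running": 0}, ...}.
    Missing fields are treated as zero.
    """
    workers = health.get("workers", {})
    return (workers.get("idle", 0) + workers.get("running", 0)) > 0


def fetch_health(endpoint_id: str, api_key: str) -> dict:
    """Fetch the health payload for an endpoint (assumed route; needs network access)."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/health"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `has_available_worker(fetch_health("<your-endpoint-id>", "<your-api-key>"))` could be called in a loop with a short sleep until it returns `True`.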

## Step 2: Send a request

Once your endpoint is deployed:

1. Go to the **Requests** section in the Runpod console.

2. Enter the following JSON in the input field:

   ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
   {
     "input": {
       "method_name": "generate",
       "input": {
         "prompt": "Why is the sky blue?"
       }
     }
   }
   ```

3. Select **Run**.

You'll receive a response like this:

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "delayTime": 153,
  "executionTime": 4343,
  "id": "c2cb6af5-c822-4950-bca9-5349288c001d-u1",
  "output": {
    "model": "orca-mini",
    "response": "The sky appears blue because of a process called scattering...",
    "done": true
  },
  "status": "COMPLETED"
}
```
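When handling this response in code, you'll usually want just the generated text. The helper below is a minimal sketch that assumes the response shape shown above (`status` at the top level and the text under `output.response`). The function name `extract_response` is hypothetical.

```python
def extract_response(result: dict) -> str:
    """Return the generated text from a completed job result.

    Assumes the shape shown in this tutorial:
    {"status": "COMPLETED", "output": {"response": "..."}}
    """
    status = result.get("status")
    if status != "COMPLETED":
        # Surface any other status instead of silently returning nothing.
        raise RuntimeError(f"Job did not complete: status={status!r}")
    return result["output"]["response"]
```

Raising on any status other than `COMPLETED` keeps failed or still-queued jobs from being mistaken for empty answers.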

Your Ollama endpoint is now ready to integrate into your applications using the Runpod API.
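As a starting point for that integration, here is a minimal Python sketch that sends the same request through the Runpod API's synchronous `/runsync` route using only the standard library. The function names, `endpoint_id`, and `api_key` are placeholders you would replace with your own values, and the 300-second timeout is an arbitrary choice to allow for slow CPU inference.

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    """Build the request body used in this tutorial for the Ollama worker."""
    return {
        "input": {
            "method_name": "generate",
            "input": {"prompt": prompt},
        }
    }


def run_sync(endpoint_id: str, api_key: str, prompt: str) -> dict:
    """POST the payload to the endpoint and return the parsed JSON result.

    Requires network access and a valid Runpod API key.
    """
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # Generous timeout (an assumption): CPU inference can be slow.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

For example, `run_sync("<your-endpoint-id>", "<your-api-key>", "Why is the sky blue?")` should return a dictionary shaped like the response shown above.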

## Next steps

* Explore the [Runpod Ollama repository](https://github.com/pooyahrtn/) for more configuration options.
* View the [Runpod Ollama container image](https://hub.docker.com/r/pooyaharatian/runpod-ollama) on Docker Hub.
* Learn more about [sending requests to Serverless endpoints](/serverless/endpoints/send-requests).
