# Run Ollama on Serverless (CPU)

> Learn how to run an Ollama server on Serverless CPU workers.

Run an Ollama server on CPU workers for LLM inference. This tutorial focuses on CPU compute, but you can also select a GPU for faster inference.

## Requirements

Before starting, you'll need:

* A Runpod account with credits.
* (Optional) A [network volume](/storage/network-volumes) to store models.

## Step 1: Deploy a Serverless endpoint

We recommend attaching a [network volume](/storage/network-volumes) to store downloaded models. Without a network volume, the worker downloads the model on every cold start, which increases latency. You can also attach a network volume to your endpoint after it's deployed.

1. Log in to the [Runpod console](https://www.console.runpod.io/console/home).
2. Navigate to **Serverless** and select **New Endpoint**.
3. Choose **CPU** and select a configuration (for example, 8 vCPUs and 16 GB RAM).
4. Configure your worker settings as needed.
5. In the **Container Image** field, enter `pooyaharatian/runpod-ollama:0.0.8`.
6. In the **Container Start Command** field, enter the name of the model to serve (for example, `orca-mini` or `llama3.1`). See the [Ollama library](https://ollama.com/library) for available models.
7. Allocate at least 20 GB of container disk space.
8. (Optional) To store models on your attached network volume, add an environment variable with key `OLLAMA_MODELS` and value `/runpod-volume`.
9. Select **Deploy**, then wait for the model to download and the worker to become ready.

## Step 2: Send a request

Once your endpoint is deployed:

1. Go to the **Requests** section in the Runpod console.
2. Enter the following JSON in the input field:

   ```json
   {
     "input": {
       "method_name": "generate",
       "input": {
         "prompt": "Why is the sky blue?"
       }
     }
   }
   ```

3. Select **Run**. You'll receive a response like this:

   ```json
   {
     "delayTime": 153,
     "executionTime": 4343,
     "id": "c2cb6af5-c822-4950-bca9-5349288c001d-u1",
     "output": {
       "model": "orca-mini",
       "response": "The sky appears blue because of a process called scattering...",
       "done": true
     },
     "status": "COMPLETED"
   }
   ```

Your Ollama endpoint is now ready to integrate into your applications using the Runpod API.

## Next steps

* Explore the [Runpod Ollama repository](https://github.com/pooyahrtn/) for more configuration options.
* View the [Runpod Ollama container image](https://hub.docker.com/r/pooyaharatian/runpod-ollama) on Docker Hub.
* Learn more about [sending requests to Serverless endpoints](/serverless/endpoints/send-requests).
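You can also send the Step 2 request from code instead of the console. The following is a minimal Python sketch that uses only the standard library. It assumes the standard Serverless synchronous route (`https://api.runpod.ai/v2/{endpoint_id}/runsync`) with Bearer-token authentication, as covered in the send-requests documentation. The helper names (`build_request`, `extract_text`, `generate`) and the environment variable names `RUNPOD_ENDPOINT_ID` and `RUNPOD_API_KEY` are illustrative and are not part of the worker image. The payload reuses the `method_name`/`input` shape from the console example.

```python
import json
import os
import urllib.request

# Assumption: standard Runpod Serverless API base URL; /runsync waits for the job.
API_BASE = "https://api.runpod.ai/v2"


def build_request(endpoint_id: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST to /runsync with the same payload used in the console."""
    payload = {
        "input": {
            "method_name": "generate",
            "input": {"prompt": prompt},
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth with your Runpod API key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_text(result: dict) -> str:
    """Return the generated text from a job result, or raise if it is not finished."""
    status = result.get("status")
    if status != "COMPLETED":
        # /runsync can return before the job finishes (for example, IN_PROGRESS
        # during a slow cold start), or report FAILED.
        raise RuntimeError(f"job {result.get('id')} not completed: {status}")
    return result["output"]["response"]


def generate(endpoint_id: str, prompt: str, api_key: str, timeout: float = 120.0) -> str:
    """Send one prompt to the endpoint and return the model's response text."""
    req = build_request(endpoint_id, prompt, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_text(json.load(resp))


if __name__ == "__main__":
    # Hypothetical environment variable names; set them before running.
    endpoint = os.environ.get("RUNPOD_ENDPOINT_ID")
    key = os.environ.get("RUNPOD_API_KEY")
    if endpoint and key:
        print(generate(endpoint, "Why is the sky blue?", key))
    else:
        print("Set RUNPOD_ENDPOINT_ID and RUNPOD_API_KEY to send a request.")
```

The sketch uses `/runsync` because it returns the result in a single call, which works well for short prompts. For long generations or slow cold starts, submitting the job to `/run` and polling `/status/{job_id}` may be a better fit.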