What you’ll learn
- Deploy an Ollama container as a Serverless endpoint.
- Configure a network volume to cache models and reduce cold start times.
- Send inference requests to your Ollama endpoint.
Requirements
Before starting, you’ll need:
- A Runpod account with credits.
- (Optional) A network volume to store models.
Step 1: Deploy a Serverless endpoint
- Log in to the Runpod console.
- Navigate to Serverless and select New Endpoint.
- Choose CPU and select a configuration (for example, 8 vCPUs and 16 GB RAM).
- Configure your worker settings as needed.
- In the Container Image field, enter: `pooyaharatian/runpod-ollama:0.0.8`
- In the Container Start Command field, enter the model name (for example, `orca-mini` or `llama3.1`). See the Ollama library for available models.
- Allocate at least 20 GB of container disk space.
- (Optional) Add an environment variable with key `OLLAMA_MODELS` and value `/runpod-volume` to store models on your attached network volume.
- Select Deploy.
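For reference, the configuration from this step boils down to the values below. This is an informal recap of the console fields, not a Runpod API schema; the field names here are illustrative, and `llama3.1` stands in for whichever model you chose as the start command.

```json
{
  "container_image": "pooyaharatian/runpod-ollama:0.0.8",
  "container_start_command": "llama3.1",
  "container_disk_gb": 20,
  "environment_variables": {
    "OLLAMA_MODELS": "/runpod-volume"
  }
}
```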
Step 2: Send a request
Once your endpoint is deployed:
- Go to the Requests section in the Runpod console.
- Enter your request JSON in the input field (an example payload is shown after this list).
- Select Run.
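A minimal example payload is shown below. It assumes the request shape used by the runpod-ollama handler, where `method_name` selects the Ollama method to call and the nested `input` carries the prompt; check the Runpod Ollama repository (linked under Next steps) for the exact fields your image version supports.

```json
{
  "input": {
    "method_name": "generate",
    "input": {
      "prompt": "Why is the sky blue?"
    }
  }
}
```

The same payload can also be sent programmatically by POSTing it to your endpoint's run or runsync URL with your Runpod API key, as described in the Serverless endpoint documentation linked under Next steps.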
Next steps
- Explore the Runpod Ollama repository for more configuration options.
- View the Runpod Ollama container image on Docker Hub.
- Learn more about sending requests to Serverless endpoints.