Setting up your Endpoint
Attach a network volume to your Worker so that it can cache the LLM and decrease cold start times. Without a network volume, the Worker has to download the model every time it spins back up, increasing latency and resource consumption.
- Log in to your Runpod account.
- Navigate to the Serverless section and select New Endpoint.
- Choose CPU, select a configuration for your Worker (for example, 8 vCPUs, 16 GB RAM), and provide a name for your Endpoint.
- Configure your Worker settings according to your needs.
- In the Container Image field, enter the `pooyaharatian/runpod-ollama:0.0.8` container image.
- In the Container Start Command field, specify an Ollama-supported model, such as `orca-mini` or `llama3.1`.
- Allocate sufficient container disk space for your model. Typically, 20 GB should suffice for most models.
- (Optional) In Environment Variables, set a new key to `OLLAMA_MODELS` with the value `/runpod-volume`. This allows the model to be stored on your attached network volume.
- Click Deploy to initiate the setup.
Sending a Run request
After your endpoint is deployed and the model is downloaded, you can send a run request to test the setup.

- Go to the Requests section in the Runpod web UI.
- In the input module, enter the following JSON object:
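  A minimal example request, assuming the worker accepts a prompt under Runpod's standard `input` envelope (the prompt text here is illustrative):

  ```json
  {
    "input": {
      "prompt": "Why is the sky blue?"
    }
  }
  ```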
- Select Run to execute the request.
- In a few seconds, you will receive a response. For example:
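  The exact output shape depends on the worker version; the sketch below assumes Runpod's standard result envelope with the Ollama generation under `output` (all values are illustrative):

  ```json
  {
    "id": "sync-uuid-example",
    "status": "COMPLETED",
    "delayTime": 1234,
    "executionTime": 5678,
    "output": {
      "model": "orca-mini",
      "response": "The sky appears blue because sunlight is scattered by air molecules...",
      "done": true
    }
  }
  ```

  You can also send the same JSON programmatically by POSTing it to your endpoint's `/run` (asynchronous) or `/runsync` (blocking) URL with your Runpod API key in the Authorization header.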