Prerequisites
The tutorial assumes you have a Runpod account with credits. No other prior knowledge is needed to complete this tutorial.

Step 1: Start a PyTorch Template on Runpod
You will create a new Pod with the PyTorch template. In this step, you will set overrides to configure Ollama.

- Log in to your Runpod account and choose + GPU Pod.
- Choose a GPU Pod like `A40`.
- From the available templates, select the latest PyTorch template.
- Select Customize Deployment.
- Add the port `11434` to the list of exposed ports. This port is used by Ollama for HTTP API requests.
- Add the following environment variable to your Pod to allow Ollama to bind to the HTTP port:
  - Key: `OLLAMA_HOST`
  - Value: `0.0.0.0`
- Select Set Overrides, Continue, then Deploy.
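After the Pod deploys, you can optionally confirm from the web terminal (set up in the next step) that the environment variable override took effect. This quick check is not part of the original steps; it simply assumes the variable was set as described above:

```sh
# Prints 0.0.0.0 if the OLLAMA_HOST override was applied to the Pod
echo $OLLAMA_HOST
```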
Step 2: Install Ollama
Now that your Pod is running, you can log in to the web terminal. The web terminal is a powerful way to interact with your Pod.

- Select Connect and choose Start Web Terminal.
- Make note of the Username and Password, then select Connect to Web Terminal.
- Enter your username and password.
- To ensure Ollama can automatically detect and utilize your GPU, run the following commands.
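The original commands are not preserved here. A minimal sketch that fits this step is to install `lshw`, which Ollama uses to detect GPUs on Linux, on the Debian-based PyTorch image; treat the exact packages as an assumption:

```sh
# Assumption: install lshw so Ollama can detect the GPU on this Debian-based image
apt update
apt install -y lshw
```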
- Run the following command to install Ollama and send it to the background:
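The exact command is not preserved here; a minimal sketch that matches this step uses Ollama's official install script and then backgrounds the server (the log file name `ollama.log` is only an illustrative choice):

```sh
# Install Ollama with the official install script, then start the server in the background
(curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
```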
The `ollama serve` part of the command starts the Ollama server, making it ready to serve AI models.
Now that your Ollama server is running on your Pod, add a model.
Step 3: Run an AI Model with Ollama
To run an AI model using Ollama, pass the model name to the `ollama run` command, replacing `[model name]` with the name of the AI model you wish to deploy. For a complete list of models, see the Ollama Library.
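For example, the following pulls a model and opens an interactive session with it (`llama3` is used here only as an illustrative model name from the Ollama Library):

```sh
# Pull the model if needed and start an interactive chat session
ollama run llama3
```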
This command pulls the model and runs it, making it accessible for inference. You can begin interacting with the model directly from your web terminal.
Optionally, you can set up an HTTP API request to interact with Ollama. This is covered in the next step.
Step 4: Interact with Ollama via HTTP API
With Ollama set up and running, you can now interact with it using HTTP API requests. In Step 1, you configured Ollama to listen on all network interfaces by setting `OLLAMA_HOST` to `0.0.0.0`. This means you can use your Pod as a server to receive requests.

Get a list of models

To list the local models available in Ollama, you can use the following GET request, replacing [your-pod-id] with your actual Pod ID.
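The original request is not preserved here; the sketch below assumes Runpod's proxy link structure (`https://[your-pod-id]-11434.proxy.runpod.net`) and uses Ollama's `/api/tags` endpoint, which lists the locally available models:

```sh
# List the models available locally in Ollama (replace [your-pod-id] with your Pod ID)
curl https://[your-pod-id]-11434.proxy.runpod.net/api/tags
```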
Because port `11434` is exposed, you can make requests to your Pod using the `curl` command.
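As an example of another operation, Ollama also exposes a `/api/generate` endpoint that accepts a POST request with a model name and a prompt. The sketch below is not part of the original walkthrough and assumes a model such as `llama3` has already been pulled:

```sh
# Generate a single (non-streaming) completion from a pulled model
curl https://[your-pod-id]-11434.proxy.runpod.net/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```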
For more information on constructing HTTP requests and other operations you can perform with the Ollama API, consult the Ollama API documentation.
Additional considerations
This tutorial provides a foundational understanding of setting up and using Ollama on a GPU Pod with Runpod.

- Port Configuration and documentation: For further details on exposing ports and the link structure, refer to the Runpod documentation.
- Connect VSCode to Runpod: For information on connecting VSCode to Runpod, refer to How to Connect VSCode To Runpod.