This tutorial shows you how to run SmolLM3 on a Runpod Pod using Hugging Face's transformers library.
SmolLM3 is a family of small language models developed by Hugging Face that provides strong performance while being efficient enough to run on modest hardware.
The 3B parameter model we’ll use in this tutorial requires only 24 GB of VRAM, making it accessible for experimentation and development.
What you’ll learn
In this tutorial, you’ll learn how to:
- Deploy a Pod with the PyTorch template.
- Access the web terminal and JupyterLab services.
- Install the transformers and accelerate libraries.
- Use SmolLM3 for text generation in a Python notebook.
- Configure model parameters for different use cases.
Requirements
Before you begin, you’ll need:
- A Runpod account.
- At least $5 in Runpod credits.
- Basic familiarity with Python and Jupyter notebooks.
Step 1: Deploy a Pod with PyTorch template
First, you’ll deploy a Pod using the official Runpod PyTorch template:
- Navigate to the Pods page in the Runpod console.
- Click Deploy to create a new Pod.
- In the template selection, choose the latest Runpod PyTorch template (this should be the default setting).
- For GPU selection, choose any GPU with 24 GB or more VRAM. Good options include:
  - RTX 4090 (24 GB VRAM)
  - RTX A5000 (24 GB VRAM)
  - L40 (48 GB VRAM)
- Keep all the other settings on their defaults.
- Click Deploy On-Demand to create your Pod.
Step 2: Install required packages
Once your Pod is running, you’ll need to install the `transformers` and `accelerate` Python libraries:
- In the Runpod console, find and expand your deployed Pod and click Connect.
- Under Web Terminal, click Start to start the terminal service.
- Click Open Web Terminal to open a terminal session in your browser.
- In the terminal, install the required packages by running:
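```bash
# Install the Hugging Face libraries used in this tutorial
pip install transformers accelerate
```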
Step 3: Open JupyterLab
Next, we’ll prepare our JupyterLab coding environment:
- Go back to the Runpod console and click Connect on your Pod again.
- Under HTTP Services, click Connect to HTTP Service [Port 8888] to open JupyterLab.
- If the JupyterLab service shows as “Not Ready”, wait a moment and refresh the page.
Step 4: Create and run your SmolLM3 notebook
In JupyterLab, create a new notebook to perform inference using the SmolLM3 model:
- Click File > New > Notebook.
- Select Python 3 (ipykernel) when prompted for the kernel.
- In the first cell of your notebook, enter the code shown just after this list.
- Run the cell by pressing Cmd + Enter (Mac) or Ctrl + Enter (Windows) or clicking the Run button.
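A minimal cell consistent with the components described in Step 5 below; the prompt text is illustrative, and the code assumes a GPU is available:

```python
import torch
from transformers import pipeline

# Create a high-level text-generation pipeline with SmolLM3
generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM3-3B",
    torch_dtype=torch.bfloat16,  # 16-bit floats for memory efficiency
    device_map=0,                # place the model on the first GPU
)

# Define a chat-like conversation with system and user roles
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain in two sentences why small language models are useful."},
]

# Generate and print the assistant's reply
outputs = generator(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])
```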
Try increasing `max_new_tokens` and run the cell again to get a longer response (it will just take longer to run).
Step 5: Understanding the code
Let’s break down the key components of the code we just ran:
- `pipeline()`: Creates a high-level interface for text generation.
- `model="HuggingFaceTB/SmolLM3-3B"`: Specifies the model to use.
- `torch_dtype=torch.bfloat16`: Uses 16-bit floating point for memory efficiency.
- `device_map=0`: Places the model on the first GPU (device 0).
- `messages`: Defines a chat-like conversation with system and user roles.
Step 6: Experiment with different prompts and parameters
Once your model is loaded, you can experiment with different prompts and generation parameters.
Try different conversation topics
Try running the following code in a new cell:
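As a sketch, reusing the `generator` pipeline from the cell in Step 4 (the topic here is illustrative):

```python
# Ask about a different topic, reusing the pipeline that's already loaded
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short poem about GPUs."},
]

outputs = generator(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])
```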
Adjust generation parameters
You can modify various parameters to control the model’s output (see the sketch after this list):
- `max_new_tokens`: Controls the maximum length of the generated text.
- `temperature`: Controls randomness (0.1 = more focused, 1.0 = more creative).
- `top_k`: Limits the vocabulary to the top K most likely tokens.
- `top_p`: Uses nucleus sampling to control diversity.
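A sketch of passing these parameters to the pipeline call; the specific values are illustrative starting points, not tuned recommendations:

```python
# Generation parameters are passed as keyword arguments to the pipeline call
outputs = generator(
    messages,
    max_new_tokens=256,  # cap on the length of the generated text
    do_sample=True,      # enable sampling so the settings below take effect
    temperature=0.7,     # lower = more focused, higher = more creative
    top_k=50,            # consider only the 50 most likely next tokens
    top_p=0.9,           # nucleus sampling: keep the top 90% probability mass
)
print(outputs[0]["generated_text"][-1]["content"])
```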
Use single-turn prompts
You can also use SmolLM3 for simple text completion without the chat format:
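A minimal sketch, passing a plain string instead of a `messages` list (the prompt is illustrative):

```python
# Single-turn completion: pass a plain string instead of a message list
prompt = "The key advantages of small language models are"
outputs = generator(prompt, max_new_tokens=100)
print(outputs[0]["generated_text"])
```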
Troubleshooting
Here are solutions to common issues:
- Out of memory errors: Ensure you’re using a GPU with at least 24 GB VRAM, or try reducing the batch size.
- Model download fails: Check your internet connection and try running the cell again.
- JupyterLab not accessible: Wait a few minutes after Pod deployment for services to fully start. If the JupyterLab tab is blank when you open it, try stopping and then restarting the Pod.
- Import errors: Make sure you installed the packages in Step 2 using the web terminal.
Next steps
Now that you have SmolLM3 running, you can explore more advanced use cases:
- Integration with applications: Use SmolLM3 as part of larger applications by integrating it with web frameworks or APIs.
- Model comparison: Try other models in the SmolLM3 family or compare with other small language models to find the best fit for your use case.
- Persistent storage: If you plan to work with SmolLM3 regularly, consider using a network volume to persist your models and notebooks across Pod sessions.