This walkthrough teaches you how to persist data outside of container. Runpod has the same concept used for attaching a Network Volume to your Pod.Consult the documentation on attaching a Network Volume to your Pod.
Why persist data outside a container?
The key goal is to have data persist across multiple container runs and removals. By default, containers are ephemeral - everything inside them disappears when they exit. So running something like:file.txt
temporarily inside that container. As soon as the container shuts down, that file and data is destroyed. This isn’t great when you’re training data and want your information to persist past your LLM training.
Because of this, we need to persist data outside the container. Let’s take a look at a workflow you can use to persist data outside a container.
Create a named volume
First, we’ll create a named volume to represent the external storage:Update Dockerfile
Next, we’ll modify our Dockerfile to write the date output to a file rather than printing directly to stdout:/data
, touches a file called current_date.txt
, and copies our script.
Update entrypoint script
Theentrypoint.sh
script is updated:
/data/current_date.txt
file instead of printing it.
Mount the volume
Now when the container runs, this will write the date to the/data/current_date.txt
file instead of printing it.
Finally, we can mount the named volume to this data directory:
date-volume
Docker volume to the /data directory in the container. Anything written to /data
inside the container will now be written to the date-volume
on the host instead of the container’s ephemeral file system. This allows the data to persist. Once the container exits, the date output file is safely stored on the host volume.
After the container exits, we can exec into another container sharing the volume to see the persisted data file:
date-volume
.
- Using the same -
v date-volume:/data mount
point maps the external volume dir to/data
again. - This allows the new container to access the persistent date file that the first container wrote.
- The
cat /data/current_date.txt
command prints out the file with the date output from the first container. - The
--rm
flag removes the container after running so we don’t accumulate stopped containers.
Remember, this is a general tutorial on Docker. These concepts will help give you a better understanding of working with Runpod.