Learn what to expect during planned and unplanned maintenance events, and how to keep your data safe.
Runpod operates on shared infrastructure. Like any cloud platform, maintenance and unexpected outages can occur. This page explains how these situations are handled and what you can do to protect your work.

Planned maintenance

When scheduled maintenance is required on a machine hosting your pod, Runpod notifies you in advance. Notifications are sent via email before the maintenance window begins so you have time to save your work, back up data, or migrate to another pod. During a maintenance window, Runpod does not charge you for the time your pod is unavailable. If you cannot wait for maintenance to complete, you can deploy another resource in the meantime. If you have questions about a maintenance window or believe your pod was impacted, contact Runpod Support.

Unplanned outages

Hardware failures and sudden crashes can happen without warning. In these cases:
  • Runpod may only be able to notify you after the outage has begun, not before.
  • You will be notified as soon as the issue is identified.
If you believe your workload was impacted by an unplanned outage, contact Runpod Support with your pod ID to understand the impact, timeline, and current status of the incident.

Data safety

Pods use temporary container storage by default. If your pod is interrupted, restarted, stopped, or terminated, any data stored only on container storage is lost. To protect your work, always store important data on a network volume or in an external backup.

Use a network volume

Attach a network volume to your pod to persist data across restarts and pod deletions. This is the most reliable way to ensure your data survives unexpected outages.

Set up checkpointing

For long-running jobs, implement checkpointing to save progress periodically (every hour to every few hours, depending on job length). This limits the amount of work lost if a pod restarts unexpectedly. Most machine learning frameworks include built-in checkpointing support. See your framework’s documentation to get started.
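If your framework has no built-in support, the idea can be sketched in plain Python. The example below is a minimal illustration, not a Runpod API: `save_checkpoint`, `load_checkpoint`, and `run_job` are hypothetical helper names, the JSON state format is an assumption, and `stop_after` exists only to simulate an interruption. Each checkpoint is written to a temporary file and then renamed, so a crash mid-write does not leave a corrupt file behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(state: dict, ckpt_dir: Path, name: str = "checkpoint.json") -> Path:
    """Atomically write `state` as JSON inside `ckpt_dir` (hypothetical helper)."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    target = ckpt_dir / name
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target


def load_checkpoint(ckpt_dir: Path, name: str = "checkpoint.json") -> Optional[dict]:
    """Return the last saved state, or None if no checkpoint exists yet."""
    target = ckpt_dir / name
    if not target.exists():
        return None
    with target.open() as f:
        return json.load(f)


def run_job(total_steps: int, ckpt_dir: Path, save_every: int = 100,
            stop_after: Optional[int] = None) -> dict:
    """Toy job that sums step indices and resumes from the last checkpoint.

    `stop_after` simulates an interruption after that many steps in this run.
    """
    state = load_checkpoint(ckpt_dir) or {"step": 0, "total": 0}
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            break  # simulated outage: progress since the last save is lost
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        executed += 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, ckpt_dir)
    if state["step"] >= total_steps:
        save_checkpoint(state, ckpt_dir)  # final save on completion
    return state
```

On a pod, `ckpt_dir` would point to a directory on persistent storage such as a network volume. The mount path depends on your pod configuration, so it is left as a parameter here. Choose `save_every` so that the work between two saves is an amount you can afford to repeat.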

Maintain backups

The industry standard for data protection is the 3-2-1 rule:
  • 3 copies of your data
  • 2 different storage types (for example, a network volume and an external object store)
  • 1 copy stored offsite or in a separate location
Use runpodctl or cloud sync to automate backups. Runpod cannot guarantee recovery of data stored only on ephemeral container disk.
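As a rough illustration of the copy-and-verify step behind the 3-2-1 rule, the sketch below copies one file into several destination directories and confirms each copy by SHA-256 checksum. `backup_file` and `sha256_of` are hypothetical helper names, and the destinations are ordinary local directories standing in for a second storage type and an offsite location. Transferring data to a real offsite target (for example with runpodctl or cloud sync) is not shown.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, destinations: List[Path]) -> Dict[Path, str]:
    """Copy `source` into each destination directory and verify every copy.

    Returns a mapping of copy path to checksum. Raises OSError if any copy
    does not match the source checksum.
    """
    expected = sha256_of(source)
    verified: Dict[Path, str] = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy_path = dest_dir / source.name
        shutil.copy2(source, copy_path)  # copies contents and metadata
        actual = sha256_of(copy_path)
        if actual != expected:
            raise OSError(f"checksum mismatch for {copy_path}")
        verified[copy_path] = actual
    return verified
```

Counting the original file, two verified copies give the three copies the rule calls for. Whether they also satisfy the "2 storage types" and "1 offsite" parts depends entirely on where the destination directories actually live.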

Network volumes

Set up persistent, portable storage that survives pod restarts and deletions.

Storage options

Compare container disk, volume disk, and network volume storage types.