September 2025
Slurm Clusters GA, cached models in beta, and new Public Endpoints available
- Slurm Clusters are now generally available: Deploy production-ready HPC clusters in seconds. These clusters support multi-node performance for distributed training and large-scale simulations with pay-as-you-go billing and no idle costs.
- Cached models are now in beta: Eliminate model download times when starting workers. The system places cached models on host machines before workers start, prioritizing hosts with your model already available for instant startup.
- New Public Endpoints available: Wan 2.5 combines image and audio to create lifelike videos, while Nano Banana merges multiple images for composite creations.
August 2025
Hub revenue sharing launches and Pods UI gets refreshed
- Hub revenue share model: Publish to the Runpod Hub and earn credits when others deploy your repo. Earn up to 7% of compute revenue through monthly tiers with credits auto-deposited into your account.
- Pods UI updated: Refreshed modern interface for interacting with Runpod Pods.
July 2025
Public Endpoints arrive, Slurm Clusters in beta
- Public Endpoints: Access state-of-the-art AI models through simple API calls with an integrated playground. Available endpoints include Whisper-V3-Large, Seedance 1.0 Pro, Seedream 3.0, Qwen Image Edit, FLUX.1 Kontext, Deep Cogito v2 Llama 70B, and Minimax Speech.
- Slurm Clusters (beta): Create on-demand multi-node clusters instantly with full Slurm scheduling support.
June 2025
S3-compatible storage and updated referral program
- S3-compatible API for network volumes: Upload and retrieve files from your network volumes without compute using AWS S3 CLI or Boto3. Integrate Runpod storage into any AI pipeline with zero-config ease and object-level control.
- Referral program revamp: Updated rewards and tiers with clearer dashboards to track performance.
May 2025
Port labeling, price drops, Runpod Hub, and Tetra beta test
- Port labeling: Name exposed ports in the UI and API to help team members identify services like Jupyter or TensorBoard.
- Price drops: Additional price reductions on popular GPU SKUs to lower training and inference costs.
- Runpod Hub: A curated catalog of one-click endpoints and templates for deploying community projects without starting from scratch.
- Tetra beta test: A Python library for running code on GPU with Runpod. Add a @remote() decorator to functions that need GPU power while the rest of your code runs locally.
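The decorator pattern can be sketched with a local stand-in. Only the @remote() name comes from the changelog; the resource-hint keyword and the dispatch mechanics below are assumptions, and the real library would ship its own decorator:

```python
# A minimal local stand-in sketching the shape of Tetra's @remote()
# decorator; the real library dispatches the decorated function to a
# Runpod GPU worker, while this stand-in simply runs it in-process.
def remote(**resource_hints):
    def wrap(fn):
        def inner(*args, **kwargs):
            # Real Tetra would serialize the call, execute fn on a GPU
            # worker, and return the result over the network.
            return fn(*args, **kwargs)
        return inner
    return wrap

@remote(gpu_count=1)  # resource-hint keyword is hypothetical
def square(n):
    # Placeholder for GPU-bound work.
    return n * n

print(square(8))  # → 64
```

The appeal of the pattern is that only the decorated function moves to the GPU; the surrounding script keeps running locally.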
April 2025
GitHub login, RTX 5090s, and global networking expansion
- Login with GitHub: OAuth sign-in and linking for faster onboarding and repo-driven workflows.
- RTX 5090s on Runpod: High-performance RTX 5090 availability for cost-efficient training and inference.
- Global networking expansion: Rollout to additional data centers approaching full global coverage.
March 2025
Enterprise features arrive, REST API goes GA, Instant Clusters in beta, and APAC expansion
- CPU Pods get network storage access: GA support for network volumes on CPU Pods for persistent, shareable storage.
- SOC 2 Type I certification: Independent attestation of security controls for enterprise readiness.
- REST API release: REST API GA with broad resource coverage for full infrastructure-as-code workflows.
- Instant Clusters: Spin up multi-node GPU clusters in minutes with private interconnect and per-second billing.
- Bare metal: Reserve dedicated GPU servers for maximum control, performance, and long-term savings.
- AP-JP-1: New Fukushima region for low-latency APAC access and in-country data residency.
February 2025
REST API enters beta with full-time community manager
- REST API beta test: RESTful routes covering Pods, Serverless endpoints, and volumes for simpler automation than GraphQL.
- Full-time community manager hire: Dedicated programs, content, and faster community response.
- Serverless GitHub integration release: GA for GitHub-based Serverless deploys with production-ready stability.
January 2025
New silicon and LLM-focused Serverless upgrades
- CPU Pods v2: Docker runtime parity with GPU Pods, faster starts, and network volume support.
- H200s on Runpod: NVIDIA H200 GPUs available for larger models and higher memory bandwidth.
- Serverless upgrades: Higher GPU counts per worker, new quick-deploy runtimes, and simpler model selection.
November 2024
Global networking expands and GitHub deploys enter beta
- Global networking expansion: Added to CA-MTL-3, US-GA-1, US-GA-2, and US-KS-2 for expanded private mesh coverage.
- Serverless GitHub integration beta test: Deploy endpoints directly from GitHub repos with automatic builds.
- Scoped API keys: Least-privilege tokens with fine-grained scopes and expirations for safer automation.
- Passkey auth: Passwordless WebAuthn sign-in for phishing-resistant account access.
August 2024
Storage expansion and private cross-data-center connectivity
- US-GA-2 added to network storage: Enable network volumes in US-GA-2.
- Global networking: Private cross-data-center networking with internal DNS for secure service-to-service traffic.
July 2024
Storage coverage grows with major price cuts and revamped referrals
- US-TX-3 and EUR-IS-1 added to network storage: Network volumes available in more regions for local persistence.
- Runpod slashes GPU prices: Broad GPU price reductions to lower training and inference total cost of ownership.
- Referral program revamp: Updated commissions and bonuses with an affiliate tier and improved tracking.
May 2024
$20M seed round, community event, and broader Serverless options
- $20M seed round from Intel Capital and Dell Technologies Capital: Funds infrastructure expansion and product acceleration.
- First in-person hackathon: Community projects, workshops, and real-world feedback.
- Serverless CPU Pods: Scale-to-zero CPU endpoints for services that don’t need a GPU.
- AMD GPUs: AMD ROCm-compatible GPU SKUs as cost and performance alternatives to NVIDIA.
February 2024
January 2024
Console navigation overhaul and documentation refresh
- New navigational changes to Runpod UI: Consolidated menus, consistent action placement, and fewer clicks for common tasks.
- Docs revamp: New information architecture, improved search, and more runnable examples and quickstarts.
- Zhen AMA: Roadmap Q&A and community feedback session.
December 2023
New regions and investment in community support
- US-OR-1: Additional US region for lower latency and more capacity in the Pacific Northwest.
- CA-MTL-1: New Canadian region to improve latency and meet in-country data needs.
- First community manager hire: Dedicated community programs and faster feedback loops.
- Building out the support team: Expanded coverage and expertise for complex issues.
October 2023
Faster template starts and better multi-region hygiene
- Serverless quick deploy: One-click deploy of curated model templates with sensible defaults.
- EU domain for Serverless: EU-specific domain briefly offered for data residency, superseded by other region controls.
- Data-center filter for Serverless: Filter and manage endpoints by region for multi-region fleets.
September 2023
Self-service upgrades, clearer metrics, new pricing model, and cost visibility
- Self-service worker upgrade: Rebuild and roll workers from the dashboard without support tickets.
- Edit template from endpoint page: Inline edit and redeploy the underlying template directly from the endpoint view.
- Improved Serverless metrics page: Refinements to charts and filters for quicker root-cause analysis.
- Flex and active workers: Discounted always-on “active” capacity for baseline load with on-demand “flex” workers for bursts.
- Billing explorer: Inspect costs by resource, region, and time to identify optimization opportunities.
August 2023
Team governance, storage expansion, and better debugging
- Teams: Organization workspaces with role-based access control for Pods, endpoints, and billing.
- Savings plans: Plans surfaced prominently in console with easier purchase and management for steady usage.
- Network storage to US-KS-1: Enable network volumes in US-KS-1 for local, persistent data workflows.
- Serverless log view: Stream worker stdout and stderr in the UI and API for real-time debugging.
- Serverless health endpoint: Lightweight /health probe returning endpoint and worker status without creating a billable job.
- SOC 2 Type II compliant: Security and compliance certification for enterprise customers.
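A health probe like the one above is typically a plain authenticated GET. A minimal sketch with placeholder IDs, assuming the per-endpoint URL pattern used elsewhere in the Serverless API (the response shape is not specified in this changelog):

```python
from urllib import request

# Hypothetical placeholders -- substitute a real endpoint ID and API key.
ENDPOINT_ID = "<endpoint-id>"
health_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health"

req = request.Request(
    health_url,
    headers={"Authorization": "Bearer <api-key>"},
)
# status = json.load(request.urlopen(req))  # returns endpoint/worker status;
# no billable job is created by this call.
print(health_url)
```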
June 2023
Observability, top-tier GPUs, and commitment-based savings
- Serverless metrics page: Time-series charts for pXX latencies, queue delay, throughput, and worker states for faster debugging and tuning.
- H100s on Runpod: NVIDIA H100 instances for higher throughput and larger model footprints.
- Savings plans: Commitment-based discounts for predictable workloads to lower effective hourly rates.
May 2023
Smoother auth and multi-region Serverless with persistent storage
- The new and improved Runpod login experience: Streamlined sign-in and team access for faster, more consistent auth flows.
- Network volumes added to Serverless: Attach persistent storage to Serverless workers to retain models and artifacts across restarts and speed cold starts through caching.
- Serverless region support: Pin or allow specific regions for endpoints to reduce latency and meet data-residency needs.
April 2023
Deeper autoscaling controls, richer metrics, persistent storage, and job cancellation
- Serverless scaling strategies: Scale by queue delay and/or concurrency with min/max worker bounds to balance latency and cost.
- Queue delay: Expose time-in-queue as a first-class metric to drive autoscaling and SLO monitoring.
- Request count: Track success and failure totals over windows for quick health checks and alerting.
- runsync: Synchronous invocation path that returns results in the same HTTP call for short-running jobs.
- Network storage beta: Region-scoped, attachable volumes shareable across Pods and endpoints for model caches and datasets.
- Job cancel API: Programmatically terminate queued or running jobs to free capacity and enforce client timeouts.
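The runsync and cancel bullets above can be sketched together. The endpoint ID, API key, and input payload are placeholders, and the URL pattern is an assumption based on the Serverless API's per-endpoint paths:

```python
import json
from urllib import request

# Hypothetical placeholders -- substitute a real endpoint ID and API key.
ENDPOINT_ID = "<endpoint-id>"
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# /runsync blocks until the job finishes and returns the result in the
# same HTTP response -- suited to short-running jobs.
run_url = f"{BASE}/runsync"
body = json.dumps({"input": {"prompt": "hello"}}).encode()
req = request.Request(
    run_url,
    data=body,
    headers={
        "Authorization": "Bearer <api-key>",
        "Content-Type": "application/json",
    },
)
# result = json.load(request.urlopen(req))  # executes synchronously

# The cancel API terminates a queued or running job by ID:
cancel_url = f"{BASE}/cancel/<job-id>"
print(run_url, cancel_url)
```

Pairing the two lets a client enforce its own timeout: issue the synchronous call, and cancel the job if the client gives up waiting.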
April 1, 2023
Serverless platform hardens with cleaner API
- Serverless API v2: Revised request and response schema with improved error semantics and new endpoints for better control over job lifecycle and observability.
February 1, 2023
Better control over notifications and GPU allocation
- Notification preferences: Configure which platform events trigger alerts to reduce noise for teams and CI systems.
- GPU priorities: Influence scheduling by marking workloads as higher priority to reduce queue time for critical jobs.