Foundry Cloud Platform
AI compute built for burst
Description
Foundry Cloud Platform provides scalable cloud compute resources specifically tailored for artificial intelligence workloads. It grants users access to powerful NVIDIA GPU clusters, including H100s and A100s, on demand, eliminating the need for lengthy procurement processes or long-term contracts. This flexibility allows AI engineers, researchers, and scientists to scale their infrastructure up or down based on immediate project needs, optimizing for both performance and cost.
The platform offers two primary modes of access: Spot Instances, which provide cost-effective burst compute on dynamically priced, unreserved capacity, and Reserved Instances, which guarantee access to interconnected clusters for critical, uninterruptible workloads, with durations as short as a few hours. Features include high-performance InfiniBand networking for distributed training, co-located storage with no ingress or egress fees, programmatic instance management via API or CLI, and enterprise-grade security, including SOC 2 Type II certification.
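As an illustration of the programmatic instance management mentioned above, the sketch below launches a spot instance through a hypothetical REST API in Python. The base URL, request fields, and the FOUNDRY_API_KEY environment variable are assumptions for illustration only, not Foundry's documented interface.

```python
import os

import requests

# Hypothetical base URL and request schema -- illustrative only,
# not Foundry's documented API.
API_BASE = "https://api.example-foundry.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FOUNDRY_API_KEY']}"}

def launch_spot_instance(gpu_type: str, gpu_count: int, max_hourly_price: float) -> str:
    """Request a dynamically priced spot instance and return its ID."""
    resp = requests.post(
        f"{API_BASE}/instances",
        headers=HEADERS,
        json={
            "mode": "spot",                        # vs. "reserved" for guaranteed capacity
            "gpu_type": gpu_type,                  # e.g. "H100" or "A100"
            "gpu_count": gpu_count,
            "max_hourly_price": max_hourly_price,  # bid ceiling for dynamic pricing
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["instance_id"]

if __name__ == "__main__":
    print("Launched", launch_spot_instance("H100", gpu_count=8, max_hourly_price=20.0))
```

The same pattern extends to stopping instances or polling their status; a CLI would typically wrap these calls in commands with equivalent flags.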
Key Features
- Spot Instances: Run flexible workloads on dynamically priced, unreserved capacity, with claimed savings of up to 70% and up to 20x better price-performance.
- Reserved Instances: Guarantee GPU cluster access for critical tasks from hours to weeks, with an option to resell unused capacity.
- Scalable NVIDIA GPUs: On-demand access to H100s, A100s, A40s, and A5000s without long-term contracts.
- Programmatic Management: Control instances, automate scaling, and manage workloads via API or CLI.
- High-Performance Networking: Utilize 1.6–3.2 Tbps InfiniBand interconnects for optimized distributed training.
- Flexible Storage Options: Access co-located block storage and file shares (NVMe SSD) with no ingress/egress fees.
- Custom Execution Environment: Run custom scripts on startup and over SSH, and handle preemption gracefully (see the checkpointing sketch after this list).
- Enterprise-Ready Security: SOC 2 Type II certified platform with multi-layer security and granular access controls.
- Kubernetes Integration: Simplify workload orchestration and scaling (coming soon).
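The graceful preemption handling noted in the feature list is usually implemented by trapping the shutdown signal and checkpointing before the instance is reclaimed. Below is a minimal, platform-agnostic Python sketch of that pattern; treating preemption as a SIGTERM and the /mnt/storage checkpoint path are assumptions, since the actual signal and grace period depend on the platform.

```python
import signal
import sys
import time

CHECKPOINT_PATH = "/mnt/storage/checkpoint.txt"  # assumed co-located storage mount

preempted = False

def handle_preemption(signum, frame):
    # Assumption: spot preemption delivers SIGTERM with a short grace period.
    global preempted
    preempted = True

signal.signal(signal.SIGTERM, handle_preemption)

def save_checkpoint(step: int) -> None:
    # Stand-in for torch.save(model.state_dict(), ...) or similar.
    with open(CHECKPOINT_PATH, "w") as f:
        f.write(f"step={step}\n")

def train() -> None:
    for step in range(1_000_000):
        time.sleep(0.01)  # stand-in for one training step
        if preempted:
            save_checkpoint(step)  # persist progress before the VM is reclaimed
            sys.exit(0)            # exit cleanly; resume from the checkpoint on restart
        if step % 1000 == 0:
            save_checkpoint(step)  # periodic checkpoints bound the work lost to preemption

if __name__ == "__main__":
    train()
```

Periodic checkpoints matter even with a signal handler: if the grace period is too short to finish a save, the last periodic checkpoint caps the lost work.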
Use Cases
- Running large-scale AI model training
- Fine-tuning pre-trained models cost-effectively
- Deploying and scaling AI inference endpoints
- Conducting demanding AI research experiments
- Accessing specific GPU types (e.g., H100, A100) on demand
- Managing burst compute requirements without long-term investment
- Optimizing compute costs for flexible AI workloads