
Modal

Serve custom AI models at scale

Freemium

Description

Modal enables developers to deploy and scale custom AI models and data processing tasks with minimal effort. The platform is engineered for speed, featuring sub-second container starts powered by a custom Rust-based stack, which allows for rapid iteration cycles comparable to local development. It simplifies infrastructure management by allowing users to define hardware and container requirements directly within their Python functions, without needing separate configuration files.

With Modal, users can handle bursty, unpredictable workloads by scaling to thousands of GPUs and back down to zero almost instantaneously. It supports a wide range of applications, from generative AI inference and model fine-tuning to large-scale batch processing. The platform also offers flexible environments, integrations with tools like Datadog and cloud storage providers, persistent data storage, job scheduling, web endpoint deployment, and built-in debugging tools for a comprehensive development experience.

Key Features

  • Sub-Second Container Starts: Utilizes a Rust-based container stack for rapid cloud iterations, comparable to local development speed.
  • Zero Config Files: Define hardware (CPU, GPU, memory) and container image requirements directly within Python functions using decorators.
  • Instant Autoscaling: Automatically scales from zero to hundreds or thousands of GPUs and containers in seconds, and back down, to handle bursty workloads.
  • Flexible Environments: Bring your own Docker images or build them programmatically in Python, with access to various GPUs such as Nvidia H100s and A100s.
  • Seamless Integrations: Export function logs to Datadog or OpenTelemetry-compatible providers and mount cloud storage from S3, R2, etc.
  • Data Storage Solutions: Manage persistent data with network file systems (volumes), distributed key-value stores, and message queues, all accessible via Python.
  • Job Scheduling: Define and manage cron jobs, retries, and timeouts for scheduled tasks, or use batching to optimize resource usage.
  • Web Endpoints: Deploy Python functions as secure HTTPS web services with support for custom domains, streaming, and WebSockets.
  • Built-In Debugging: Troubleshoot code interactively in the cloud using `modal shell` and set breakpoints for efficient issue resolution.
  • Serverless GPU Access: Provision Nvidia GPUs (H100, A100, L40S, A10G, L4, T4) on a pay-per-second basis for inference, training, and fine-tuning.
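A minimal sketch of the decorator-based configuration described above, using Modal's Python SDK. The app name, image contents, GPU choice, and function body are illustrative placeholders, not taken from this page:

```python
import modal

# Illustrative app and image; "example-inference" and the torch
# dependency are placeholders chosen for this sketch.
app = modal.App("example-inference")
image = modal.Image.debian_slim().pip_install("torch")

# Hardware and container requirements live in the decorator --
# no separate YAML file or Dockerfile is required.
@app.function(gpu="A100", image=image, timeout=600)
def generate(prompt: str) -> str:
    # Placeholder body; a real app would run model inference here.
    return f"generated text for: {prompt}"

@app.local_entrypoint()
def main():
    # .remote() executes the function in Modal's cloud.
    print(generate.remote("a sunset over mountains"))
```

Running this with `modal run` would pull the declared image and GPU on demand; treat it as a sketch of the pattern rather than a verified deployment.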

Use Cases

  • Generative AI inference (LLMs, diffusion models)
  • Fine-tuning and training AI models
  • Large-scale batch processing and data jobs
  • Language model serving and APIs
  • Image, video, and 3D model processing
  • Audio processing and generation
  • Secure sandboxed code execution
  • Computational biology and scientific computing
  • Building RAG (retrieval-augmented generation) systems
  • Fast podcast transcription and analysis
  • Deploying web services and APIs for AI applications

Frequently Asked Questions

How does Modal's serverless pricing differ from traditional on-demand pricing?

Modal is serverless: it autoscales resources up and down with request volume, and you pay only for the compute time you actually use. For spiky or unpredictable workloads this can be more cost-effective than fixed on-demand or reserved capacity, which often leaves you paying for idle resources.

How are CPU and memory usage metered?

CPU usage is metered per physical core per second (minimum 0.125 cores per container). Memory usage is metered per GiB per second.
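To make the metering concrete, here is a small Python sketch of how one container's bill might be computed. The per-unit rates are invented placeholders, not Modal's actual prices; only the metering rules (per physical core-second with a 0.125-core minimum, per GiB-second of memory) come from the answer above.

```python
# Placeholder rates for illustration only -- not Modal's real pricing.
CPU_RATE_PER_CORE_SECOND = 0.000038  # $/core-second (made up)
MEM_RATE_PER_GIB_SECOND = 0.0000045  # $/GiB-second (made up)

def container_cost(cores: float, mem_gib: float, seconds: float) -> float:
    """Estimate cost for one container under per-second metering."""
    # CPU bills per physical core-second, floored at 0.125 cores.
    billed_cores = max(cores, 0.125)
    cpu_cost = billed_cores * seconds * CPU_RATE_PER_CORE_SECOND
    # Memory bills per GiB-second.
    mem_cost = mem_gib * seconds * MEM_RATE_PER_GIB_SECOND
    return cpu_cost + mem_cost

# A container using less than 0.125 cores still bills at the floor:
assert container_cost(0.05, 1.0, 60) == container_cost(0.125, 1.0, 60)
```

The key consequence of the floor is visible in the assertion: below 0.125 cores, CPU cost depends only on duration, not on actual usage.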

Can I use my existing cloud provider credits on Modal?

You can use committed AWS spend on Modal via the AWS Marketplace. Support for Google Cloud Marketplace is planned.

What types of GPUs are available on Modal?

Modal offers a range of Nvidia GPUs, including H100, A100 (80GB & 40GB), L40S, A10G, L4, and T4, all billed by the second of usage.

Is there a free tier or credit for new users?

Yes, the Starter plan includes $30 per month in free compute credits. Additionally, startups can apply for up to $50,000 in free compute credits.
