Carton
Run any ML model from any programming language with one open-source API.
Description
Carton is an open-source solution designed to simplify machine learning model deployment and inference by providing a unified API for all major frameworks. It enables users to package models with metadata into a zip file, allowing them to run models from any programming language without modifying the original files or performing complex conversions.
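As an illustration, loading and running a packed model from Python might look like the sketch below. This is hedged: it assumes the `cartonml` bindings are installed, and the model path and input name are hypothetical placeholders, not values from this page.

```python
# Sketch of running a packed model via Carton's Python bindings.
# Assumes the `cartonml` package is available; the path and the
# input name ("input") are hypothetical placeholders.
import asyncio

try:
    import cartonml as carton
except ImportError:  # bindings not installed; keep the sketch importable
    carton = None

async def run_inference():
    # Load a packed model; Carton fetches the right runner automatically.
    model = await carton.load("/path/to/model.carton")
    # Inputs and outputs are dictionaries of named tensors
    # (real inputs are typically NumPy arrays).
    out = await model.infer({"input": [1.0, 2.0, 3.0]})
    return out

if carton is not None:
    print(asyncio.run(run_inference()))
```

The same load-then-infer shape applies in the other language bindings; only the surface syntax changes.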
Built on an optimized Rust core, Carton manages framework-specific runners behind the scenes and supports multi-platform deployment, including x86_64 and aarch64 Linux and macOS (including Apple Silicon). This approach streamlines experimentation, iteration, and integration of ML models into applications.
Key Features
- Unified API: Access any ML model across all major frameworks and languages with a single API.
- Model Wrapping: Packages models and metadata into a zip file without modifying the original model.
- Framework-Agnostic Inference: Decouples inference code from frameworks, enhancing flexibility and maintainability.
- Automatic Runner Management: Detects and fetches the correct model runner based on framework and version.
- Optimized Rust Core: Ensures minimal overhead with highly efficient async Rust implementation.
- Multi-language Support: Bindings available for Python, JavaScript, TypeScript, Rust, C, C++, C#, Java, Golang, Swift, Ruby, PHP, Kotlin, and Scala.
- Multi-platform Support: Runs on x86_64 and aarch64 Linux/macOS, Apple Silicon, and partially on WebAssembly.
- No Model Conversion Needed: Avoids error-prone conversion steps for models.
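The "Model Wrapping" step above might be sketched as follows. This is an assumption-laden sketch of the `cartonml` packing API: the model path, runner name, and version pin are all illustrative, not taken from this page.

```python
# Sketch of packing a model into a carton. Assumes `cartonml` is
# installed; the model path, runner name, and framework version
# below are hypothetical.
import asyncio

try:
    import cartonml as carton
except ImportError:  # bindings not installed; keep the sketch importable
    carton = None

async def pack_model():
    # Wrap an existing TorchScript model without modifying it.
    # `runner_name` tells Carton which framework runner to fetch.
    return await carton.pack(
        "/path/to/model.pt",
        runner_name="torchscript",
        required_framework_version="=2.0.1",  # pin the framework version
    )

if carton is not None:
    print(asyncio.run(pack_model()))
```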
Use Cases
- Machine learning model deployment
- ML model inference from multiple languages
- Rapid experimentation and prototyping
- Running ML models on different hardware platforms
- Integrating AI into polyglot applications
- Bypassing model conversion challenges
Frequently Asked Questions
Why not use Torch, TF, etc. directly?
Ideally, the ML framework used to run a model should just be an implementation detail. By decoupling your inference code from specific frameworks, you can easily keep up with the cutting-edge.
How much overhead does Carton have?
Most of Carton is implemented in optimized async Rust code. Preliminary benchmarks with small inputs show an overhead of less than 100 microseconds (0.0001 seconds) per inference call. We're continuing to optimize, including better use of shared memory, which should bring models with large inputs to similar levels of overhead.
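To put the sub-100-microsecond figure in perspective, per-call overhead is typically measured by timing many calls and dividing. The stdlib loop below times a no-op stand-in rather than Carton itself, so it only illustrates the methodology, not Carton's actual numbers.

```python
# Illustration of measuring per-call overhead, in the spirit of the
# benchmark described above (this times a no-op stand-in, not Carton).
import time

def noop_infer(inputs):
    return inputs  # stand-in for an inference call

N = 100_000
start = time.perf_counter()
for _ in range(N):
    noop_infer({"x": 1})
elapsed = time.perf_counter() - start

per_call_us = elapsed / N * 1e6
print(f"overhead per call: {per_call_us:.2f} microseconds")
```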
What platforms does Carton support?
Currently, Carton supports the following platforms:
- x86_64 Linux and macOS
- aarch64 Linux (e.g. Linux on AWS Graviton)
- aarch64 macOS (e.g. M1 and M2 Apple Silicon chips)
- WebAssembly (metadata access only for now; WebGPU runners are coming soon)
What is 'a carton'?
A carton is the output of the packing step. It is a zip file that contains your original model and some metadata. It does not modify the original model, avoiding error-prone conversion steps.
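Because a carton is just a zip archive, its contents can be inspected with standard tools. The stdlib sketch below builds and reads a toy archive with the same overall shape; the file names inside are illustrative assumptions, not Carton's actual internal layout.

```python
# A carton is a zip containing the original model plus metadata.
# This sketch creates and inspects a toy archive with a similar shape;
# the internal file names are illustrative, not Carton's real layout.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/model.pt", b"original model bytes")
    zf.writestr("carton.toml", 'runner = "torchscript"\n')

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    print(names)
    # The original model bytes are stored untouched -- no conversion.
    assert zf.read("model/model.pt") == b"original model bytes"
```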
Why use Carton instead of ONNX?
ONNX converts models, while Carton wraps them. Under the hood, Carton uses the underlying framework (e.g. PyTorch) to actually execute the model. This is important because it makes it easy to use custom ops, TensorRT, etc. without changes. For some sophisticated models, conversion steps (e.g. to ONNX) can be problematic and require validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration. That said, we plan to support ONNX models within Carton; this lets you use ONNX if you choose, and it enables some interesting use cases (like running models in-browser with WASM).