Carton
Run any ML model from any programming language with one open-source API.
Description
Carton is an open-source solution designed to simplify machine learning model deployment and inference by providing a unified API for all major frameworks. It enables users to package models with metadata into a zip file, allowing them to run models from any programming language without modifying the original files or performing complex conversions.
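As an illustration, loading and running a packed model from Python might look like the sketch below. This is hedged: it assumes the `cartonml` bindings are installed, and the model path and input name are hypothetical placeholders, not values from this page.

```python
# Sketch of running a packed model via Carton's Python bindings.
# Assumes the `cartonml` package is available; the path and the
# input name ("input") are hypothetical placeholders.
import asyncio

try:
    import cartonml as carton
except ImportError:  # bindings not installed; keep the sketch importable
    carton = None

async def run_inference():
    # Load a packed model; Carton fetches the right runner automatically.
    model = await carton.load("/path/to/model.carton")
    # Inputs and outputs are dictionaries of named tensors
    # (real inputs are typically NumPy arrays).
    out = await model.infer({"input": [1.0, 2.0, 3.0]})
    return out

if carton is not None:
    print(asyncio.run(run_inference()))
```

The same load-then-infer shape applies in the other language bindings; only the surface syntax changes.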
Built on an optimized Rust core, Carton manages framework-specific runners behind the scenes and supports multi-platform deployment, including x86_64 and aarch64 Linux and macOS (including Apple Silicon). This approach streamlines experimentation, iteration, and integration of ML models into applications.
Key Features
- Unified API: Access any ML model across all major frameworks and languages with a single API.
- Model Wrapping: Packages models and metadata into a zip file without modifying the original model.
- Framework-Agnostic Inference: Decouples inference code from frameworks, enhancing flexibility and maintainability.
- Automatic Runner Management: Detects and fetches the correct model runner based on framework and version.
- Optimized Rust Core: Ensures minimal overhead with highly efficient async Rust implementation.
- Multi-language Support: Bindings available for Python, JavaScript, TypeScript, Rust, C, C++, C#, Java, Golang, Swift, Ruby, PHP, Kotlin, and Scala.
- Multi-platform Support: Runs on x86_64 and aarch64 Linux/macOS, Apple Silicon, and partially on WebAssembly.
- No Model Conversion Needed: Avoids error-prone conversion steps for models.
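The "Model Wrapping" step above might be sketched as follows. This is an assumption-laden sketch of the `cartonml` packing API: the model path, runner name, and version pin are all illustrative, not taken from this page.

```python
# Sketch of packing a model into a carton. Assumes `cartonml` is
# installed; the model path, runner name, and framework version
# below are hypothetical.
import asyncio

try:
    import cartonml as carton
except ImportError:  # bindings not installed; keep the sketch importable
    carton = None

async def pack_model():
    # Wrap an existing TorchScript model without modifying it.
    # `runner_name` tells Carton which framework runner to fetch.
    return await carton.pack(
        "/path/to/model.pt",
        runner_name="torchscript",
        required_framework_version="=2.0.1",  # pin the framework version
    )

if carton is not None:
    print(asyncio.run(pack_model()))
```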
Use Cases
- Machine learning model deployment
- ML model inference from multiple languages
- Rapid experimentation and prototyping
- Running ML models on different hardware platforms
- Integrating AI into polyglot applications
- Bypassing model conversion challenges
Frequently Asked Questions
Why not use Torch, TF, etc. directly?
Ideally, the ML framework used to run a model should just be an implementation detail. By decoupling your inference code from specific frameworks, you can easily keep up with the cutting-edge.
How much overhead does Carton have?
Most of Carton is implemented in optimized async Rust code. Preliminary benchmarks with small inputs show an overhead of less than 100 microseconds (0.0001 seconds) per inference call. We're continuing to optimize, including better use of shared memory, which should bring models with large inputs to similar levels of overhead.
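To put the sub-100-microsecond figure in perspective, per-call overhead is typically measured by timing many calls and dividing. The stdlib loop below times a no-op stand-in rather than Carton itself, so it only illustrates the methodology, not Carton's actual numbers.

```python
# Illustration of measuring per-call overhead, in the spirit of the
# benchmark described above (this times a no-op stand-in, not Carton).
import time

def noop_infer(inputs):
    return inputs  # stand-in for an inference call

N = 100_000
start = time.perf_counter()
for _ in range(N):
    noop_infer({"x": 1})
elapsed = time.perf_counter() - start

per_call_us = elapsed / N * 1e6
print(f"overhead per call: {per_call_us:.2f} microseconds")
```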
What platforms does Carton support?
Currently, Carton supports the following platforms:
- x86_64 Linux and macOS
- aarch64 Linux (e.g. Linux on AWS Graviton)
- aarch64 macOS (e.g. M1 and M2 Apple Silicon chips)
- WebAssembly (metadata access only for now; WebGPU runners are coming soon)
What is 'a carton'?
A carton is the output of the packing step. It is a zip file that contains your original model and some metadata. It does not modify the original model, avoiding error-prone conversion steps.
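Because a carton is just a zip archive, its contents can be inspected with standard tools. The stdlib sketch below builds and reads a toy archive with the same overall shape; the file names inside are illustrative assumptions, not Carton's actual internal layout.

```python
# A carton is a zip containing the original model plus metadata.
# This sketch creates and inspects a toy archive with a similar shape;
# the internal file names are illustrative, not Carton's real layout.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/model.pt", b"original model bytes")
    zf.writestr("carton.toml", 'runner = "torchscript"\n')

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    print(names)
    # The original model bytes are stored untouched -- no conversion.
    assert zf.read("model/model.pt") == b"original model bytes"
```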
Why use Carton instead of ONNX?
ONNX converts models, while Carton wraps them. Under the hood, Carton uses the underlying framework (e.g. PyTorch) to actually execute the model. This is important because it makes it easy to use custom ops, TensorRT, etc. without changes. For some sophisticated models, conversion steps (e.g. to ONNX) can be problematic and require validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration. That said, we plan to support ONNX models within Carton; this lets you use ONNX if you choose, and it enables some interesting use cases (like running models in-browser with WASM).