Stable Audio Open

Open-source text-to-audio generation for short audio samples.

Free

Description

Stable Audio Open is an open-source text-to-audio model developed by Stability AI. It generates stereo audio samples up to 47 seconds long at a 44.1kHz sample rate directly from text prompts. The model is best suited to short production elements such as drum beats, instrument riffs, ambient sounds, and foley recordings, which makes it useful in music production and sound design workflows.

The model uses a transformer-based diffusion architecture that operates in the latent space of an autoencoder and is conditioned on T5-based text embeddings. It was trained exclusively on a large dataset of audio recordings licensed under CC0, CC BY, or CC Sampling+ from Freesound and the Free Music Archive, which keeps unlicensed copyrighted music out of the training data. Although it is effective at generating short audio samples, Stable Audio Open is not optimized for complete songs, complex melodies, or human vocals. The model weights are available on Hugging Face under a non-commercial research license, and the model is designed for use with the accompanying `stable-audio-tools` library for inference and fine-tuning.
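As a rough illustration of inference with `stable-audio-tools`, the sketch below follows the pattern of the example published with the model weights. The repository id `stabilityai/stable-audio-open-1.0`, the sampler settings, and the conditioning keys are taken from that example as recalled and should be checked against the current model card; the helper names `build_conditioning`, `samples_for`, and `generate` are hypothetical. Downloading the weights may also require accepting the license and logging in to Hugging Face.

```python
from typing import Any

SAMPLE_RATE_HZ = 44_100  # output sample rate stated in the description
MAX_SECONDS = 47         # maximum clip length stated in the description


def build_conditioning(prompt: str, seconds_total: float) -> list[dict[str, Any]]:
    """Build the text and timing conditioning (assumed format, see lead-in)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [{"prompt": prompt, "seconds_start": 0, "seconds_total": seconds_total}]


def samples_for(seconds: float) -> int:
    """Number of samples per channel for a clip of the given length."""
    return int(round(seconds * SAMPLE_RATE_HZ))


def generate(prompt: str, seconds_total: float = 10.0, out_path: str = "output.wav") -> str:
    """Generate one clip and save it as 16-bit WAV (hypothetical helper)."""
    # Heavy dependencies are imported here so the helpers above work without them.
    import torch
    import torchaudio
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    audio = generate_diffusion_cond(
        model,
        steps=100,          # assumed setting; fewer steps is faster but rougher
        cfg_scale=7,        # assumed classifier-free guidance strength
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Output is (batch, channels, samples): keep the first item and trim the tail.
    audio = audio[0, :, : samples_for(seconds_total)].to(torch.float32)
    audio = audio / audio.abs().max().clamp(min=1e-8)  # peak-normalize
    pcm16 = (audio.clamp(-1, 1) * 32767).to(torch.int16).cpu()
    torchaudio.save(out_path, pcm16, model_config["sample_rate"])
    return out_path
```

Calling `generate("128 BPM tech house drum loop", seconds_total=30)` would write a stereo WAV file to `output.wav`, assuming the weights download succeeds and enough memory is available.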

Key Features

Text-to-Audio Generation: Creates audio directly from text descriptions.
Variable-Length Output: Generates audio samples up to 47 seconds long.
High-Quality Audio: Outputs stereo audio at a 44.1kHz sample rate.
Specialized Audio Generation: Optimized for drum beats, instrument riffs, ambient sounds, and foley.
Open-Source Model: Model weights are publicly available on Hugging Face.
Fine-tuning Capability: Allows users to fine-tune the model on custom audio datasets.
Transformer-Based Architecture: A diffusion transformer that operates in the latent space of an autoencoder and is conditioned on T5 text embeddings.
Ethically Sourced Training Data: Trained on audio from Freesound and the Free Music Archive under open licenses (CC0, CC BY, CC Sampling+).

Use Cases

Creating custom drum beats for music production.
Generating unique instrument riffs and loops.
Producing ambient soundscapes for videos or games.
Creating realistic foley sound effects for film and media.
Generating audio samples for sound design projects.
Fine-tuning the model for specialized audio generation tasks (e.g., specific drum kits).
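
For the fine-tuning use case, `stable-audio-tools` reads training data from a JSON dataset configuration. The sketch below builds such a configuration for a local folder of audio files. The keys (`dataset_type`, `datasets`, `id`, `path`, `random_crop`) follow the library's dataset documentation as recalled and should be verified against the version in use; the function names, the default id, and the example path are hypothetical.

```python
import json


def make_dataset_config(audio_dir: str, dataset_id: str = "my_drum_kit") -> dict:
    """Describe a local audio folder as a training dataset (assumed schema)."""
    return {
        "dataset_type": "audio_dir",  # assumed type for a plain local directory
        "datasets": [{"id": dataset_id, "path": audio_dir}],
        "random_crop": True,          # train on random windows of longer files
    }


def dataset_config_json(config: dict) -> str:
    """Serialize the configuration to JSON text ready to be saved to a file."""
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    # Hypothetical path; point this at a folder of custom samples.
    print(dataset_config_json(make_dataset_config("/path/to/drum_kit_samples")))
```

The printed JSON can be saved to a file and, assuming the library's training script still accepts a dataset configuration path, passed to it together with a model configuration.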