AI-Powered Voice Cloning: Introducing Audiobox by Meta

Metaverse Planet

December 12, 2023

Audiobox is Meta’s latest research model for audio generation, building on the company’s earlier work with Voicebox.

Researchers at Meta’s Fundamental AI Research (FAIR) lab announced Audiobox on the company’s official website, describing it as a foundation model for audio generation.

The tool generates voices and sound effects from a combination of audio inputs and natural-language text prompts, making it easier to create custom audio for a range of applications.

Users can type a sentence for a cloned voice to speak, or describe the sound they want, and Audiobox generates the audio. They can also record their own voice for Audiobox to replicate.

Meta highlights the development of a family of models geared towards generating ambient sounds and sound effects, such as sirens or children playing.

These models are built on the self-supervised Audiobox SSL model. In self-supervised learning (SSL), the training signal is derived from the unlabeled data itself, for example by hiding part of an input and asking the model to predict it, rather than from human-provided labels as in supervised learning.
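To make the idea of labels coming from the data itself concrete, here is a minimal Python sketch of one common self-supervised setup, masked prediction. It is an illustration only and not a description of how Audiobox SSL is trained; the function name `make_masked_example`, the mask value, and the toy frame values are all assumptions made for this example.

```python
from typing import List, Tuple


def make_masked_example(
    frames: List[float], start: int, length: int, mask_value: float = 0.0
) -> Tuple[List[float], List[float]]:
    """Hide a span of an unlabeled sequence; the hidden values become the labels.

    Hypothetical helper for illustration, not part of any Meta library.
    """
    if start < 0 or length <= 0 or start + length > len(frames):
        raise ValueError("mask span must lie inside the sequence")
    # The targets are taken from the data itself, with no human annotation.
    targets = frames[start:start + length]
    # Copy the input so the caller's list is not modified.
    masked = list(frames)
    masked[start:start + length] = [mask_value] * length
    return masked, targets


# Toy "audio" frames with no labels attached (made-up numbers).
frames = [0.1, 0.4, -0.2, 0.7, 0.3, -0.5]
masked, targets = make_masked_example(frames, start=2, length=2)
print(masked)   # [0.1, 0.4, 0.0, 0.0, 0.3, -0.5]
print(targets)  # [-0.2, 0.7]
```

A model trained this way would receive `masked` as input and be scored on how well it reconstructs `targets`, so large amounts of unannotated audio can serve as training data.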

Meta stresses in its interactive demos that Audiobox is a research project and is not intended for commercial use.

Like Imagine, the AI image generation web app Meta introduced the previous week, Audiobox is not open source.
