MusicGen is an open-source deep-learning language model created by Meta's Audiocraft research team that can create original music from text prompts. MusicGen works much like ChatGPT, but for audio: users describe the sort of music they want, choose whether or not to include an existing tune, and then click “Generate” to watch the magic happen.
Based on the given text prompt and optional tune, MusicGen produces an original, succinct musical piece after a brief processing period of a little under 3 minutes. When using MusicGen on Hugging Face, users can draw on a range of example prompts to specify their preferred music style.
For example, a user might enter the prompt “an 80s pop song with heavy drums and synth pads in the background.” Users can also “condition” MusicGen on a chosen song fragment of up to 30 seconds, using settings that let them designate a specific section of the track. Clicking “Generate” then makes it easy to create a high-quality musical sample up to 12 seconds long.
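The workflow above boils down to a text prompt, an optional conditioning fragment capped at 30 seconds, and an output capped at 12 seconds. The sketch below models those inputs and limits in plain Python; the class and field names are illustrative, not Audiocraft's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Limits described in the article; names here are hypothetical.
MAX_MELODY_SECONDS = 30   # longest song fragment accepted for conditioning
MAX_OUTPUT_SECONDS = 12   # longest sample the demo generates

@dataclass
class GenerationRequest:
    prompt: str                                   # text description of the music
    melody_clip_seconds: Optional[float] = None   # optional conditioning fragment length
    output_seconds: float = MAX_OUTPUT_SECONDS

    def validate(self) -> None:
        """Reject requests that exceed the demo's documented limits."""
        if (self.melody_clip_seconds is not None
                and self.melody_clip_seconds > MAX_MELODY_SECONDS):
            raise ValueError("conditioning fragment must be 30 seconds or less")
        if self.output_seconds > MAX_OUTPUT_SECONDS:
            raise ValueError("generated sample is capped at 12 seconds")

req = GenerationRequest(
    prompt="an 80s pop song with heavy drums and synth pads in the background",
    melody_clip_seconds=20.0,
    output_seconds=12.0,
)
req.validate()  # both limits respected, so this passes
```

A request with a 31-second melody clip would raise a `ValueError`, mirroring the 30-second cap the interface enforces.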
The research team trained the MusicGen model on 20,000 hours of licensed music. This large dataset comprised 10,000 high-quality tracks from an internal collection along with songs from well-known stock-media sites such as Shutterstock and Pond5.
To maximize performance, the team used Meta's 32 kHz EnCodec audio tokenizer to break music into smaller token streams that can be processed in parallel. Ahsen Khaliq, an ML engineer at Hugging Face, noted that MusicGen differs from other models in that it needs only 50 auto-regressive steps per second of audio and does not require a self-supervised semantic representation.
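The figures quoted above imply some concrete back-of-envelope numbers. Assuming 50 auto-regressive steps per second of audio at a 32 kHz sampling rate, a short calculation shows how many decoding steps a clip costs and how much raw audio each step represents (the function names are illustrative):

```python
# Arithmetic implied by the article's figures: 50 auto-regressive
# steps per second of 32 kHz audio.
STEPS_PER_SECOND = 50
SAMPLE_RATE_HZ = 32_000  # Meta's 32 kHz EnCodec tokenizer

def steps_for(seconds: float) -> int:
    """Auto-regressive decoding steps needed for a clip of this length."""
    return int(seconds * STEPS_PER_SECOND)

def samples_per_step() -> float:
    """Raw audio samples represented by each decoding step."""
    return SAMPLE_RATE_HZ / STEPS_PER_SECOND

print(steps_for(12))       # 600 steps for the demo's 12-second sample
print(samples_per_step())  # 640.0 raw samples per step
```

Put another way, each of MusicGen's decoding steps stands in for 640 raw audio samples, which is why the model can generate a 12-second clip in only 600 steps.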