Meta has released EnCodec, a neural network trained to compress audio into a much smaller representation and then reconstruct the original signal from it. Meta researchers report state-of-the-art results in low-bit-rate audio hypercompression.
EnCodec has a streaming encoder-decoder architecture that uses sequential modeling. Convolutional encoder-decoder architectures of this kind have proven effective across many audio tasks, including audio enhancement, audio bandwidth extension, and audio separation.
EnCodec has three main components: an Encoder, a Quantizer, and a Decoder. The Encoder network (E) transforms the input audio into a latent representation (z) with a higher dimension and a lower frame rate. The Quantizer (Q) then compresses this representation to the desired target size and outputs the compressed representation (z_q). Finally, the Decoder network (G) transforms the compressed representation back into a waveform (x̂) that closely resembles the original.
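The three-stage flow described above can be sketched with a toy example. This is only an illustration of the data flow, not Meta's implementation: the real Encoder and Decoder are learned neural networks, whereas here the "encoder" simply groups samples into frames, the "quantizer" is a nearest-neighbour lookup in a small hand-written codebook, and the "decoder" concatenates the looked-up frames. The function names, the codebook values, and the `hop` parameter are all hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def encode(samples: Sequence[float], hop: int) -> List[Frame]:
    """Toy stand-in for E: group `hop` samples into one latent frame.

    The result has fewer frames per second (lower frame rate) but more
    values per frame (higher dimension) than the raw sample stream.
    Trailing samples that do not fill a whole frame are dropped.
    """
    usable = len(samples) - len(samples) % hop
    return [list(samples[i:i + hop]) for i in range(0, usable, hop)]


def quantize(frames: List[Frame], codebook: List[Frame]) -> List[int]:
    """Toy stand-in for Q: replace each frame by the index of the
    nearest codebook entry (squared Euclidean distance)."""
    def dist(a: Frame, b: Frame) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


def decode(indices: List[int], codebook: List[Frame]) -> List[float]:
    """Toy stand-in for G: look up each index and concatenate the
    codebook frames back into a flat waveform."""
    waveform: List[float] = []
    for k in indices:
        waveform.extend(codebook[k])
    return waveform


# Hypothetical 4-entry codebook: each frame is stored as one small index.
codebook = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [-0.5, -0.5]]
x = [0.1, -0.1, 0.4, 0.6, 0.9, 1.1, -0.4, -0.6]

z = encode(x, hop=2)          # latent frames
z_q = quantize(z, codebook)   # compressed representation (indices)
x_hat = decode(z_q, codebook) # reconstructed waveform

print(z_q)
print(x_hat)
```

Only the indices in `z_q` would need to be stored or transmitted, which is where the size reduction comes from; the reconstruction `x_hat` is close to `x` but not identical, since quantization is lossy.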
Meta researchers claim to have achieved a 10x compression rate compared with MP3 at 64 kbps, without compromising audio quality. According to Meta, this is also the first time such a technique has been applied to 48 kHz stereo audio.
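To put the claimed ratio in concrete terms, here is a minimal back-of-the-envelope sketch. It takes the 10x figure at face value and assumes a constant bitrate; the 3-minute clip length and the helper name `size_megabytes` are hypothetical choices made for illustration.

```python
def size_megabytes(bitrate_kbps: float, seconds: float) -> float:
    """Size in megabytes (10^6 bytes) of a constant-bitrate stream."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


clip_seconds = 180              # hypothetical 3-minute clip
mp3_kbps = 64.0                 # MP3 baseline from the claim
encodec_kbps = mp3_kbps / 10    # the "10x" claim taken at face value

mp3_mb = size_megabytes(mp3_kbps, clip_seconds)
encodec_mb = size_megabytes(encodec_kbps, clip_seconds)

print(f"MP3 at {mp3_kbps:.0f} kbps: {mp3_mb:.3f} MB")
print(f"10x smaller ({encodec_kbps:.1f} kbps): {encodec_mb:.3f} MB")
```

Under these assumptions the clip shrinks from about 1.44 MB to about 0.144 MB.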
Meta has published a research paper detailing the architecture and technical details behind EnCodec. The paper also reports that a lightweight Transformer model can be used to further compress EnCodec's output, reducing bandwidth by up to 40% with no further loss in quality. Meta has also released the code so that developers and other technically minded readers can explore EnCodec in more depth.