ByteDance AI introduces MagicVideo, a text-to-video generation framework based on latent diffusion models

MagicVideo generates videos in the latent space using a pre-trained variational autoencoder.

December 8, 2022

ByteDance AI researchers have introduced MagicVideo, an efficient text-to-video generation framework based on latent diffusion models.

MagicVideo generates videos in the latent space of a pre-trained variational autoencoder rather than in pixel space, which significantly reduces its computational requirements.
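To give a rough sense of why working in the latent space is cheaper, the sketch below counts tensor elements for a short clip in pixel space and in latent space. The clip size, the spatial downsampling factor of 8, and the 4 latent channels are illustrative assumptions and are not figures reported for MagicVideo.

```python
def tensor_elements(frames, channels, height, width):
    """Number of scalar values in a clip of shape (frames, channels, height, width)."""
    return frames * channels * height * width


def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of one encoded frame; downsample and latent_channels are assumed values."""
    return latent_channels, height // downsample, width // downsample


# Hypothetical clip: 16 RGB frames at 256x256 (assumed, for illustration only)
frames, height, width = 16, 256, 256

pixel_elems = tensor_elements(frames, 3, height, width)
c, h, w = latent_shape(height, width)
latent_elems = tensor_elements(frames, c, h, w)
reduction = pixel_elems / latent_elems

print(f"pixel space : {pixel_elems} values")
print(f"latent space: {latent_elems} values")
print(f"reduction   : {reduction:.0f}x")
```

Under these assumptions the diffusion model handles 48 times fewer values per clip. The actual saving depends on the autoencoder that is used.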

MagicVideo uses 2D convolutions instead of 3D convolutions to work around the scarcity of paired video-text datasets. Temporal computation operators are applied alongside the 2D convolutions so that the model processes both the spatial and the temporal information in the video. Using 2D convolutions also lets MagicVideo reuse the pre-trained weights of text-to-image models.
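As a back-of-the-envelope illustration of the difference between the two convolution types, the sketch below counts the weights of a single layer with a 3x3 kernel and with a 3x3x3 kernel. The channel width of 320 is an arbitrary assumption chosen for the example, not a value taken from MagicVideo.

```python
def conv_weight_count(kernel_dims, in_channels, out_channels):
    """Weights in one convolution layer (bias ignored)."""
    count = in_channels * out_channels
    for k in kernel_dims:
        count *= k
    return count


channels = 320  # assumed layer width, for illustration only

params_2d = conv_weight_count((3, 3), channels, channels)
params_3d = conv_weight_count((3, 3, 3), channels, channels)

print(f"2D conv weights: {params_2d}")
print(f"3D conv weights: {params_3d}")
print(f"ratio          : {params_3d / params_2d:.0f}x")
```

Besides being three times smaller in this example, the 2D kernel has the same shape as the kernels of an image model, which is what makes it possible to load text-to-image weights directly.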

Although switching from 3D to 2D convolutions reduces the computational complexity significantly, the memory cost remains high. MagicVideo therefore shares the same weights across the 2D convolution operations applied to each frame.

However, weight sharing can reduce generation quality, because it assumes that all frames are almost identical when, in reality, they differ over time. To compensate, MagicVideo uses a custom lightweight adaptor module to adjust the distribution of each frame.
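The idea of one shared 2D kernel plus a small per-frame correction can be sketched in plain Python as follows. The article does not describe the adaptor's internals, so the per-frame scale and shift used here are an assumption, and the function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation of one single-channel frame."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def shared_conv_with_adaptor(frames, kernel, scales, shifts):
    """Apply one shared kernel to every frame, then a per-frame scale and shift.

    The scale-and-shift adaptor is an assumed stand-in for the adaptor module.
    """
    outputs = []
    for frame, scale, shift in zip(frames, scales, shifts):
        features = conv2d_same(frame, kernel)
        outputs.append([[scale * v + shift for v in row] for row in features])
    return outputs


# Toy clip: 3 frames of 4x4, where frame t is filled with the value t + 1
frames = [[[float(t + 1)] * 4 for _ in range(4)] for t in range(3)]
kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # identity kernel
scales = [1.0, 0.5, 2.0]   # one scale per frame
shifts = [0.0, 1.0, -1.0]  # one shift per frame

adapted = shared_conv_with_adaptor(frames, kernel, scales, shifts)
print([frame[0][0] for frame in adapted])
```

In this toy setup the kernel contributes 9 weights regardless of clip length, while each frame adds only 2 adaptor parameters. That is the kind of trade-off a lightweight adaptor is meant to achieve.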

MagicVideo learns the inter-frame relations with a directed self-attention module: each frame is computed on the basis of the previous ones, similar to the approach used in video encoding. Finally, the generated video clips are enhanced by a post-processing module.
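One common way to make attention "directed" is a causal restriction, in which frame t may attend only to frames 0 through t. The sketch below shows that restriction on toy per-frame feature vectors. It omits learned query, key, and value projections, and it is an assumed illustration rather than MagicVideo's actual module.

```python
import math


def directed_attention(queries, keys, values):
    """Causal attention over frames: frame t attends only to frames 0..t."""
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scaled dot-product scores against the current and earlier frames only
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible frames
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [weight / total for weight in weights]
        # Weighted sum of the visible value vectors
        out = [
            sum(weight * values[s][d] for s, weight in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy per-frame features (one 2-dimensional vector per frame)
clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = directed_attention(clip, clip, clip)

# Changing the last frame must leave the earlier outputs untouched
edited = clip[:2] + [[5.0, -3.0]]
attended_edited = directed_attention(edited, edited, edited)
print(attended[:2] == attended_edited[:2])
```

Because the first frame sees only itself, its output equals its own value vector, and no later frame can influence an earlier one.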
