Sunday, July 21, 2024

ByteDance AI introduces MagicVideo, a text-to-video generation framework based on latent diffusion models

MagicVideo generates videos in latent space using a pre-trained variational autoencoder.

ByteDance AI researchers have introduced MagicVideo, an efficient framework for text-to-video generation based on latent diffusion models.

MagicVideo generates videos in latent space with the help of a pre-trained variational autoencoder, which significantly reduces its computational requirements compared with generating directly in pixel space.
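To get a rough sense of why working in latent space is cheaper, the sketch below compares the number of values in a short clip before and after encoding. The clip size, the downsampling factor of 8, and the 4 latent channels are illustrative assumptions (typical of latent diffusion autoencoders), not figures reported for MagicVideo.

```python
def element_count(frames, channels, height, width):
    """Number of scalar values in a video tensor of the given shape."""
    return frames * channels * height * width

# Assumed example clip: 16 RGB frames at 256x256 pixels.
pixel_elements = element_count(frames=16, channels=3, height=256, width=256)

# Assumed autoencoder: 8x spatial downsampling, 4 latent channels.
latent_elements = element_count(frames=16, channels=4, height=256 // 8, width=256 // 8)

reduction = pixel_elements / latent_elements
print(pixel_elements, latent_elements, reduction)  # 3145728 65536 48.0
```

Under these assumptions, the diffusion model operates on 48 times fewer values per clip than it would in pixel space.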

MagicVideo uses 2D convolutions instead of 3D convolutions, which helps it work around the difficulty of obtaining video-text paired datasets. Temporal computation operators are combined with the 2D convolutions to process both the spatial and the temporal information in the video. Using 2D convolutions also lets MagicVideo reuse the pre-trained weights of text-to-image models.


Although switching from 3D to 2D convolutions reduces computational complexity significantly, the memory cost remains too high. MagicVideo therefore shares the same weights across the 2D convolution operations applied to each frame.

However, sharing weights can reduce generation quality, because it implicitly assumes that all frames are almost identical, whereas in reality frames differ over time. To compensate, MagicVideo uses a custom lightweight adaptor module to adjust the distribution of each frame.
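The idea of one shared 2D convolution plus a small per-frame correction can be sketched in plain Python as follows. The form of the adaptor used here (one scale and one shift per frame) is an assumption made for illustration, and the function names are hypothetical; the article does not specify how MagicVideo's adaptor is built.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation without padding, applied to one frame."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def shared_conv_with_adaptor(frames, kernel, scales, shifts):
    """Apply the SAME kernel to every frame, then a per-frame scale and shift.

    The scale/shift adaptor is an assumed, simplified stand-in for a
    lightweight adaptor module.
    """
    outputs = []
    for frame, scale, shift in zip(frames, scales, shifts):
        features = conv2d_valid(frame, kernel)  # weights shared across frames
        outputs.append([[scale * v + shift for v in row] for row in features])
    return outputs

# Two identical 3x3 frames; only the adaptor parameters differ per frame.
frames = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
kernel = [[1.0, 1.0], [1.0, 1.0]]
out = shared_conv_with_adaptor(frames, kernel, scales=[1.0, 0.5], shifts=[0.0, 1.0])
print(out[0])  # [[4.0, 4.0], [4.0, 4.0]]
print(out[1])  # [[3.0, 3.0], [3.0, 3.0]]
```

The convolution kernel is stored once regardless of the number of frames, while the adaptor adds only two extra parameters per frame in this simplified version.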

MagicVideo learns inter-frame relations with the help of a directed self-attention module: each frame is computed on the basis of the preceding frames, similar to the approach used in video encoding. Finally, the generated video clips are enhanced by a post-processing module.
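One common way to make attention "directed" is to let each frame attend only to itself and to earlier frames. The sketch below shows that masking scheme on per-frame feature vectors. It is a minimal illustration under that assumption, with a hypothetical function name and no learned projections, and is not MagicVideo's actual module.

```python
import math

def directed_attention(queries, keys, values):
    """Scaled dot-product attention where frame t only sees frames 0..t."""
    dim = len(queries[0])
    outputs = []
    for t in range(len(queries)):
        # Scores against the current and all previous frames only.
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible frames.
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([
            sum(w * values[s][c] for s, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs

# Three frames, each summarized by a 2-dimensional feature vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = directed_attention(features, features, features)
print(out[0])  # [1.0, 0.0]: the first frame can only attend to itself
```

Because later frames are never visible to earlier ones, changing the last frame leaves the outputs for the first two frames unchanged.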


Sahil Pawar
I am a graduate with a bachelor's degree in statistics, mathematics, and physics. I have been working as a content writer for almost 3 years and have written for a plethora of domains. Besides, I have a vested interest in fashion and music.

