Monday, April 15, 2024

Google Researchers Unveil Phenaki, a System That Generates Videos from Text

The system produces videos of a few seconds using story-like descriptions.

Ruben Villegas and colleagues at Google have unveiled Phenaki, a system that generates videos from story-like descriptions given as text prompts. Few datasets are suitable for text-to-video generation, but text-image pairs are plentiful, and Google has already used them to build text-to-image frameworks such as Imagen.

Phenaki takes advantage of this image data by treating images as single-frame videos and combining them with a dataset of short captioned videos.

Phenaki has four main components: an encoder that produces video embeddings, a language model that produces text embeddings, a MaskGIT bidirectional transformer, and a decoder.

The C-ViViT encoder/decoder is trained on a dataset of videos less than three seconds long. The encoder splits frames into non-overlapping patches and represents them as embedding vectors. The decoder converts embeddings back into pixels.
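The patch-splitting step can be illustrated with a minimal sketch. It only shows how a frame might be cut into non-overlapping patches and flattened into vectors. The function name, the grayscale nested-list frame format, and the patch size are assumptions made for illustration, and the learned embedding that C-ViViT applies afterwards is not modeled.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values (assumed format)


def split_into_patches(frame: Frame, patch_h: int, patch_w: int) -> List[List[float]]:
    """Split a frame into non-overlapping patches, each flattened to a vector."""
    height, width = len(frame), len(frame[0])
    if height % patch_h or width % patch_w:
        raise ValueError("frame size must be divisible by the patch size")
    patches = []
    # Step by the patch size so that no two patches share a pixel.
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [
                frame[r][c]
                for r in range(top, top + patch_h)
                for c in range(left, left + patch_w)
            ]
            patches.append(patch)
    return patches


# A 4x4 toy frame whose pixel value is its row-major index.
frame = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = split_into_patches(frame, 2, 2)
```

With 2x2 patches, the 4x4 toy frame yields four vectors of four values each. A real encoder would then project each vector into an embedding.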

Phenaki uses a T5X language model to produce text embeddings. At inference, MaskGIT takes a set of masked video embeddings together with the text embeddings and predicts the masked embeddings. It then re-masks a portion of them so that they can be generated again in subsequent iterations.
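The predict-then-re-mask loop can be sketched in a few lines. This is a toy illustration, not Phenaki's implementation. The `predict` callback stands in for the text-conditioned transformer, the cosine masking schedule is an assumption, and `toy_predict` is a made-up deterministic predictor used only so the example runs.

```python
import math
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a token that has not been generated yet
Token = Optional[int]
Predictor = Callable[[List[Token]], List[Tuple[int, float]]]


def iterative_decode(num_tokens: int, predict: Predictor, steps: int) -> List[Token]:
    """Fill a fully masked sequence over a fixed number of refinement steps."""
    tokens: List[Token] = [MASK] * num_tokens
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        # The model proposes a (token, confidence) pair for every position.
        proposals = predict(tokens)
        # Assumed cosine schedule: how many tokens stay masked after this step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident proposals; the rest remain masked,
        # which has the same effect as predicting everything and re-masking.
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens


def toy_predict(tokens: List[Token]) -> List[Tuple[int, float]]:
    """Hypothetical stand-in model: token i*10, more confident at low indices."""
    return [(i * 10, 1.0 / (1 + i)) for i in range(len(tokens))]


result = iterative_decode(num_tokens=8, predict=toy_predict, steps=4)
```

In the final step the schedule drops to zero remaining masked tokens, so every position is committed by the time the loop ends.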

To create minute-long videos, the authors applied MaskGIT and C-ViViT repeatedly. They first generated a short video from a single sentence and then encoded its final k frames. Those video embeddings, followed by the text, were then used to generate further video frames.

Unlike Make-A-Video, which uses several diffusion models to generate short videos and then upscale their resolution, Phenaki bootstraps its own frames to enhance throughput and narrative complexity.
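The repeated extension of a video from its own final frames can be outlined as a simple loop. The sketch below is a hedged illustration under stated assumptions: `generate_clip` is a hypothetical callback standing in for the combined MaskGIT and C-ViViT generation step, frames are represented as plain strings, and one prompt per segment is assumed.

```python
from typing import Callable, List, Sequence

Frame = str  # frames are plain labels in this toy example
ClipGenerator = Callable[[str, Sequence[Frame]], List[Frame]]


def generate_long_video(prompts: Sequence[str], generate_clip: ClipGenerator, k: int) -> List[Frame]:
    """Extend a video segment by segment, conditioning on the last k frames."""
    video: List[Frame] = []
    context: List[Frame] = []  # empty for the first segment
    for prompt in prompts:
        # Generate new frames from the text and the trailing context frames.
        clip = generate_clip(prompt, context)
        video.extend(clip)
        # The final k frames become the context for the next segment.
        context = video[-k:]
    return video


def toy_generate_clip(prompt: str, context: Sequence[Frame]) -> List[Frame]:
    """Hypothetical generator: three labeled frames, offset by the context size."""
    start = len(context)
    return [f"{prompt}#{start + j}" for j in range(3)]


video = generate_long_video(["a", "b"], toy_generate_clip, k=2)
```

The first segment sees no context, while the second is conditioned on the last two frames of the first, which is why its labels start at index 2.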

Disha Chopra
Disha Chopra is a content enthusiast! She is an Economics graduate pursuing her PG in the same field along with Data Sciences. Disha enjoys the ever-demanding world of content and the flexibility that comes with it. She can be found listening to music or simply asleep when not working!

