Wednesday, May 29, 2024

Muse: A Text-to-Image Generation Model by Google AI

The new text-to-image model from Google AI outperforms many existing image generation models.

Google AI has introduced Muse, a text-to-image synthesis model that applies masked image modeling with generative transformers. Muse is trained on a masked modeling task in discrete token space, conditioned on text embeddings from a pre-trained large language model (LLM).
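To make the training objective concrete, here is a minimal sketch of a masked modeling task over discrete tokens. It is an illustration under stated assumptions, not the researchers' code: the names `MASK_ID`, `mask_tokens`, `masked_cross_entropy`, and `uniform_model` are hypothetical, the vocabulary and sequence sizes are arbitrary, and a uniform predictor stands in for the transformer.

```python
import math
import random

MASK_ID = -1  # hypothetical sentinel marking a masked position


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the masked copy and the sorted list of masked positions.
    """
    n = len(tokens)
    num_masked = max(1, round(mask_ratio * n))
    positions = sorted(rng.sample(range(n), num_masked))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_ID
    return masked, positions


def masked_cross_entropy(probs, targets, positions):
    """Average negative log-likelihood of the true token, over masked positions only."""
    losses = [-math.log(probs[p][targets[p]]) for p in positions]
    return sum(losses) / len(losses)


def uniform_model(masked_tokens, text_embedding, vocab_size):
    """Placeholder for the transformer (assumption): ignores its inputs
    and predicts a uniform distribution at every position."""
    return [[1.0 / vocab_size] * vocab_size for _ in masked_tokens]


rng = random.Random(0)
vocab_size = 8                      # arbitrary toy codebook size
image_tokens = [rng.randrange(vocab_size) for _ in range(16)]
text_embedding = [0.0] * 4          # stands in for a frozen LLM embedding of the prompt

masked, positions = mask_tokens(image_tokens, mask_ratio=0.5, rng=rng)
probs = uniform_model(masked, text_embedding, vocab_size)
loss = masked_cross_entropy(probs, image_tokens, positions)
```

With a uniform predictor the loss equals the log of the vocabulary size; a trained model would lower it by using the unmasked tokens and the text embedding as context.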

Generative image models have advanced significantly over the past few years because of novel training methods and improved deep learning architectures. As a result, many image generation models like DALL-E 2, Midjourney, and Stable Diffusion have been developed. But with Muse, Google takes the technology a step further.

Muse comprises several sub-models: a VQGAN tokenizer that encodes images into discrete tokens and decodes them back, a base masked image model that predicts the marginal distributions of masked tokens, and a super-resolution transformer that converts low-resolution output into high-resolution output, conditioned on T5-XXL text embeddings.

Because Muse operates on discrete tokens and needs fewer sampling iterations than pixel-space diffusion models like Imagen and DALL-E 2, the researchers claim it is more efficient. The model can also iteratively resample image tokens conditioned on a text prompt, which provides zero-shot, mask-free editing at no extra cost.

The researchers trained multiple Muse models ranging in size from 632M to 3B parameters. Muse uses parallel decoding: it predicts many image tokens at each step instead of generating them one at a time. The researchers credit this design for Muse's efficiency advantage over Parti, an autoregressive model, and claim that Muse is approximately 10 times faster at inference than the Imagen 3B or Parti 3B models.
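The efficiency argument for parallel decoding can be illustrated with a toy loop. This is a sketch under assumptions, not Muse's implementation: it uses a confidence-based unmasking scheme with a cosine schedule, a random guesser (`toy_predict`, hypothetical) in place of the transformer, and arbitrary sizes. The point is only that every position is predicted in each pass, so the number of model calls is set by the step count rather than the token count.

```python
import math
import random

MASK_ID = -1  # hypothetical sentinel marking a not-yet-decoded position


def toy_predict(tokens, vocab_size, rng):
    """Placeholder model (assumption): returns a (token, confidence)
    guess for every position in one pass."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(num_tokens, num_steps, vocab_size, rng):
    """Fill all tokens in at most num_steps passes.

    Each pass predicts every position, then commits only the most
    confident masked positions; a cosine schedule decides how many
    positions stay masked after each pass.
    """
    tokens = [MASK_ID] * num_tokens
    steps_used = 0
    for step in range(1, num_steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
        if not masked:
            break
        steps_used = step
        # Fraction of all tokens that should remain masked after this pass.
        remain_frac = math.cos(math.pi / 2 * step / num_steps)
        num_remain = int(remain_frac * num_tokens)
        num_to_fill = max(1, len(masked) - num_remain)
        guesses = toy_predict(tokens, vocab_size, rng)
        # Commit the highest-confidence guesses among masked positions.
        ranked = sorted(masked, key=lambda i: guesses[i][1], reverse=True)
        for i in ranked[:num_to_fill]:
            tokens[i] = guesses[i][0]
    return tokens, steps_used


rng = random.Random(0)
decoded, steps_used = parallel_decode(
    num_tokens=256, num_steps=8, vocab_size=1024, rng=rng
)
```

In this toy setup, 256 tokens are decoded in 8 model passes, whereas a token-by-token autoregressive decoder would need one pass per token.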

In an evaluation on the PartiPrompts benchmark, images generated by Muse were judged to match the text prompt about 2.7 times as often as those from Stable Diffusion. The model is reported to handle nouns, adjectives, verbs, and other parts of speech in a prompt.

For more information, refer to the paper.

Disha Chopra