Muse: A Text-to-Image Generation Model by Google AI

The new text-to-image model from Google AI outperforms many existing image generation models.

January 9, 2023

GoogleAI has introduced a novel text-to-image synthesizing model, Muse, using a masked image modeling approach with generative transformers. Muse is trained on a masked modeling task in discrete token space using the text embedding derived from a pre-trained large language model (LLM).

Generative image models have advanced significantly over the past few years because of novel training methods and improved deep learning architectures. As a result, many image generation models like DALL-E 2, Midjourney, and Stable Diffusion have been developed. But with Muse, Google takes the technology a step further.

Muse comprises several sub-models, like the VQGAN tokenizer model for encoding and decoding, a base masked image model to predict marginal distributions of tokens, and a superres transformer model to transform low-resolution into high-resolution with T5-XXL embeddings.

Since Muse employs discrete tokens and needs fewer sample iterations than pixel-space diffusion models like Imagen and DALL-E 2, it claims to be more efficient. The model iteratively resamples image tokens based on a language prompt to produce a zero-shot, mask-free editing for free.

The researchers trained multiple Muse models with varying sizes between 632M to 3B parameters. Muse uses parallel decoding architecture, combining several decoded bits to accomplish an instruction. Due to this architecture, Muse outperforms Parti, an autoregressive model. The researchers also claim that Muse is approximately 10 times faster at inference than Imagen 3B or Parti 3B models.

Per the PartiPrompts assessment, Muse generates images better related to the text prompt at least 2.7 times more accurately than Stable Diffusion, as it can generate images using nouns, adjectives, verbs, and other parts of speech.

For more information, refer to the paper.

Muse: A Text-to-Image Generation Model by Google AI

LEAVE A REPLY Cancel reply

Most Popular

Elevating Cybersecurity for Digital Customer Platforms

Muse: A Text-to-Image Generation Model by Google AI

Subscribe to our newsletter

RELATED ARTICLES

GitHub CEO Thomas Dohmke Resigns to Return to Startup Life

Google Rolls Out Deep Think in Gemini App to Power Ultra‑Reasoning AI

Meta Unveils Vision for Personal Superintelligence

LEAVE A REPLY Cancel reply

Most Popular

Elevating Cybersecurity for Digital Customer Platforms