There has been a lot of hype around generative AI since the beginning of 2022. Social media platforms such as Reddit and Twitter are full of images created with generative machine learning models such as Stable Diffusion and DALL-E. Startups building products on generative models are attracting massive funding despite the market downturn. And large tech companies have started to integrate generative models into their mainstream products.
The concept of generative AI is not new. With a few exceptions, most of the advancements we are witnessing today have existed for several years. However, the emergence of several trends has made it possible to make the most out of the generative models and bring them to everyday applications. The field still has several challenges to overcome, but there is no doubt that the generative AI market is bound to grow in 2023.
Advancements in Generative AI
Generative AI became famous in 2014 with the rise of generative adversarial networks (GANs), a type of deep learning architecture that can create realistic images, such as faces, from noise maps. Scientists later created other versions of GANs to perform different tasks, like transferring the style of one image to another. GANs and variational autoencoders (VAEs), another deep learning architecture, later ushered in the era of deepfakes, an AI technique that modifies videos and images to swap one person’s face for another.
The year 2017 ushered in the transformer, a deep learning architecture that underlies large language models (LLMs) such as GPT-3, LaMDA, and Gopher. Transformers can generate text, software code, and even protein structures. A variation of the transformer, called the “vision transformer,” is also used for visual tasks such as image classification. An earlier version of OpenAI’s DALL-E used the transformer to create images from text.
A technique introduced by OpenAI in 2021, called Contrastive Language-Image Pre-training (CLIP), became crucial to text-to-image generators. CLIP learns shared embeddings between text and images by training on image-caption pairs collected from the internet, so that an image and its matching caption end up close together in the same vector space. CLIP and diffusion (another deep learning technique, used for generating images from noise) were combined in DALL-E 2 to create high-resolution images with stunning quality and detail.
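To make the shared-embedding idea concrete, here is a minimal sketch of a contrastive objective over image-caption pairs. It is a toy illustration, not OpenAI's implementation: the three-dimensional embeddings are hand-written stand-ins for the outputs of real image and text encoders, the function names are hypothetical, and the temperature of 0.07 is an assumed value.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_embs, text_embs):
    """Cosine similarity between every image and every caption."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in texts] for i in images]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct match for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[k]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: low when pair k's image and caption are most similar."""
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    image_to_text = cross_entropy_diagonal(logits)
    text_to_image = cross_entropy_diagonal(transposed)
    return 0.5 * (image_to_text + text_to_image)


def best_caption(image_emb, text_embs):
    """Index of the caption whose embedding is closest to the image."""
    sims = similarity_matrix([image_emb], text_embs)[0]
    return max(range(len(sims)), key=sims.__getitem__)


# Hypothetical embeddings: pair 0 and pair 1 each point in a similar direction.
image_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
aligned_texts = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0]]
shuffled_texts = list(reversed(aligned_texts))

if __name__ == "__main__":
    print("loss with matching captions:", contrastive_loss(image_embs, aligned_texts))
    print("loss with swapped captions: ", contrastive_loss(image_embs, shuffled_texts))
    print("closest caption for image 0:", best_caption(image_embs[0], aligned_texts))
```

Training pushes this loss down, which pulls each image toward its own caption and away from the others. Once the space is learned, the same similarity score can rank captions for an image, or help steer an image generator toward a text prompt.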
As we moved into 2022, larger models, better algorithms, and more extensive datasets helped improve the output of generative models, which began creating superior images, generating long stretches of (mostly) coherent text, and writing high-quality software code. Several models also became available for the general public to experiment with, which made them widely popular. In September, OpenAI removed the waitlist for its text-to-image generator DALL-E 2, opening access to everyone.
“More than 1.5 million users are now actively creating over 2 million images a day with DALL-E, from artists and creative directors to authors and architects, with about 100,000 users sharing their creations and feedback in our Discord community,” said an OpenAI spokesperson, elaborating on the popularity of their generative AI tool.
Newer Applications
Generative models were first released as systems that could take on big chunks of creative work. GANs became popular for generating complete images from very little input. LLMs like GPT-3 were in the spotlight for writing full articles.
But as the field has evolved, it has become evident that generative AI models are quite unreliable when left to their own devices. Many scientists believe that current deep learning models, no matter how large they are, lack some of the essential components of intelligence, which makes them prone to unpredictable mistakes. Recently, Meta introduced a new large language model, Galactica, that could generate academic-style papers from simple prompts. But as more and more people reported that its output was “statistical nonsense” and that it was producing “wrong” content, the company withdrew the public demo.
Product teams are finding that generative models perform best when implemented in ways that facilitate greater user control. The past year witnessed several products that use generative models in clever, human-centric ways. For instance, Copy AI, a tool that uses GPT-3 to create blog posts, has an interactive interface where the writer and the LLM create the outline of the article and build it up together. Applications developed with DALL-E 2 and Stable Diffusion also facilitate user control with features that allow for regenerating, configuring, or editing the output of the generative AI model.
As the principal scientist at Google Research, Douglas Eck, said at a recent AI conference, “It is no longer about a generative AI model that creates a realistic picture. It is about making something that you created yourself. Technology should serve our need for agency and creative control over our actions.”
Conclusion
The generative AI industry still has many challenges to overcome, including copyright and ethical complications. Nevertheless, it is interesting to see the generative AI field thrive. As major generative AI models become accessible to the general public, a much wider range of people can benefit from these powerful tools. Moreover, big companies like Microsoft are making the most of their exclusive access to OpenAI’s technology, their cloud infrastructure, and the huge market for creativity tools to bring generative models to their users.
However, down the road, the real potential of generative AI might manifest itself in unexpected markets. Who knows, perhaps generative AI will give birth to a new era of applications that we have never thought of before.