Researchers from NVIDIA have announced Magic3D, an AI model that generates 3D mesh models from text inputs. Once given a prompt, Magic3D generates a model with colored textures and contours in about 40 minutes.
NVIDIA positions Magic3D as a response to Google's DreamFusion, another text-to-3D AI model. DreamFusion generates 2D images with a text-to-image model and optimizes them into volumetric NeRF (neural radiance field) data. Magic3D uses a method similar to DreamFusion's, but splits it into a two-stage process: it first produces a coarse, low-resolution model and then optimizes it into a higher-resolution one.
In the first stage, Magic3D uses a base diffusion model similar to the one in DreamFusion. This model computes gradients of the scene representation via a loss defined on images rendered at a low resolution of 64 × 64. In the second stage, a latent diffusion model (LDM) is used to backpropagate gradients into images rendered at a higher resolution of 512 × 512.
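The two-stage loop described above can be sketched in miniature. The following is a hypothetical, heavily simplified illustration, not NVIDIA's implementation: the differentiable renderer and the diffusion priors are replaced by toy stand-in functions (`render`, `score_distillation_grad`), and the scene is just a small parameter vector. Only the coarse-to-fine control flow mirrors the article's description.

```python
import numpy as np

def render(scene_params, resolution):
    """Stand-in for differentiable rendering: produce an image at the
    requested resolution from the scene parameters (toy: constant image)."""
    return np.full((resolution, resolution), scene_params.mean())

def score_distillation_grad(image, target=0.5):
    """Stand-in for a diffusion prior's score-distillation gradient:
    nudges rendered pixels toward a 'plausible' value (toy: fixed target)."""
    return (image - target).mean()

def optimize(scene_params, resolution, steps, lr=0.1):
    """Gradient-descent loop shared by both stages: render, query the
    prior for a gradient, update the scene parameters."""
    for _ in range(steps):
        img = render(scene_params, resolution)
        g = score_distillation_grad(img)
        scene_params = scene_params - lr * g
    return scene_params

# Stage 1: coarse optimization against a low-resolution (64 x 64) prior.
params = np.zeros(8)
params = optimize(params, resolution=64, steps=50)

# Stage 2: refinement against a high-resolution (512 x 512) latent prior.
params = optimize(params, resolution=512, steps=50)
print(float(params.mean()))
```

In Magic3D proper, stage 1 optimizes a coarse neural field and stage 2 extracts and refines a textured mesh, so the two stages use different scene representations as well as different priors; the sketch collapses both into one parameter vector for brevity.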
Magic3D is a significant enhancement over DreamFusion, improving on several of its design choices. It combines low- and high-resolution diffusion priors to learn the 3D representation of the target content, and synthesizes content at 8× higher resolution while being 2× faster than DreamFusion.
The researchers hope that Magic3D will enable 3D model creation without per-asset model training and could accelerate video game development and VR applications. They conclude the paper by saying, "We hope with Magic3D, we can democratize 3D synthesis and open up everyone's creativity in 3D content creation."