OpenAI has introduced Point-E, an open-source machine learning system that generates 3D models from text prompts. Point-E differs from traditional 3D generators in that it represents 3D shapes as point clouds, discrete sets of data points in space, rather than as continuous surfaces or meshes.
3D modeling is widely used in movies, video games, AR, VR, the metaverse, and beyond. However, producing photorealistic 3D graphics still demands substantial resources and effort, and generating them directly from text prompts is a further challenge.
Taking inspiration from recently viral text-to-image systems such as DALL-E, Lensa, and Stability AI's Stable Diffusion, Point-E aims to advance text-to-3D technology. Point-E, whose "E" stands for efficiency, uses point clouds because they are computationally cheap to synthesize. Unlike existing systems such as Google's DreamFusion, Point-E does not require hours of GPU time, though its output resolution is comparatively low.
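To make the point-cloud idea concrete, a cloud can be stored as nothing more than an array of coordinates with per-point colors. This is a minimal illustrative sketch, not Point-E's internal format, but it shows why the representation is so cheap to create and manipulate:

```python
import numpy as np

# A point cloud is just N points, each with xyz coordinates and an RGB color.
# (Illustrative layout only; Point-E's actual data structures may differ.)
num_points = 1024
xyz = np.random.uniform(-1.0, 1.0, size=(num_points, 3))  # positions in space
rgb = np.random.uniform(0.0, 1.0, size=(num_points, 3))   # per-point colors

cloud = np.concatenate([xyz, rgb], axis=1)
print(cloud.shape)  # (1024, 6)
```

Compared with a mesh, there is no connectivity or surface topology to maintain, which is a large part of why point clouds are faster to synthesize.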
OpenAI’s research team, led by Alex Nichol, said, “Other systems leverage a large corpus of (text, image) pairs, allowing it to follow diverse and complex prompts, while our image-to-3D model is trained on a smaller dataset of (image, 3D) pairs.”
Given a text prompt, Point-E first creates a synthetic rendered image of the object. It then runs this image through a series of diffusion models to produce a coarse RGB point cloud of 1,024 points, which a further model upsamples into a finer version with 4,096 points. Each of these diffusion models was trained on "millions" of 3D models that had been converted into a standardized format.
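The coarse-to-fine pipeline described above can be sketched as follows. The function names and bodies here are hypothetical stand-ins for Point-E's learned diffusion models, which this example does not implement; only the shapes of the intermediate results mirror the article's description:

```python
import numpy as np

rng = np.random.default_rng(0)

def text_to_image(prompt):
    # Stand-in for the text-to-image diffusion step: a fake 64x64 RGB render.
    return rng.uniform(0.0, 1.0, size=(64, 64, 3))

def image_to_point_cloud(image, n=1024):
    # Stand-in for the image-to-point-cloud diffusion step:
    # n points, each with xyz position and RGB color.
    return rng.uniform(-1.0, 1.0, size=(n, 6))

def upsample(cloud, n=4096):
    # Stand-in for the upsampler: duplicate random points with small jitter
    # until the cloud has n points in total.
    k = n - len(cloud)
    extra = cloud[rng.integers(0, len(cloud), size=k)]
    extra = extra + rng.normal(scale=0.01, size=(k, 6))
    return np.concatenate([cloud, extra], axis=0)

image = text_to_image("a red traffic cone")   # hypothetical prompt
coarse = image_to_point_cloud(image)          # (1024, 6)
fine = upsample(coarse)                       # (4096, 6)
print(coarse.shape, fine.shape)
```

In the real system each stage is a trained diffusion model rather than random sampling; the sketch only illustrates how the data flows from prompt to image to coarse cloud to fine cloud.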
OpenAI has released the source code on GitHub, along with the research paper.