Latent Diffusion Model for 3D (LDM3D) is a generative diffusion model that produces realistic 3D visual content. It was developed by Intel Labs in partnership with Blockade Labs. LDM3D is the first model of its kind to use the diffusion process to generate a depth map alongside the image, yielding vivid, immersive 3D visuals with 360-degree views.
This research lets users interact with their text prompts in previously unimaginable ways, changing how we engage with digital content. Using the images and depth maps produced by LDM3D, users can convert a text description of a calm tropical beach, a contemporary skyscraper, or a sci-fi cosmos into a detailed 360-degree panorama.
LDM3D was trained on a dataset built from a subset of 10,000 samples of the LAION-400M database, which comprises more than 400 million image-caption pairs. The researchers annotated the training corpus using the Dense Prediction Transformer (DPT) large depth-estimation model, previously developed at Intel Labs.
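DPT-large predicts *relative* (not metric) depth, so depth annotations of this kind are typically normalized per image before being stored. The sketch below shows one common normalization scheme; the exact storage format used for LDM3D's training corpus is an assumption here, and the function name is illustrative.

```python
import numpy as np

def normalize_depth_to_uint16(depth: np.ndarray) -> np.ndarray:
    """Rescale a relative depth map to the full 16-bit range for storage.

    This per-image min-max normalization is a common convention for
    relative-depth annotations; it is an assumption, not a confirmed
    detail of the LDM3D training pipeline.
    """
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Degenerate (flat) map: avoid division by zero.
        return np.zeros(depth.shape, dtype=np.uint16)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 65535.0).round().astype(np.uint16)

# Example with a small synthetic relative-depth map:
depth = np.linspace(0.2, 5.0, 12).reshape(3, 4)
depth16 = normalize_depth_to_uint16(depth)
print(depth16.dtype, depth16.min(), depth16.max())  # uint16 0 65535
```

In a real annotation loop, `depth` would come from running the depth-estimation model on each training image before normalizing and saving the result.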
The DPT-large model delivers highly accurate relative depth for every pixel in an image. The LAION-400M dataset was created for research purposes, to enable model training at a larger scale for the benefit of various research communities. The LDM3D model was trained on an Intel AI supercomputer powered by Intel Xeon processors and Intel Habana Gaudi AI accelerators. To create 360-degree views for immersive experiences, the final model and pipeline combine the generated RGB image and depth map.
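To see how an RGB image and a depth map combine into a 360-degree view, note that each pixel of an equirectangular panorama corresponds to a direction on a sphere, and the depth value pushes that direction out to a 3D point. The sketch below is a minimal, generic RGB-D unprojection in numpy; it illustrates the general idea only, and the function name and conventions are assumptions, not Intel's DepthFusion implementation.

```python
import numpy as np

def equirect_to_points(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Unproject an equirectangular RGB-D panorama into an (N, 6) point
    cloud of XYZ + RGB.

    Each pixel (u, v) is mapped to longitude/latitude on the unit
    sphere; the per-pixel depth then scales that ray to its 3D position.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    lon = (u / w) * 2.0 * np.pi - np.pi        # longitude: -pi .. pi
    lat = np.pi / 2.0 - (v / h) * np.pi        # latitude:  pi/2 .. -pi/2
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    return np.concatenate([xyz, colors], axis=1)

# Toy 4x8 panorama at a uniform depth of 2.0:
rgb = np.ones((4, 8, 3))
depth = np.full((4, 8), 2.0)
points = equirect_to_points(rgb, depth)
print(points.shape)  # (32, 6)
```

With a uniform depth map, every unprojected point lies at the same distance from the camera, so the result is a colored sphere; a real depth map deforms that sphere into the scene's geometry.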
To demonstrate the potential of LDM3D, Intel and Blockade researchers created DepthFusion, an application that uses standard 2D RGB images and their depth maps to produce realistic, interactive 360-degree viewing experiences. DepthFusion turns text prompts into engaging digital experiences using TouchDesigner, a node-based visual programming language for real-time interactive multimedia content.