OpenAI develops AI system capable of generating 3D models

OpenAI LLC this week detailed Point-E, a new artificial intelligence system that can generate 3D models based on text prompts.

The research group has made the code for Point-E available on GitHub.

There are several AI applications that can generate two-dimensional images based on a text description provided by a user. According to OpenAI, such applications render images in seconds or minutes when running on a single data center graphics card. In contrast, generating a 3D model usually takes a few hours when using comparable hardware.

OpenAI built Point-E to speed up the process. According to the research group, Point-E can generate a 3D model in as little as one minute when running on a single Nvidia V100 graphics card.

When the AI system receives a user prompt describing an object, it does not directly generate a 3D model of the object. Instead, the system first creates a two-dimensional drawing of the specified object. From there, Point-E turns the two-dimensional drawing into a three-dimensional point cloud, which is a basic version of a 3D model that functions as an outline.
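For readers who want a feel for the data flow, the sketch below mirrors those stages in Python. Every function here is a hypothetical stand-in that returns dummy data; it illustrates the shape of the pipeline, not the actual Point-E code, which is available in OpenAI's GitHub repository.

```python
# A minimal sketch of Point-E's text -> image -> point-cloud flow.
# The function names and bodies are hypothetical stand-ins for the
# real neural networks, not the actual Point-E API.
import numpy as np

def generate_image(prompt: str) -> np.ndarray:
    # Stand-in for the text-to-image stage: returns a fake 64x64 RGB image.
    return np.random.rand(64, 64, 3)

def image_to_point_cloud(image: np.ndarray, num_points: int = 1000) -> np.ndarray:
    # Stand-in for the image-conditioned diffusion stage: a coarse (N, 3) cloud.
    return np.random.randn(num_points, 3)

def upsample_point_cloud(image: np.ndarray, coarse: np.ndarray,
                         extra_points: int = 3000) -> np.ndarray:
    # Stand-in for the upsampling diffusion stage: densifies the coarse cloud.
    return np.concatenate([coarse, np.random.randn(extra_points, 3)])

image = generate_image("a red traffic cone")   # text -> 2D drawing
coarse = image_to_point_cloud(image)           # drawing -> ~1,000-point cloud
cloud = upsample_point_cloud(image, coarse)    # densify to ~4,000 points
print(cloud.shape)                             # (4000, 3)
```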

Each step of the process is performed by a different neural network. The first step, turning the user’s description of an object into a two-dimensional drawing, is performed by a neural network called GLIDE that OpenAI originally released last year. The version of GLIDE used in Point-E contains three billion parameters. Parameters are the values a neural network learns during training; they determine how the network processes data.
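As a quick illustration of how parameters are counted in practice, the following PyTorch snippet tallies the learnable weights of a small toy network; the model shown is just for illustration and is unrelated to GLIDE.

```python
import torch.nn as nn

# A toy network standing in for a large model such as GLIDE.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Every weight and bias entry is one parameter; the GLIDE variant in
# Point-E has roughly three billion of them.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")  # 2,099,712 for this toy model
```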

After Point-E generates a two-dimensional drawing of an object, the drawing is transformed into a point cloud by two separate neural networks. The first network generates an initial, low-resolution point cloud with 1,000 points. The second network, described as a simpler version of the first, adds another 3,000 points to increase the point cloud’s resolution.

“For image diffusion models, the best quality is typically achieved by using some form of hierarchy, where a low-resolution base model produces output that is then upsampled by another model,” OpenAI scientists explained in a research paper detailing Point-E. “Our upsampler uses the same architecture as our base model.”
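One way to picture what point-cloud upsampling accomplishes is with a purely geometric stand-in: the snippet below densifies a cloud by inserting midpoints between randomly paired points. Point-E’s upsampler is a learned diffusion model, not this heuristic, but the input and output sizes mirror the 1,000-to-4,000-point ratio described above.

```python
import numpy as np

def naive_upsample(points: np.ndarray, factor: int = 4) -> np.ndarray:
    """Densify a point cloud by adding midpoints between random point pairs.

    A purely geometric stand-in used to illustrate what "adding points"
    to a cloud means; Point-E's upsampler is a learned diffusion model,
    not this heuristic.
    """
    n = len(points)
    new_points = [points]
    for _ in range(factor - 1):
        partners = points[np.random.randint(0, n, size=n)]
        new_points.append((points + partners) / 2.0)  # midpoints between pairs
    return np.concatenate(new_points, axis=0)

coarse = np.random.randn(1000, 3)         # stand-in for the 1,000-point base cloud
dense = naive_upsample(coarse, factor=4)  # 4,000 points, mirroring Point-E's ratio
print(coarse.shape, dense.shape)          # (1000, 3) (4000, 3)
```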

The neural networks that Point-E uses to generate 3D models are based on a machine learning method known as diffusion. The method, first introduced in 2015, also powers an image generation AI that Google LLC debuted earlier this year.

To build a diffusion model, engineers take training images and corrupt them with a type of error known as Gaussian noise. They then train the diffusion model to remove this noise. By repeating the process many times, a neural network can learn techniques that allow it to generate new images from scratch.
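In code, a single training step of that idea reduces to corrupting a clean sample with Gaussian noise and asking a network to predict the noise. The minimal PyTorch sketch below uses random stand-in data and a fixed noise level; a real diffusion model also conditions on a timestep and follows a noise schedule.

```python
import torch
import torch.nn as nn

# Toy denoising network: predicts the noise that was added to a 3D point cloud.
# Real diffusion models are far larger and condition on a timestep embedding.
net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 3))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

clean = torch.randn(1000, 3)     # stand-in for a clean training point cloud
noise = torch.randn_like(clean)  # Gaussian noise
noisy = clean + 0.5 * noise      # corrupted input (fixed noise level here)

pred_noise = net(noisy)          # the network tries to recover the noise
loss = nn.functional.mse_loss(pred_noise, noise)
optimizer.zero_grad()
loss.backward()
optimizer.step()                 # repeat over many samples and noise levels
```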

After Point-E creates a point cloud of an object, it turns the point cloud into a 3D model using Blender, an open-source 3D graphics application. The process of creating a 3D model in Blender is managed by an automated script.
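Blender is scriptable through its bundled Python module, bpy. As a loose illustration of that kind of automation, the snippet below, which must be run inside Blender, builds a mesh object whose vertices come from a stand-in point cloud; it is not OpenAI’s actual conversion script.

```python
# Run inside Blender, which ships with the bpy module.
# A sketch of scripted point-cloud import, not OpenAI's actual script.
import bpy
import random

# Stand-in point cloud: 1,000 random XYZ coordinates.
points = [(random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1))
          for _ in range(1000)]

mesh = bpy.data.meshes.new("point_cloud")
mesh.from_pydata(points, [], [])          # vertices only; no edges or faces yet
mesh.update()

obj = bpy.data.objects.new("point_cloud", mesh)
bpy.context.collection.objects.link(obj)  # make the object visible in the scene
```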

“While our method performs worse at this evaluation than state-of-the-art techniques, it delivers samples in a small fraction of the time,” OpenAI’s researchers explained. “This could make it more practical for certain applications, or could enable higher quality 3D object discovery by sampling many objects and selecting the best one.”

Image: OpenAI
