The 3DforXR project is an Open Call project funded under the SERMAS EU research project. It aims to develop a multimodal software module for the generation of textured 3D mesh models from 2D images or text. During the first three months of the project, the first two modalities were implemented, leading to the 1st release MVP: a REST API that generates a basic, functional 3D model from 2D images.
The first modality supports 3D reconstruction from multiple overlapping 2D images of an object. Two pipelines were adopted: the first implements Structure-from-Motion, Multi-View Stereo and multi-view texture mapping, which are best suited for objects with rich textures, while the second is based on Neural Radiance Fields (NeRF), which can effectively reconstruct challenging texture-less surfaces. Both pipelines are complemented by an automatic background removal step that isolates the main object in the scene for 3D reconstruction. The second modality supports 3D model prediction from a single image, based on a diffusion model that generates novel poses of the single-view object, coupled with a state-of-the-art pre-trained neural surface reconstruction approach.
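To illustrate the background removal step that precedes both reconstruction pipelines, the sketch below isolates the main object in a single input view. It is a minimal example assuming the open-source rembg library and hypothetical file names; the actual 3DforXR module may rely on a different segmentation approach.

```python
# Minimal background-removal sketch (illustrative only; the 3DforXR module
# may use a different segmentation model). Requires: pip install rembg pillow
from pathlib import Path
from PIL import Image
from rembg import remove

def isolate_object(src: Path, dst: Path) -> None:
    """Strip the background from one input view, keeping only the main object."""
    image = Image.open(src)
    cutout = remove(image)  # returns an RGBA image with a transparent background
    cutout.save(dst)

# Hypothetical file names: one of the overlapping views of the captured object.
isolate_object(Path("views/view_01.jpg"), Path("views/view_01_masked.png"))
```

In practice this step would be applied to every captured view before the images are passed to either the SfM/MVS or the NeRF pipeline.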
During the second trimester (M4-M6) these two modalities were improved, leading to refined 3D models. A library of 3D processing tools was also implemented, allowing users to further post-process both the geometry and the appearance of the derived 3D assets.
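The following is a minimal sketch of the kind of geometry clean-up such a post-processing tool library can offer, assuming the trimesh library and hypothetical file names; it is not the project's actual implementation.

```python
# Illustrative geometry post-processing sketch (assumes trimesh; the 3DforXR
# tool library may expose different operations). Requires: pip install trimesh
import trimesh

def clean_mesh(path_in: str, path_out: str, smoothing_iters: int = 5) -> None:
    """Keep only the largest connected component and lightly smooth sharp edges."""
    mesh = trimesh.load(path_in, force="mesh")
    # Split into connected components and keep the largest one, discarding
    # small isolated fragments left over by the reconstruction.
    parts = mesh.split(only_watertight=False)
    largest = max(parts, key=lambda m: len(m.faces))
    # Mild Laplacian smoothing to soften spurious sharp edges.
    trimesh.smoothing.filter_laplacian(largest, iterations=smoothing_iters)
    largest.export(path_out)

clean_mesh("reconstruction.obj", "reconstruction_clean.obj")  # hypothetical file names
```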
The 2nd release MVP is again a REST API that generates refined 3D assets from multiple or single images, together with a second API for further processing of the 3D mesh models and their textures. The MVP was tested on object categories suggested by the project User Partners, and the quality of the results is improved compared to those of the 1st release. The main improvement of the 3D reconstruction module from multiple overlapping 2D images concerns the elimination of isolated components and sharp edges, as well as better texture generation in the Neural Radiance Fields (NeRF) pipeline. Regarding the second modality, 3D model prediction from a single image, several improvements were implemented, the most effective of which is a neural texture refinement pipeline that takes advantage of the input image and seamlessly blends it into the frontal side of the mesh.
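Since the exact routes and payloads of the 3DforXR REST API are not detailed here, the sketch below uses hypothetical endpoints (/reconstruct and /jobs/{id}) purely to illustrate how a multi-image reconstruction request could be submitted and the resulting asset retrieved by a client.

```python
# Hypothetical client for a 3DforXR-style reconstruction REST API.
# The base URL, endpoint paths, field names and polling scheme are assumptions
# for illustration only. Requires: pip install requests
import time
from pathlib import Path
import requests

API = "http://localhost:8000"  # assumed base URL of the service

def reconstruct(image_paths: list[str], out_path: str = "asset.glb") -> None:
    """Upload overlapping views, wait for the job to finish, download the mesh."""
    files = [("images", (Path(p).name, Path(p).read_bytes())) for p in image_paths]
    job = requests.post(f"{API}/reconstruct", files=files).json()

    # Poll the (assumed) job endpoint until the 3D asset is ready.
    while True:
        status = requests.get(f"{API}/jobs/{job['id']}").json()
        if status["state"] == "done":
            break
        time.sleep(5)

    model = requests.get(f"{API}/jobs/{job['id']}/model")
    Path(out_path).write_bytes(model.content)

reconstruct(["views/view_01_masked.png", "views/view_02_masked.png"])
```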
The current solution is at TRL 4. Three KPIs were achieved during the first release, regarding the success rate of 3D model generation, the accuracy of the 3D models, and qualitative feedback from users on more than 80 reconstructed 3D models. Two KPIs were achieved during the second release, regarding the geometric accuracy and texture quality of the 3D models. To showcase the improvement of the results and further assist the user evaluation procedure, an interactive webpage was developed that displays a side-by-side comparison of the 3D models generated by the two releases.