TY - GEN
T1 - Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation
AU - Nath, Utkarsh
AU - Goel, Rajeev
AU - Jeon, Eun Som
AU - Kim, Changhoon
AU - Min, Kyle
AU - Yang, Yezhou
AU - Yang, Yingzhen
AU - Turaga, Pavan
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - To address the data scarcity associated with 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become a widely adopted practice in text-to-3D generation pipelines. However, the diffusion models used in these techniques are prone to viewpoint bias and thus lead to geometric inconsistencies such as the Janus problem. To counter this, we introduce MT3D, a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias and explicitly infuse geometric understanding into the generation pipeline. Firstly, we employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure, thereby reducing the inherent viewpoint bias. Next, we utilize deep geometric moments to ensure geometric consistency in the 3D representation explicitly. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects, thereby improving the quality and usability of our 3D representations. Project page and code: https://moment-3d.github.io/
AB - To address the data scarcity associated with 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become a widely adopted practice in text-to-3D generation pipelines. However, the diffusion models used in these techniques are prone to viewpoint bias and thus lead to geometric inconsistencies such as the Janus problem. To counter this, we introduce MT3D, a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias and explicitly infuse geometric understanding into the generation pipeline. Firstly, we employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure, thereby reducing the inherent viewpoint bias. Next, we utilize deep geometric moments to ensure geometric consistency in the 3D representation explicitly. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects, thereby improving the quality and usability of our 3D representations. Project page and code: https://moment-3d.github.io/
KW - geometric consistency
KW - geometric moments
KW - text-to-3d generation
UR - https://www.scopus.com/pages/publications/105003644063
U2 - 10.1109/WACV61041.2025.00425
DO - 10.1109/WACV61041.2025.00425
M3 - Conference contribution
AN - SCOPUS:105003644063
T3 - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
SP - 4331
EP - 4341
BT - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Y2 - 28 February 2025 through 4 March 2025
ER -