TY - JOUR
T1 - 3D Semantic Scene Completion With Multi-scale Feature Maps and Masked Autoencoder
AU - Park, Sang Min
AU - Ha, Jong Eun
N1 - Publisher Copyright: © ICROS 2023.
PY - 2023
Y1 - 2023
N2 - Autonomous systems require a thorough understanding of their surroundings, encompassing both semantics and 3D geometry. This study focuses on advancing camera-based 3D semantic scene completion. Building upon VoxFormer [1], which is recognized for its state-of-the-art performance in 3D semantic scene completion, our approach involves two distinct stages. In the first stage, scene completion is done with depth images, while in the second stage, the final 3D scene completion is performed using a masked autoencoder. To enhance the performance of VoxFormer, we introduced two key modifications. First, we modified the first stage using multi-scale feature maps. Second, we further modified the first stage using a masked autoencoder. Experimental results based on the adapted VoxFormer model in both stages are presented. Our two proposed approaches exhibit notable improvements, particularly for small objects. However, these enhancements warrant further investigation for optimization and refinement.
AB - Autonomous systems require a thorough understanding of their surroundings, encompassing both semantics and 3D geometry. This study focuses on advancing camera-based 3D semantic scene completion. Building upon VoxFormer [1], which is recognized for its state-of-the-art performance in 3D semantic scene completion, our approach involves two distinct stages. In the first stage, scene completion is done with depth images, while in the second stage, the final 3D scene completion is performed using a masked autoencoder. To enhance the performance of VoxFormer, we introduced two key modifications. First, we modified the first stage using multi-scale feature maps. Second, we further modified the first stage using a masked autoencoder. Experimental results based on the adapted VoxFormer model in both stages are presented. Our two proposed approaches exhibit notable improvements, particularly for small objects. However, these enhancements warrant further investigation for optimization and refinement.
KW - deep learning
KW - scene completion
KW - scene understanding
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85180370164&partnerID=8YFLogxK
U2 - 10.5302/J.ICROS.2023.23.0143
DO - 10.5302/J.ICROS.2023.23.0143
M3 - Article
AN - SCOPUS:85180370164
SN - 1976-5622
VL - 29
SP - 966
EP - 972
JO - Journal of Institute of Control, Robotics and Systems
JF - Journal of Institute of Control, Robotics and Systems
IS - 12
ER -