3D Semantic Scene Completion With Multi-scale Feature Maps and Masked Autoencoder

Sang Min Park, Jong Eun Ha

Research output: Contribution to journal › Article › peer-review


Abstract

Autonomous systems require a profound understanding of their surroundings, encompassing both semantics and 3D geometry. This study focuses on advancing camera-based 3D semantic scene completion. Building upon the foundation laid by VoxFormer [1], which is recognized for its state-of-the-art performance in 3D semantic scene completion, our approach involves two distinct stages. In the initial stage, scene completion is performed using depth images, while in the second stage, the final 3D scene completion is performed using a masked autoencoder. To enhance the performance of VoxFormer, we introduce two key modifications. First, we modify the first stage using multi-scale feature maps. Second, we further modify the first stage using a masked autoencoder. Experimental results based on the VoxFormer model adapted in both stages are presented. Our two proposed approaches exhibit notable improvements, particularly for small objects. However, these enhancements warrant further investigation for optimization and refinement.
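The two modifications named in the abstract, multi-scale feature fusion and masked-autoencoder-style masking, can be sketched minimally as below. Everything here (function names, NumPy in place of a deep-learning framework, nearest-neighbor upsampling as the fusion step) is an illustrative assumption for exposition, not the authors' implementation.

```python
import numpy as np

def fuse_multiscale(fine, coarse):
    """Illustrative multi-scale fusion: upsample a coarse feature map of
    shape (H/2, W/2, C) to (H, W, C) by nearest-neighbor repetition and
    add it to the fine map (a hypothetical stand-in for learned fusion)."""
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return fine + up

def random_mask_voxels(features, mask_ratio=0.75, seed=0):
    """MAE-style random masking over per-voxel feature vectors.

    features: (N, C) array of voxel features.
    Returns the visible subset and a boolean mask (True = masked).
    An encoder would process only the visible part; a decoder would
    reconstruct the features at the masked positions.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    n_masked = int(round(n * mask_ratio))
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_masked]] = True
    return features[~mask], mask
```

For example, with 64 voxels and a mask ratio of 0.75, only 16 voxel features remain visible to the encoder, which is what makes MAE-style pretraining cheap relative to processing the full grid.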

Original language: English
Pages (from-to): 966-972
Number of pages: 7
Journal: Journal of Institute of Control, Robotics and Systems
Volume: 29
Issue number: 12
DOIs
State: Published - 2023

Keywords

  • deep learning
  • scene completion
  • scene understanding
  • semantic segmentation
