Abstract
In this paper, we propose a parallel network architecture that improves semantic segmentation performance by fusing two-dimensional (2D) and three-dimensional (3D) features. A voxel-based method and a projection-based method are adopted to derive the results from a single scan. Our approach consists of two parallel networks that extract features along each dimension and merge them in a fusion network. In the fusion network, the voxel blocks and 2D feature maps extracted from each branch are fused onto the voxel grid and then trained through convolution. For effective training of the 2D network, we apply data augmentation based on coordinate-system rotation. In addition, a multi-loss with weights applied to each dimension is employed to enhance system performance, and the results show that it outperforms a single loss. Because the 2D and 3D branches can be replaced with other architectures, the proposed method generalizes and can achieve better performance as the performance of each branch improves.
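The paper itself does not include code; as a minimal sketch of two of the ideas above (rotation-based augmentation of the input scan and a weighted multi-loss over the 2D, 3D, and fused outputs), where the function names and weight values are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def rotate_z(points, angle_rad):
    """Rotate an (N, 3) point cloud about the vertical (z) axis.

    A common coordinate-system rotation used to augment scans
    before they are projected for the 2D branch.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T

def weighted_multi_loss(loss_2d, loss_3d, loss_fused,
                        w_2d=0.5, w_3d=0.5, w_fused=1.0):
    """Weighted sum of the per-dimension losses.

    The weights here are placeholders; the paper's actual
    weighting scheme may differ.
    """
    return w_2d * loss_2d + w_3d * loss_3d + w_fused * loss_fused
```

Keeping separate loss terms for each branch lets the gradient signal reach the 2D and 3D subnetworks directly, rather than only through the fusion head.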
Original language | English |
---|---|
Pages (from-to) | 1000-1007 |
Number of pages | 8 |
Journal | Journal of Institute of Control, Robotics and Systems |
Volume | 27 |
Issue number | 12 |
DOIs | |
State | Published - 2021 |
Keywords
- 3D Vision
- Point Cloud
- Semantic Segmentation