TY - JOUR
T1 - Enhanced Parallel sparse-MLP for Monocular Depth Estimation of Autonomous UAV
AU - Park, Cheol Hoon
AU - Choi, Hyun Duck
N1 - Publisher Copyright: © ICROS 2023.
PY - 2023
Y1 - 2023
AB - Estimating a high-quality depth map from a single RGB image is a challenging task due to its ill-posed nature. Two trends have recently dominated computer vision research: attention mechanisms and multi-layer perceptron (MLP)-based vision models. Attention mechanisms, especially multi-head attention (MHA), have demonstrated significant improvements in depth estimation. MHA excels at capturing long-range information and pixel relationships, yet its computational complexity grows quadratically with spatial resolution. Consequently, applying MHA on unmanned aerial vehicles (UAVs) with limited hardware resources is infeasible. In contrast, MLP-based vision models offer faster inference because their computational complexity is linear in spatial resolution. However, the MLP’s weak inductive bias can hinder generalization without a substantial amount of data. Moreover, the lack of location-dependent local dependencies makes it difficult to estimate locally detailed depth maps precisely. To address these challenges, this study introduces a novel module called EPsMLP (Enhanced Parallel sparse-MLP), which consists of three parallel branches: sparse-MLP, local sparse attention, and channel attention. This module captures both global and local dependencies while benefiting from an inductive bias toward locality. Furthermore, multi-scale convolutions extract context at various scales for diverse objects. The architecture adopts an encoder-decoder structure with a pre-trained DenseNet-121 encoder. Experimental evaluations were conducted on the NYU-Depth-V2 and KITTI datasets, which are commonly used in monocular depth estimation. The results demonstrate that the proposed network is more efficient and effective than previously proposed methods.
KW - deep learning
KW - global dependency
KW - local dependency
KW - monocular depth estimation
KW - multi-scale context
KW - UAV
UR - http://www.scopus.com/inward/record.url?scp=85175814371&partnerID=8YFLogxK
U2 - 10.5302/J.ICROS.2023.23.0119
DO - 10.5302/J.ICROS.2023.23.0119
M3 - Article
AN - SCOPUS:85175814371
SN - 1976-5622
VL - 29
SP - 928
EP - 935
JO - Journal of Institute of Control, Robotics and Systems
JF - Journal of Institute of Control, Robotics and Systems
IS - 11
ER -