Abstract
Estimating a high-quality depth map from a single RGB image is challenging because the problem is ill-posed. Two dominant trends in recent computer vision research have been attention mechanisms and multi-layer perceptron (MLP)-based vision models. Attention mechanisms, especially multi-head attention (MHA), have substantially improved depth estimation. MHA excels at capturing long-range information and pixel-to-pixel relationships, but its computational complexity grows quadratically with spatial resolution. Consequently, applying MHA on unmanned aerial vehicles (UAVs) with limited hardware resources is infeasible. In contrast, MLP-based vision models offer faster inference because their computational complexity is linear with respect to spatial resolution. However, the weak inductive bias inherent to MLPs can hinder generalization unless a substantial amount of training data is available. Moreover, without location-dependent local dependencies, MLPs struggle to estimate locally detailed depth maps precisely. To address these challenges, this study introduces a novel module called EPsMLP (Enhanced Parallel sparse-MLP), which consists of three parallel branches: sparse-MLP, local sparse attention, and channel attention. This module captures both global and local dependencies while benefiting from a locality inductive bias. Furthermore, multi-scale convolutions extract context at various scales to handle diverse objects. The network adopts an encoder-decoder structure with a pre-trained DenseNet-121 encoder. Experiments were conducted on the NYU-Depth-V2 and KITTI datasets, which are widely used benchmarks for monocular depth estimation. The extensive results demonstrate that our network is more efficient and effective than previously proposed methods.
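The contrast between attention and sparse-MLP token mixing can be made concrete with a small NumPy sketch. It is not the authors' EPsMLP module: it assumes the row/column token-mixing scheme of the sMLP block (Tang et al.), in which each token is mixed only with tokens in its own row and column, and the identity and both branches are then fused by a channel-wise linear layer. The function names `sparse_mlp_mix`, `mha_attention_macs`, and `sparse_mix_macs` are hypothetical, and the cost counters deliberately ignore the Q/K/V projections, biases, and normalization layers.

```python
import numpy as np


def sparse_mlp_mix(x, w_h, w_w, w_fuse):
    """Hypothetical sMLP-style token mixing (not the paper's exact module).

    x:      feature map of shape (C, H, W)
    w_h:    (H, H) weights mixing tokens along each column (vertical axis)
    w_w:    (W, W) weights mixing tokens along each row (horizontal axis)
    w_fuse: (C, 3C) weights fusing the identity, horizontal and vertical branches
    """
    # Horizontal branch: output token j in each row is a weighted sum of that row's
    # W tokens, using w_w[j, :] (shared across rows and channels).
    x_w = x @ w_w.T                                   # (C, H, W)
    # Vertical branch: output token i in each column mixes that column's H tokens.
    x_h = np.einsum("ij,cjw->ciw", w_h, x)            # (C, H, W)
    # Concatenate identity + both branches along channels, then fuse back to C.
    stacked = np.concatenate([x, x_w, x_h], axis=0)   # (3C, H, W)
    return np.einsum("ok,khw->ohw", w_fuse, stacked)  # (C, H, W)


def mha_attention_macs(h, w, c):
    """Multiply-accumulates for QK^T plus attention-weighted V over N = H*W tokens."""
    n = h * w
    return 2 * n * n * c


def sparse_mix_macs(h, w, c):
    """Multiply-accumulates for row mixing, column mixing and 3C -> C fusion."""
    row_mixing = h * c * w * w
    col_mixing = w * c * h * h
    fusion = h * w * c * 3 * c
    return row_mixing + col_mixing + fusion


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
mixed = sparse_mlp_mix(
    feat,
    rng.standard_normal((16, 16)),
    rng.standard_normal((16, 16)),
    rng.standard_normal((8, 24)),
)
print(mixed.shape)
print(mha_attention_macs(32, 32, 64), sparse_mix_macs(32, 32, 64))
```

Under these assumptions the attention cost grows with $(HW)^2C$, so doubling both height and width multiplies it by 16, whereas the sparse token-mixing terms grow only with $HW(H+W)C$. This gap is what makes MLP-style mixing attractive for the resource-limited UAV hardware the abstract targets.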
| Original language | English |
|---|---|
| Pages (from-to) | 928-935 |
| Number of pages | 8 |
| Journal | Journal of Institute of Control, Robotics and Systems |
| Volume | 29 |
| Issue number | 11 |
| DOIs | |
| State | Published - 2023 |
Keywords
- deep learning
- global dependency
- local dependency
- monocular depth estimation
- multi-scale context
- UAV
Fingerprint
Dive into the research topics of 'Enhanced Parallel sparse-MLP for Monocular Depth Estimation of Autonomous UAV'. Together they form a unique fingerprint.