Enhanced Parallel sparse-MLP for Monocular Depth Estimation of Autonomous UAV

Cheol Hoon Park, Hyun Duck Choi

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Estimating a high-quality depth map from a single RGB image is challenging because the problem is ill-posed. Two trends have recently dominated computer vision research: attention mechanisms and multi-layer perceptron (MLP)-based vision models. Attention mechanisms, especially multi-head attention (MHA), have brought significant improvements in depth estimation. MHA excels at capturing long-range information and pixel relationships, but its computational complexity grows quadratically with spatial resolution. Consequently, applying MHA on unmanned aerial vehicles (UAVs) with limited hardware resources is infeasible. In contrast, MLP-based vision models offer faster inference because their computational complexity is linear in spatial resolution. However, MLPs have a weak inductive bias, which can hinder generalization unless a substantial amount of training data is available. Moreover, because they lack location-dependent local dependencies, they have difficulty estimating depth maps with precise local detail. To address these challenges, this study introduces a novel module called EPsMLP (Enhanced Parallel sparse-MLP), which consists of three parallel branches: sparse-MLP, local sparse attention, and channel attention. This module captures both global and local dependencies while benefiting from an inductive bias toward locality. Furthermore, multi-scale convolutions are used to extract context at various scales for diverse objects. The architecture follows an encoder-decoder structure and incorporates a pre-trained DenseNet-121 encoder. Experimental evaluations were conducted on the NYU-Depth-V2 and KITTI datasets, which are commonly used in monocular depth estimation. The results demonstrate that the proposed network is more efficient and effective than previously proposed methods.
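
The scaling argument in the abstract (quadratic for attention, linear for MLP-based models) can be illustrated with a rough operation count. The sketch below is not taken from the paper: the function names are hypothetical, the attention count covers only the two N x N matrix products of a single head, and the MLP is assumed to be a plain per-pixel two-layer channel MLP, which may differ from the sparse-MLP branch of EPsMLP.

```python
def attention_macs(height, width, channels):
    """Multiply-accumulates for Q @ K^T and A @ V in one self-attention head.

    Assumption: N = height * width tokens of dimension `channels`;
    the Q/K/V/output projections and the softmax are ignored.
    """
    n = height * width
    return 2 * n * n * channels


def pointwise_mlp_macs(height, width, channels, hidden=None):
    """Multiply-accumulates for a two-layer MLP applied to every pixel.

    Assumption: hidden width defaults to `channels`; this is a generic
    stand-in, not the paper's sparse-MLP.
    """
    n = height * width
    hidden = channels if hidden is None else hidden
    return 2 * n * channels * hidden


# Feature-map sizes here are arbitrary illustrative values.
for scale in (1, 2, 4):
    h, w = 15 * scale, 20 * scale
    print(h, w, attention_macs(h, w, 64), pointwise_mlp_macs(h, w, 64))
```

Under these assumptions, doubling both the height and the width multiplies the token count by 4, so the attention count grows by a factor of 16 while the per-pixel MLP count grows by a factor of 4.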

Original language: English
Pages (from-to): 928-935
Number of pages: 8
Journal: Journal of Institute of Control, Robotics and Systems
Volume: 29
Issue number: 11
State: Published - 2023

Keywords

  • deep learning
  • global dependency
  • local dependency
  • monocular depth estimation
  • multi-scale context
  • UAV
