Abstract
This study examines whether integrating an object-masking module into ViewFormer, a transformer-based model for 3D occupancy prediction from multi-view images, improves object detection. While ViewFormer effectively captures spatiotemporal information, it underperforms on small objects such as pedestrians and bicycles. To address this limitation, we designed a SegFormer-based object-masking module that estimates object probabilities from BEV features and concatenates them to those features as an additional channel. Experimental evaluations on the nuScenes dataset revealed an unexpected decline in the overall metrics (mIoU, IoUgeo), most notably for small objects. Subsequent analysis indicated that weak mask activation and instability during initial training were the key factors limiting the module’s effectiveness. These findings highlight both the viability and the constraints of object masking, underscoring the need for structural adjustments and improved training strategies to stabilize mask learning in future work.
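The masking step described in the abstract (estimate a per-cell object probability from the BEV features, then concatenate it as one extra channel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the SegFormer-based head is replaced here by a single linear projection followed by a sigmoid, and the function name `append_object_mask`, the tensor shapes, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def append_object_mask(bev, weight, bias=0.0):
    """Hypothetical stand-in for the object-masking module.

    bev:    BEV feature map of shape (C, H, W).
    weight: per-channel projection weights of shape (C,), standing in
            for the SegFormer-based head described in the abstract.
    Returns the augmented features (C + 1, H, W) and the mask (H, W).
    """
    # Project the C channels of every BEV cell to a single logit.
    logits = np.tensordot(weight, bev, axes=([0], [0])) + bias  # (H, W)
    # Interpret the logit as an object probability per BEV cell.
    mask = sigmoid(logits)
    # Concatenate the probability map as an additional feature channel.
    augmented = np.concatenate([bev, mask[None]], axis=0)
    return augmented, mask


# Toy example with assumed sizes: 8 channels on a 4 x 4 BEV grid.
rng = np.random.default_rng(0)
bev = rng.standard_normal((8, 4, 4)).astype(np.float32)
weight = (0.1 * rng.standard_normal(8)).astype(np.float32)

augmented, mask = append_object_mask(bev, weight)
print(augmented.shape)  # (9, 4, 4)
```

In a sketch like this, the mean of `mask` over cells known to contain objects is one simple quantity to monitor when checking for the weak mask activation that the abstract reports.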
| Original language | English |
|---|---|
| Pages (from-to) | 1160-1168 |
| Number of pages | 9 |
| Journal | Journal of Institute of Control, Robotics and Systems |
| Volume | 31 |
| Issue number | 10 |
| State | Published - 2025 |
Keywords
- 3D occupancy
- autonomous driving
- BEV representation
- deep learning
- object masking
- ViewFormer
Fingerprint
Dive into the research topics of 'Object Mask Module for Enhancing Multi-view 3D Occupancy Perception Performance Based on ViewFormer'. Together they form a unique fingerprint.