TY - JOUR
T1 - Depth-adaptive deep neural network for semantic segmentation
AU - Kang, Byeongkeun
AU - Lee, Yeejin
AU - Nguyen, Truong Q.
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2018/9
Y1 - 2018/9
N2 - In this paper, we present the depth-adaptive deep neural network using a depth map for semantic segmentation. Typical deep neural networks receive inputs at the predetermined locations regardless of the distance from the camera. This fixed receptive field presents a challenge to generalize the features of objects at various distances in neural networks. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network that is able to adapt the receptive field not only for each layer but also for each neuron at the spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron is to adjust the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron is to apply the different size of the receptive field at each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on the publicly available RGB-D dataset for semantic segmentation and the novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or preprocessing/postprocessing.
AB - In this paper, we present the depth-adaptive deep neural network using a depth map for semantic segmentation. Typical deep neural networks receive inputs at the predetermined locations regardless of the distance from the camera. This fixed receptive field presents a challenge to generalize the features of objects at various distances in neural networks. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network that is able to adapt the receptive field not only for each layer but also for each neuron at the spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron is to adjust the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron is to apply the different size of the receptive field at each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on the publicly available RGB-D dataset for semantic segmentation and the novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or preprocessing/postprocessing.
KW - Semantic segmentation
KW - convolutional neural networks
KW - deep learning
UR - https://www.scopus.com/pages/publications/85040990891
U2 - 10.1109/TMM.2018.2798282
DO - 10.1109/TMM.2018.2798282
M3 - Article
AN - SCOPUS:85040990891
SN - 1520-9210
VL - 20
SP - 2478
EP - 2490
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 9
M1 - 8269325
ER -