TY - JOUR
T1 - CP-CNN: Computational Parallelization of CNN-Based Object Detectors in Heterogeneous Embedded Systems for Autonomous Driving
T2 - IEEE Access
AU - Chun, Dayoung
AU - Choi, Jiwoong
AU - Lee, Hyuk Jae
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - The success of research on convolutional neural network (CNN)-based camera sensor processing for autonomous driving has accelerated the development of autonomous vehicles. Because autonomous driving algorithms require high-performance computing for fast and accurate perception, heterogeneous embedded platforms consisting of a graphics processing unit (GPU) and a power-efficient dedicated deep learning accelerator (DLA) have been developed to implement deep learning algorithms efficiently in limited hardware environments. However, because hardware utilization on these platforms remains low, the differences in processing speed and power efficiency between a heterogeneous platform and an embedded platform with only a GPU remain insignificant. To address this problem, this paper proposes an optimization technique that fully utilizes the available hardware resources of heterogeneous embedded platforms through parallel processing on the DLA and GPU. The proposed power-efficient network inference method improves processing speed without losing accuracy, based on an analysis of the problems encountered when dividing a network between the DLA and GPU for parallel processing. Moreover, the high compatibility of the proposed method is demonstrated by applying it to various CNN-based object detectors. The experimental results show that the proposed method increases processing speed by 77.8%, 75.6%, and 55.2% and improves power efficiency by 84%, 75.9%, and 62.3% on the YOLOv3, SSD, and YOLOv5 networks, respectively, without an accuracy penalty.
KW - Autonomous vehicle
KW - convolutional neural network
KW - embedded platform
KW - low-power design
KW - parallel processing
KW - real-time system
UR - https://www.scopus.com/pages/publications/85161040021
U2 - 10.1109/ACCESS.2023.3280552
DO - 10.1109/ACCESS.2023.3280552
M3 - Article
AN - SCOPUS:85161040021
SN - 2169-3536
VL - 11
SP - 52812
EP - 52823
JO - IEEE Access
JF - IEEE Access
ER -
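
For readers who want a concrete starting point, the sketch below illustrates the general device-placement mechanism the abstract describes: compiling one part of a detector for the DLA and another for the GPU so the two can run concurrently on a Jetson-class board. This is not the paper's implementation; it is a minimal sketch assuming TensorRT 8.x with the Python bindings, and the ONNX file names, the split point, and the DLA core index are hypothetical placeholders.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, use_dla=False, dla_core=0):
    # Build a serialized TensorRT engine from an ONNX sub-network.
    # With use_dla=True, supported layers are pinned to the DLA and the
    # remainder fall back to the GPU (GPU_FALLBACK flag).
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # the DLA requires FP16 or INT8
    if use_dla:
        config.default_device_type = trt.DeviceType.DLA
        config.DLA_core = dla_core
        config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    return builder.build_serialized_network(network, config)

# Hypothetical split of one detector into two sub-networks, one per device;
# at run time each engine executes in its own stream so they can overlap.
dla_plan = build_engine("detector_front_dla.onnx", use_dla=True, dla_core=0)
gpu_plan = build_engine("detector_back_gpu.onnx", use_dla=False)

Running each engine in its own execution context and CUDA stream is what lets the DLA and GPU portions overlap; the paper's contribution lies in how the network is divided and synchronized without losing accuracy, which this sketch does not reproduce.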