TY - JOUR
T1 - MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features
T2 - IEEE Access
AU - Kim, Jae Yeul
AU - Ha, Jong Eun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Visual surveillance requires robust detection of foreground objects under challenging conditions such as abrupt lighting variation, stationary foreground objects, dynamic background objects, and severe weather. Most classical algorithms rely on background model images produced by statistical modeling of brightness changes over time. Because they have difficulty exploiting global features, they produce many false detections in stationary foreground regions and at dynamic background objects. Recent deep learning-based methods reflect global characteristics more easily than classical methods, but they still fall short in exploiting spatiotemporal information. We propose an algorithm that uses spatiotemporal information efficiently by adopting a split-and-merge framework. First, we split the spatiotemporal information in successive images into spatial and temporal parts using two sub-networks: a semantic network and a motion network. The separated information is then fused in a spatiotemporal fusion network. The resulting architecture of three sub-networks is denoted MSF-NET (Motion and Semantic features Fusion NETwork). We also propose a method to train MSF-NET stably. Compared with the latest deep learning algorithms, MSF-NET achieves 9% and 13% higher F-measure (FM) on the LASIESTA and SBI datasets, respectively. In addition, MSF-NET is designed to be lightweight enough to run in real time on a desktop GPU.
KW - Deep learning
KW - foreground object detection
KW - spatiotemporal information
KW - visual surveillance
UR - http://www.scopus.com/inward/record.url?scp=85181543199&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3345842
DO - 10.1109/ACCESS.2023.3345842
M3 - Article
AN - SCOPUS:85181543199
SN - 2169-3536
VL - 11
SP - 145551
EP - 145565
JO - IEEE Access
JF - IEEE Access
ER -