TY - GEN
T1 - Hierarchical Model for Long-Length Video Summarization with Adversarially Enhanced Audio/Visual Features
AU - Lee, Hansol
AU - Lee, Gyemin
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - In this paper, we propose a novel supervised method for summarizing long-length videos. Many recent approaches have presented promising results in video summarization. However, videos in most benchmark datasets are short in duration (10 minutes), and the methods often do not work well for very long videos (> 1 hour). Furthermore, most approaches use only visual features, although audio provides useful information for the task. Based on these observations, we present a model that exploits both audio and visual features. To handle long videos, the hierarchical structure of our model captures both short-term and long-term temporal dependencies. Our model also refines the extracted features using adversarial networks. To demonstrate our model, we have collected a new dataset of 28 baseball videos (∼ 3.5 hours), each accompanied by an editorial summary video that is 5% of the length of the original video. Evaluation on the dataset suggests that our method produces quality summaries for very long videos.
AB - In this paper, we propose a novel supervised method for summarizing long-length videos. Many recent approaches have presented promising results in video summarization. However, videos in most benchmark datasets are short in duration (10 minutes), and the methods often do not work well for very long videos (> 1 hour). Furthermore, most approaches use only visual features, although audio provides useful information for the task. Based on these observations, we present a model that exploits both audio and visual features. To handle long videos, the hierarchical structure of our model captures both short-term and long-term temporal dependencies. Our model also refines the extracted features using adversarial networks. To demonstrate our model, we have collected a new dataset of 28 baseball videos (∼ 3.5 hours), each accompanied by an editorial summary video that is 5% of the length of the original video. Evaluation on the dataset suggests that our method produces quality summaries for very long videos.
KW - adversarial learning
KW - hierarchical model
KW - long-length videos
KW - multimodal features
KW - video summarization
UR - http://www.scopus.com/inward/record.url?scp=85098669534&partnerID=8YFLogxK
U2 - 10.1109/ICIP40778.2020.9190636
DO - 10.1109/ICIP40778.2020.9190636
M3 - Conference contribution
AN - SCOPUS:85098669534
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 723
EP - 727
BT - 2020 IEEE International Conference on Image Processing, ICIP 2020 - Proceedings
PB - IEEE Computer Society
T2 - 2020 IEEE International Conference on Image Processing, ICIP 2020
Y2 - 25 September 2020 through 28 September 2020
ER -