Hierarchical Model for Long-Length Video Summarization with Adversarially Enhanced Audio/Visual Features

Hansol Lee, Gyemin Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

In this paper, we propose a novel supervised method for summarizing long videos. Many recent approaches have shown promising results in video summarization. However, videos in most benchmark datasets are short (10 minutes), and these methods often do not work well on very long videos (> 1 hour). Furthermore, most approaches use only visual features, although audio also provides useful information for the task. Based on these observations, we present a model that exploits both audio and visual features. To handle long videos, the hierarchical structure of our model captures both short-term and long-term temporal dependencies. Our model also refines the extracted features using adversarial networks. To evaluate our model, we collected a new dataset of 28 baseball videos (∼ 3.5 hours), each accompanied by an editorial summary video that is 5% of the length of the original video. Evaluation on this dataset suggests that our method produces quality summaries for very long videos.
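To make the two-level idea and the 5% summary budget concrete, here is a minimal sketch in Python. It is not the authors' model: the fixed-length segmentation, the mean pooling that stands in for a learned encoder, the greedy selection, and all function and parameter names (`split_into_segments`, `segment_level_scores`, `select_summary`, `segment_len`, `budget_ratio`) are assumptions made for illustration only. For scale, 5% of a 3.5-hour (210-minute) video is about 10.5 minutes.

```python
from typing import List


def split_into_segments(frame_scores: List[float], segment_len: int) -> List[List[float]]:
    """Lower level: group consecutive per-frame scores into fixed-length segments."""
    return [frame_scores[i:i + segment_len]
            for i in range(0, len(frame_scores), segment_len)]


def segment_level_scores(segments: List[List[float]]) -> List[float]:
    """Upper level: summarize each segment by its mean.

    The mean is only a placeholder (assumption) for a learned segment encoder.
    """
    return [sum(seg) / len(seg) for seg in segments]


def select_summary(frame_scores: List[float], segment_len: int,
                   budget_ratio: float = 0.05) -> List[int]:
    """Greedily pick the highest-scoring segments that fit the frame budget.

    Returns the indices of the chosen segments in temporal order.
    """
    segments = split_into_segments(frame_scores, segment_len)
    scores = segment_level_scores(segments)
    budget = int(len(frame_scores) * budget_ratio)  # frames allowed in the summary
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(segments[i]) <= budget:
            chosen.append(i)
            used += len(segments[i])
    return sorted(chosen)
```

With 200 frames, segments of 5 frames, and a 5% budget, `select_summary` can keep at most 10 frames, that is, two segments. In a real system the per-frame scores would come from a trained network over audio and visual features rather than being supplied by hand.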

Original language: English
Title of host publication: 2020 IEEE International Conference on Image Processing, ICIP 2020 - Proceedings
Publisher: IEEE Computer Society
Pages: 723-727
Number of pages: 5
ISBN (Electronic): 9781728163956
State: Published - Oct 2020
Event: 2020 IEEE International Conference on Image Processing, ICIP 2020 - Virtual, Abu Dhabi, United Arab Emirates
Duration: 25 Sep 2020 to 28 Sep 2020

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
Volume: 2020-October
ISSN (Print): 1522-4880

Conference

Conference: 2020 IEEE International Conference on Image Processing, ICIP 2020
Country/Territory: United Arab Emirates
City: Virtual, Abu Dhabi
Period: 25/09/20 to 28/09/20

Keywords

  • adversarial learning
  • hierarchical model
  • long-length videos
  • multimodal features
  • video summarization
