Summarizing long-length videos with GAN-enhanced audio/visual features

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

In this paper, we propose a novel supervised method for summarizing long videos. Many recent works report successful results in video summarization. However, most videos in those works are short (~5 minutes), and the methods often break down on very long videos (~30 minutes). Moreover, most works use only visual features, although audio also provides useful features for the task. Based on these observations, we present a model that exploits both visual and audio features. To handle long videos, our model also refines the extracted features using adversarial networks. To demonstrate our model, we have collected a new dataset of 63 e-sports videos (~30 minutes each), each accompanied by an editorial summary video that is about 10% of the length of the original video. Evaluation on this dataset suggests that our method produces quality summaries for very long videos.

Original language: English
Title of host publication: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3727-3731
Number of pages: 5
ISBN (Electronic): 9781728150239
State: Published - Oct 2019
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 2019 to 28 Oct 2019

Publication series

Name: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 27/10/19 to 28/10/19

Keywords

  • Audio
  • GAN
  • Multimodal
  • Summarization
  • Video
