Describing Environmental Information in Videos Using Machine Learning

Yoon Jin Jeong, Soe Sandi Htun, Ji Hyeong Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Previous research on video captioning has focused on human actions or objects in videos. However, environmental information such as place, time, and weather is also important for understanding videos. In this paper, we therefore create a new dataset that adds environmental information labels to the MSVD dataset and train a machine learning model to analyze environmental information from videos. We apply R(2+1)D, a 3D CNN model, to extract video features and S2VT, an RNN model, to encode the video features and decode the environmental information. We define the problem as a sequence-to-sequence problem rather than multilabel classification because the input is a video, which is a sequence of frames, and the output labels are also related to each other. For example, if the place label is outside, then the next label would be weather. We analyze the experimental results using BLEU, METEOR, ROUGE-L, and CIDEr, and they show competitive results compared to the state-of-the-art video captioning model.
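The sequence-to-sequence framing described in the abstract can be illustrated with a minimal sketch of how environmental labels might be serialized into a target token sequence for a decoder. This is not the authors' code: the label values (`outside`, `inside`, `sunny`, `day`, `night`), the `<bos>`/`<eos>` markers, the place, weather, time ordering, and the helper names `labels_to_sequence` and `build_vocab` are all assumptions made for illustration. The only detail taken from the abstract is that a weather label follows the place label when the place is outside.

```python
from typing import Dict, List, Optional

# Hypothetical start/end markers for the decoder target sequence.
BOS, EOS = "<bos>", "<eos>"


def labels_to_sequence(place: str, time: str,
                       weather: Optional[str] = None) -> List[str]:
    """Serialize environmental labels into an ordered token sequence."""
    tokens = [BOS, place]
    # Assumption: weather is only emitted for outdoor scenes, so the token
    # that follows the place label depends on the place label itself.
    if place == "outside" and weather is not None:
        tokens.append(weather)
    tokens.append(time)
    tokens.append(EOS)
    return tokens


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign integer ids to tokens in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


# Two illustrative annotations (made-up values, not from the dataset).
examples = [
    labels_to_sequence("outside", "day", "sunny"),
    labels_to_sequence("inside", "night"),
]
vocab = build_vocab(examples)
encoded = [[vocab[tok] for tok in seq] for seq in examples]
```

Because the target sequences vary in length and later tokens depend on earlier ones, a fixed-size multilabel output would not capture this structure as directly, which is consistent with the motivation the abstract gives for the sequence-to-sequence formulation.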

Original language: English
Title of host publication: 2021 21st International Conference on Control, Automation and Systems, ICCAS 2021
Publisher: IEEE Computer Society
Pages: 2247-2249
Number of pages: 3
ISBN (Electronic): 9788993215212
State: Published - 2021
Event: 21st International Conference on Control, Automation and Systems, ICCAS 2021 - Jeju, Korea, Republic of
Duration: 12 Oct 2021 → 15 Oct 2021

Publication series

Name: International Conference on Control, Automation and Systems
Volume: 2021-October
ISSN (Print): 1598-7833

Conference

Conference: 21st International Conference on Control, Automation and Systems, ICCAS 2021
Country/Territory: Korea, Republic of
City: Jeju
Period: 12/10/21 → 15/10/21

Keywords

  • 3D CNN
  • Machine Vision
  • RNN
  • Video Captioning
  • Visual Recognition
