Semantic Similarity-based Visual Reasoning without Language Information

Changsu Choi, Hyeonseok Lim, Hayoung Jang, Juhan Park, Eunkyung Kim, Kyungtae Lim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this research, we propose new training data for the visual reasoning task based on semantic similarity, together with a deep learning model that utilizes the data. The first contribution of this study is the construction of the training data: based on a total of 40 object attributes, we created visual inference problems using only image data, resulting in 6,000 examples that were divided into training and test sets. The second contribution is a visual inference model. The model was evaluated on two tasks using ResNet50 and Vision Transformer backbones, respectively, and based on the experimental results we identified the pre-trained model best suited to single-choice binary reasoning and to multiple-selection reasoning.
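As a rough illustration of similarity-based visual inference of the kind described in the abstract (a minimal sketch, not the authors' actual pipeline or data), the Python snippet below extracts image embeddings with a pretrained ResNet50 or Vision Transformer backbone from torchvision and compares them by cosine similarity. The backbone helper, the file names, and the 0.5 decision threshold are illustrative assumptions.

import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing for both backbones.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def build_backbone(name: str = "resnet50") -> torch.nn.Module:
    """Return a pretrained feature extractor with its classifier head removed."""
    if name == "resnet50":
        model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        model.fc = torch.nn.Identity()      # keep the 2048-d pooled features
    else:
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
        model.heads = torch.nn.Identity()   # keep the 768-d [CLS] embedding
    return model.eval()

@torch.no_grad()
def embed(model: torch.nn.Module, path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return F.normalize(model(x), dim=-1)

# Single-choice binary reasoning cast as a similarity threshold
# (the 0.5 cutoff and the image paths are placeholders).
model = build_backbone("resnet50")
sim = F.cosine_similarity(embed(model, "query.jpg"), embed(model, "candidate.jpg"))
print("same attribute group" if sim.item() > 0.5 else "different attribute group")

Swapping build_backbone("resnet50") for the Vision Transformer branch changes only the embedding dimensionality; the similarity-based decision rule stays the same.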

Original language: English
Title of host publication: 5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 107-111
Number of pages: 5
ISBN (Electronic): 9781665456456
DOIs
State: Published - 2023
Event: 5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023 - Virtual, Online, Indonesia
Duration: 20 Feb 2023 - 23 Feb 2023

Publication series

Name: 5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023

Conference

Conference: 5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023
Country/Territory: Indonesia
City: Virtual, Online
Period: 20/02/23 - 23/02/23

Keywords

  • Deep Learning
  • Image similarity
  • Inference
  • Visual Reasoning
