A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion

Bart Dikmans, Dongwann Kang

Research output: Contribution to journal › Article › peer-review

Abstract

High-quality image datasets are in high demand for various applications. Although many online sources provide manually collected datasets, fully automating the dataset collection process remains a persistent challenge. In this study, we survey the field of automatic image dataset generation by analyzing a collection of existing studies. We also examine fields closely related to automated dataset generation, such as query expansion, web scraping, and dataset quality. We assess how both noise and regional search engine differences can be addressed using automated search query expansion focused on hypernyms, while still allowing for user-specific manual query expansion. Combining these aspects provides an outline of how a modern web scraping application can produce large-scale image datasets.
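As a rough illustration of the hypernym-focused query expansion mentioned in the abstract, the following minimal Python sketch appends broader category terms to a base search term and lets the user add manual terms. It is not the authors' implementation: the `HYPERNYMS` table, the `expand_query` function, and the strategy of appending the hypernym to the term are all assumptions made for this example. A real system would more likely draw hypernyms from a lexical database.

```python
# Hypothetical, hand-written hypernym table (an assumption for this sketch).
# A real system would likely obtain hypernyms from a lexical database.
HYPERNYMS = {
    "jaguar": ["big cat", "animal"],
    "apple": ["fruit"],
}


def expand_query(term, manual_terms=(), hypernyms=HYPERNYMS):
    """Return the base term followed by one expanded query per
    hypernym and per user-supplied manual term, without duplicates."""
    queries = [term]
    for extra in list(hypernyms.get(term, [])) + list(manual_terms):
        candidate = f"{term} {extra}"
        if candidate not in queries:
            queries.append(candidate)
    return queries


if __name__ == "__main__":
    # Ambiguous terms such as "jaguar" (animal or car brand) are the kind of
    # case where a hypernym can narrow the search and reduce noisy results.
    for query in expand_query("jaguar", manual_terms=["wildlife"]):
        print(query)
```

Each expanded query could then be passed to an image search backend in place of the bare term. Terms with no entry in the table fall back to the unexpanded query.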

Original language: English
Pages (from-to): 602-613
Number of pages: 12
Journal: Journal of Information Processing Systems
Volume: 19
Issue number: 4
State: Published - 2023

Keywords

  • Image Dataset Generation
  • Query Expansion
  • Web Scraping
