TY - JOUR
T1 - A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion
AU - Dikmans, Bart
AU - Kang, Dongwann
N1 - Publisher Copyright: © 2023 KIPS
PY - 2023
Y1 - 2023
AB - High-quality image datasets are in high demand for various applications. Although many online sources provide manually collected datasets, fully automating the dataset collection process remains a persistent challenge. In this study, we survey the field of automatic image dataset generation by analyzing a collection of existing studies. We also examine closely related fields, such as query expansion, web scraping, and dataset quality. We assess how both noise and regional search engine differences can be addressed using automated search query expansion focused on hypernyms, while still allowing for user-specific manual query expansion. Combining these aspects provides an outline of how a modern web scraping application can produce large-scale image datasets.
KW - Image Dataset Generation
KW - Query Expansion
KW - Web Scraping
UR - http://www.scopus.com/inward/record.url?scp=85183472642&partnerID=8YFLogxK
U2 - 10.3745/JIPS.04.0288
DO - 10.3745/JIPS.04.0288
M3 - Article
AN - SCOPUS:85183472642
SN - 1976-913X
VL - 19
SP - 602
EP - 613
JO - Journal of Information Processing Systems
JF - Journal of Information Processing Systems
IS - 4
ER -