TY - JOUR
T1 - Framing the Sequence
T2 - Genre-Aligned Photo Curation via Shot-Scale Embedding
AU - Park, Youngsup
AU - Lim, Yangmi
AU - Kang, Dongwann
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/9
Y1 - 2025/9
N2 - This paper presents a lightweight, genre-conditioned photo curation framework that restructures user-selected image sequences based on cinematic shot scale patterns. Unlike prior frame-level approaches, our method explicitly models sequential rhythm and genre style. The proposed pipeline integrates (1) a MobileNetV3-based shot scale classifier optimized for on-device efficiency, (2) a conditional variational autoencoder (cVAE) for embedding temporal shot rhythms conditioned on genre, and (3) a similarity-driven adaptation module that adjusts sequences through swap and crop operations guided by latent distance reduction. Deployed as an iOS application, the system processes an 8-image sequence in ~2.02 s with a footprint under 3 MB. Quantitative evaluations show that the classifier achieved 69.9% Top-1 accuracy (F1 = 0.646), and that adaptation reduced latent distance by 22.7% compared to shuffled baselines. On-device tests confirmed practical feasibility. A user study (n = 24) using Likert ratings revealed that the method improved rhythm perception among film/media experts, though effects on genre recognition and preference were less consistent for general users. Overall, this work contributes a novel, style-aware, and mobile-ready sequencing framework that advances beyond prior frame-level methods and supports applications in memory curation, interactive storytelling, and mobile authoring.
KW - genre conditioning
KW - on-device inference
KW - photo curation
KW - sequence embedding
KW - shot scale
UR - https://www.scopus.com/pages/publications/105015786073
U2 - 10.3390/electronics14173434
DO - 10.3390/electronics14173434
M3 - Article
AN - SCOPUS:105015786073
SN - 2079-9292
VL - 14
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 17
M1 - 3434
ER -