TY - GEN
T1 - FAIR-SE
T2 - 34th ACM International Conference on Information and Knowledge Management, CIKM 2025
AU - You, Jaebeom
AU - Hong, Seung Kyu
AU - Liu, Ling
AU - Lee, Kisung
AU - Kwon, Hyuk Yoon
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/11/10
Y1 - 2025/11/10
AB - Search engine personalization, while enhancing user satisfaction, can lead to information disparities. Previous studies on this topic face limitations, such as the absence of context-aware data collection, superficial URL-level analysis, and human-dependent annotations. We propose FAIR-SE, a Framework for Analyzing Information dispaRities in Search Engines that addresses these challenges through AWS Lambda-based concurrent data collection and LLM-generated persona-based content analysis. We collected search results across four user contexts (Search History, Geo-location, Language Preference, and Access Environment) and analyzed them through four analytical perspectives (Political Leaning, Topic-specific Stance, Subjectivity, and Bias). Experiments conducted on two globally prominent search engines across nine controversial topics demonstrate the efficacy of FAIR-SE regarding benchmark accuracy, persona consistency, and ability to reflect real-world discourse patterns across diverse topics. Our statistical analysis identifies distinct search engine characteristics and demonstrates significant information disparities in our case studies examining regional disparities in search results. Our code and datasets are publicly available at: https://github.com/bigbases/FAIR-SE.
KW - context-aware data scraping
KW - llm-generated persona
KW - search engines
KW - statistical significance testing
UR - https://www.scopus.com/pages/publications/105023199331
U2 - 10.1145/3746252.3761361
DO - 10.1145/3746252.3761361
M3 - Conference contribution
AN - SCOPUS:105023199331
T3 - CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
SP - 3920
EP - 3930
BT - CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery, Inc
Y2 - 10 November 2025 through 14 November 2025
ER -