TY - JOUR
T1 - Navigating the AI technology landscape from GitHub data
AU - Choi, Jaemyoung
AU - Lee, Sungsoo
AU - Lee, Hakyeon
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2026/3
Y1 - 2026/3
AB - As artificial intelligence (AI) is considered a pivotal technology determining competitiveness, understanding the current and future state of AI technology has become crucial. Conventional approaches to mapping the technology landscape have relied heavily on patent data, but patents cannot adequately capture the state of the art in rapidly changing technologies like AI because of the significant time lag between development and registration. Given that much AI technology is developed through open-source projects on GitHub, the largest and most popular code host and social coding platform, GitHub emerges as a promising data source for navigating the AI technology landscape. This study aims to explore and predict the AI landscape based on GitHub data. We propose a new bibliometric-like measure, called library coupling, which leverages code reuse, a distinctive aspect of open-source software development, to capture the relationships between GitHub repositories. A total of 2879 AI-related repositories with Python-based libraries were collected from GitHub. An AI repository network is constructed based on library coupling relationships among these repositories. Using an attributed graph clustering technique, the AI repositories within the network are grouped into 20 AI technology clusters. Subsequently, we employ graph convolutional network-based link prediction to predict changes in the AI technology landscape. The proposed GitHub-based technology landscaping approach can be effectively utilized to grasp the current state of rapidly evolving AI technologies and predict their future trends, thereby supporting informed decision-making in national AI policy formulation and corporate AI strategy.
KW - Artificial intelligence (AI)
KW - GitHub
KW - Library coupling
KW - Link prediction
KW - Open source
KW - Technology landscape
UR - https://www.scopus.com/pages/publications/105018124733
U2 - 10.1016/j.techsoc.2025.103090
DO - 10.1016/j.techsoc.2025.103090
M3 - Article
AN - SCOPUS:105018124733
SN - 0160-791X
VL - 84
JO - Technology in Society
JF - Technology in Society
M1 - 103090
ER -