Two-Stage Machine Learning for Urban PV Generation Forecasting: Model Development and SHAP-Based Interpretation of Key Climatic Drivers

Research output: Contribution to journal › Article › peer-review

Abstract

Urban photovoltaic (PV) generation forecasting is crucial for energy efficiency and grid stability. This study proposes a two-stage machine learning (ML) framework that extends the concept of a hurdle model using climatic and temporal variables, and it quantifies variable contributions through SHAP-based interpretation. The framework pairs a probabilistic classifier that identifies PV generation status in the first stage with a nonlinear conditional regression model in the second stage, thereby forming an adaptive boundary between generation and non-generation intervals. A total of 320 model combinations were evaluated across four urban buildings and five seasonal conditions. The two-stage conditional regression model showed better generalization performance than a single-stage regression model. Specifically, the multi-layer perceptron (MLP)-based conditional regression model achieved a test R² of at least 0.7 under most conditions. The SHAP analysis identified direct solar irradiance (DSI), diffuse solar irradiance (DiffSI), hour of the day (Hour), visibility (Vis), and relative humidity (RH) as common key drivers of fluctuations in PV generation, with a cumulative contribution of more than 80% from these five drivers. Furthermore, the interpretive consistency and transferability of the SHAP results were quantitatively verified across buildings and seasons using Kendall’s τ and Top-k overlap indicators, demonstrating the robustness and reproducibility of the proposed framework. These results indicate that solar irradiance (i.e., DSI and DiffSI) and temporal patterns (i.e., Hour) are the key drivers of urban PV generation fluctuations, offering practical value for real-time operation optimization, policy design, and scalability to diverse regions and buildings.
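
The two-stage (hurdle-style) structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the study uses a probabilistic classifier and an MLP-based conditional regressor, whereas this sketch substitutes an empirical frequency table for stage one and a least-squares line for stage two so that it runs with the Python standard library alone. The toy records, the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (hour, irradiance in W/m^2, generation in kWh).
# Invented for illustration only; these are not data from the study.
TRAIN = [
    (0, 0.0, 0.0), (3, 0.0, 0.0),
    (6, 20.0, 0.0), (6, 60.0, 0.3), (6, 80.0, 0.4),
    (9, 300.0, 1.5), (12, 800.0, 4.0), (12, 600.0, 3.0),
    (15, 400.0, 2.0),
    (18, 40.0, 0.2), (18, 10.0, 0.0), (18, 5.0, 0.0),
    (21, 0.0, 0.0),
]


def fit_stage1(records):
    """Stage 1 stand-in: estimate P(generating | hour) as an empirical frequency."""
    counts = defaultdict(lambda: [0, 0])  # hour -> [generating count, total count]
    for hour, _irr, gen in records:
        counts[hour][0] += gen > 0
        counts[hour][1] += 1
    return {hour: pos / total for hour, (pos, total) in counts.items()}


def fit_stage2(records):
    """Stage 2 stand-in: least-squares line fitted on generating samples only."""
    pts = [(irr, gen) for _hour, irr, gen in records if gen > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(hour, irr, prob_by_hour, line, threshold=0.5):
    """Hurdle combination: zero unless stage 1 clears the threshold."""
    p_generating = prob_by_hour.get(hour, 0.0)  # unseen hours default to 0.0
    if p_generating < threshold:
        return 0.0
    slope, intercept = line
    return max(0.0, slope * irr + intercept)  # generation cannot be negative


prob_by_hour = fit_stage1(TRAIN)
line = fit_stage2(TRAIN)
for hour, irr in [(21, 0.0), (18, 40.0), (12, 500.0)]:
    print(hour, irr, round(predict(hour, irr, prob_by_hour, line), 3))
```

The point of the design is that the regressor never sees the zero-generation samples, so it is not pulled toward zero by night-time records, while the classifier decides where the boundary between generation and non-generation falls. A single-stage regressor would have to learn both behaviours at once.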

Original language: English
Article number: 5
Journal: International Journal of Thermophysics
Volume: 47
Issue number: 1
State: Published - Jan 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Conditional regression
  • Model interpretability
  • SHapley Additive exPlanations (SHAP)
  • Solar irradiance and temporal patterns
  • Two-stage machine learning
  • Urban photovoltaic (PV) forecasting
