Author: Beniamino Sartini
Affiliation: University of Bologna
Published: June 11, 2026
Modified: June 11, 2026

1 Renewable power generation and weather conditions

R packages
library(tidyverse)
library(backports)
library(latex2exp)

2 Project description

The dataset contains renewable power generation and weather conditions. The original dataset can be downloaded from Kaggle. The variable Energy delta[Wh] is not a power variable: it is an energy increment measured in Watt-hours between two consecutive timestamps. Since the raw frequency is 15 minutes, each raw observation describes the energy accumulated over the previous 15-minute interval. We denote this raw increment by \(R_q^{15m}\).

The objective of the project is to measure how the weather conditions that drive renewable generation, especially solar irradiation, affect the hourly energy delta. For this reason the analysis is performed on hourly data and only during daylight. Keeping night observations would mostly add structural zeros: when the sun is below the horizon, solar irradiation is zero and the energy delta carries no information about the marginal impact of renewable conditions.

The main unit issue is that GHI is measured in \(W/m^2\), i.e. power per square meter, while Energy delta[Wh] is measured in Wh, i.e. energy. To make them compatible we aggregate the data by hour. The hourly energy delta is obtained by summing the 15-minute increments that fall inside the hour (four for a full daylight hour, fewer around sunrise and sunset): \[ R_h = \sum_{q \in h} R_q^{15m}. \] Solar irradiation, in turn, is converted into hourly solar exposure by multiplying each GHI observation by the interval length \(\Delta t = 0.25\) hours and summing inside the hour: \[ X_h = \sum_{q \in h} GHI_q \cdot 0.25. \] Hence, \(X_h\) is measured in \(Wh/m^2\) and can be compared with an hourly energy outcome. The comparison is still not one-to-one because the panel area and conversion efficiency are not observed, but the units are coherent: both variables measure energy accumulated over an hourly interval (per square meter in the case of \(X_h\)).
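As a rough illustration of the two aggregation formulas, the sketch below applies them to one full daylight hour. The data frame `toy` and all of its values are made up for the example and are not taken from the dataset.

```r
# Toy 15-minute observations for one full daylight hour (made-up values).
toy <- data.frame(
  hour_stamp   = as.POSIXct("2017-06-01 12:00:00", tz = "UTC"),
  GHI          = c(600, 640, 620, 580),   # W/m^2
  energy_delta = c(900, 960, 930, 870)    # Wh per 15-minute interval
)

dt_hours <- 0.25  # length of one 15-minute interval, in hours

# Hourly energy delta R_h: sum of the 15-minute increments (Wh).
R_h <- sum(toy$energy_delta)

# Hourly solar exposure X_h: GHI times interval length, summed (Wh/m^2).
X_h <- sum(toy$GHI * dt_hours)

c(R_h = R_h, X_h = X_h)
```

For a full hour with four intervals, \(X_h\) is numerically equal to the average GHI inside the hour. The two differ for partial daylight hours, where fewer than four intervals are summed.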

Tip 1: Description of the variables
  • Time: Datetime of the observation.
  • Energy delta[Wh]: Energy increment in Watt-hours (Wh) from the previous timestamp to the current timestamp, denoted by \(R_q^{15m}\) in the raw 15-minute data.
  • GHI: Global Horizontal Irradiance in Watts per square meter (\(W/m^2\)) measured by a pyranometer.
  • temp: The temperature in degrees Celsius (\(^{\circ}\text{C}\)) measured at the same height as the pyranometer.
  • pressure: The atmospheric pressure in hectopascals (hPa) measured at the same height as the pyranometer.
  • humidity: The relative humidity in percentage (%) measured at the same height as the pyranometer.
  • wind_speed: The wind speed in meters per second (m/s) measured at the same height as the pyranometer.
  • rain_1h: The amount of precipitation in millimeters (mm) measured over the past hour.
  • snow_1h: The amount of snowfall in millimeters (mm) measured over the past hour.
  • clouds_all: Cloud cover in percentage (%).
  • isSun: Indicator equal to 1 when the observation is during sunlight time.
  • sunlightTime: Elapsed sunlight time.
  • dayLength: Length of the day.
  • SunlightTime/daylength: Ratio between elapsed sunlight time and the full day length.
  • weather_type: Categorical weather condition.
  • hour: Hour of the day.
  • month: Month of the year.
Import data
# Read the raw 15-minute data and give syntactic names to the awkward columns
dir_data <- "../data"
filepath <- file.path(dir_data, "Renewable.csv")
data_15min <- readr::read_csv(filepath, show_col_types = FALSE, progress = FALSE) %>%
  dplyr::rename(energy_delta = `Energy delta[Wh]`,
                sunlight_ratio = `SunlightTime/daylength`) %>%
  dplyr::mutate(date = as.POSIXct(Time, tz = "UTC"),
                hour_stamp = lubridate::floor_date(date, unit = "hour"),
                date_day = as.Date(date),
                Year = lubridate::year(date),
                Month = month,
                Month_ = lubridate::month(date, label = TRUE),
                Day = lubridate::day(date),
                Hour = hour,
                Weekday = weekdays(date),
                energy_delta_Wh_15min = energy_delta,
                solar_exposure_Wh_m2 = GHI*0.25,
                weather_type = as.factor(weather_type))

# Keep daylight observations only, then aggregate the 15-minute data by hour
data <- data_15min %>%
  dplyr::filter(isSun == 1) %>%
  dplyr::group_by(hour_stamp) %>%
  dplyr::summarise(date = min(hour_stamp),
                   date_day = as.Date(first(hour_stamp)),
                   Year = lubridate::year(first(hour_stamp)),
                   Month = lubridate::month(first(hour_stamp)),
                   Month_ = lubridate::month(first(hour_stamp), label = TRUE),
                   Day = lubridate::day(first(hour_stamp)),
                   Hour = lubridate::hour(first(hour_stamp)),
                   Weekday = weekdays(first(hour_stamp)),
                   energy_delta_Wh = sum(energy_delta_Wh_15min),
                   energy_delta_kWh = energy_delta_Wh/1000,
                   solar_exposure_Wh_m2 = sum(solar_exposure_Wh_m2),
                   GHI_W_m2 = mean(GHI),
                   temp = mean(temp),
                   pressure = mean(pressure),
                   humidity = mean(humidity),
                   wind_speed = mean(wind_speed),
                   rain_1h = max(rain_1h),
                   snow_1h = max(snow_1h),
                   clouds_all = mean(clouds_all),
                   sunlightTime = max(sunlightTime),
                   dayLength = max(dayLength),
                   sunlight_ratio = mean(sunlight_ratio),
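                   # Most frequent weather type inside the hour
                   weather_type = names(sort(table(weather_type), decreasing = TRUE))[1],
                   # Number of daylight 15-minute observations inside the hour
                   n_15min = n(),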
                   .groups = "drop") %>%
  dplyr::mutate(log_delta = log(energy_delta_Wh + 1),
                log_exposure = log(solar_exposure_Wh_m2 + 1),
                weather_type = as.factor(weather_type))

head(data_15min) %>%
  dplyr::select(Time, energy_delta, GHI, temp, pressure, humidity,
                wind_speed, clouds_all, isSun, sunlight_ratio, hour, month) %>%
  knitr::kable(booktabs = TRUE, escape = FALSE, align = 'c') %>%
  kableExtra::row_spec(0, color = "white", background = "green")
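| Time | energy_delta | GHI | temp | pressure | humidity | wind_speed | clouds_all | isSun | sunlight_ratio | hour | month |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-01-01 00:00:00 | 0 | 0 | 1.6 | 1021 | 100 | 4.9 | 100 | 0 | 0 | 0 | 1 |
| 2017-01-01 00:15:00 | 0 | 0 | 1.6 | 1021 | 100 | 4.9 | 100 | 0 | 0 | 0 | 1 |
| 2017-01-01 00:30:00 | 0 | 0 | 1.6 | 1021 | 100 | 4.9 | 100 | 0 | 0 | 0 | 1 |
| 2017-01-01 00:45:00 | 0 | 0 | 1.6 | 1021 | 100 | 4.9 | 100 | 0 | 0 | 0 | 1 |
| 2017-01-01 01:00:00 | 0 | 0 | 1.7 | 1020 | 100 | 5.2 | 100 | 0 | 0 | 1 | 1 |
| 2017-01-01 01:15:00 | 0 | 0 | 1.7 | 1020 | 100 | 5.2 | 100 | 0 | 0 | 1 | 1 |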
Hourly aggregation check
df_import <- tibble::tibble(
  raw_15min_observations = nrow(data_15min),
  daylight_15min_observations = sum(data_15min$isSun == 1),
  removed_night_observations = sum(data_15min$isSun == 0),
  hourly_daylight_observations = nrow(data),
  first_hour = min(data$date),
  last_hour = max(data$date)
)

df_import %>%
  knitr::kable(booktabs = TRUE, escape = FALSE, align = 'c') %>%
  kableExtra::row_spec(0, color = "white", background = "green")
| raw_15min_observations | daylight_15min_observations | removed_night_observations | hourly_daylight_observations | first_hour | last_hour |
|---|---|---|---|---|---|
| 196776 | 102316 | 94460 | 27132 | 2017-01-01 07:00:00 | 2022-08-31 17:00:00 |
Table 1: Data filtering and hourly aggregation.
Unit compatibility
df_units <- tibble::tribble(
  ~Variable, ~Raw_unit, ~Hourly_transformation, ~Hourly_unit, ~Role,
  "Energy delta", "Wh per 15 minutes", "Sum inside each daylight hour", "Wh per hour", "Dependent variable",
  "GHI", "W/m^2", "Sum GHI x 0.25 hours inside each daylight hour", "Wh/m^2 per hour", "Main renewable regressor",
  "Temperature", "C", "Average inside the hour", "C", "Weather control",
  "Pressure", "hPa", "Average inside the hour", "hPa", "Weather control",
  "Humidity", "%", "Average inside the hour", "%", "Weather control",
  "Wind speed", "m/s", "Average inside the hour", "m/s", "Weather control",
  "Rain and snow", "mm over the previous hour", "Maximum value inside the hour", "mm", "Weather control",
  "Cloud cover", "%", "Average inside the hour", "%", "Weather control"
)

df_units %>%
  knitr::kable(booktabs = TRUE, escape = FALSE, align = 'c') %>%
  kableExtra::row_spec(0, color = "white", background = "green")
| Variable | Raw_unit | Hourly_transformation | Hourly_unit | Role |
|---|---|---|---|---|
| Energy delta | Wh per 15 minutes | Sum inside each daylight hour | Wh per hour | Dependent variable |
| GHI | W/m^2 | Sum GHI x 0.25 hours inside each daylight hour | Wh/m^2 per hour | Main renewable regressor |
| Temperature | C | Average inside the hour | C | Weather control |
| Pressure | hPa | Average inside the hour | hPa | Weather control |
| Humidity | % | Average inside the hour | % | Weather control |
| Wind speed | m/s | Average inside the hour | m/s | Weather control |
| Rain and snow | mm over the previous hour | Maximum value inside the hour | mm | Weather control |
| Cloud cover | % | Average inside the hour | % | Weather control |
Table 2: Unit compatibility used in the project.

3 Part A: Descriptive analysis

Consider the hourly daylight energy delta \(R_h\) and the weather variables. The first objective is to understand when the energy delta is positive, how it changes during daylight hours and how much it depends on solar exposure and cloud conditions.

3.1 Task A.1

Compute the main descriptive statistics of the hourly daylight energy delta \(R_h\): total energy in kWh, mean, median, maximum, standard deviation, the percentage of daylight hours with zero energy and the percentage of daylight hours with positive energy. Then plot the empirical distribution of the positive hourly energy delta. Is the distribution symmetric? Comment on the result (max 150 words).
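One possible way to organise these statistics is sketched below on simulated data. The vector `R_h` is generated at random (a share of zeros plus right-skewed positive values) purely as an assumption for the example, so the numbers say nothing about the real dataset.

```r
set.seed(1)

# Simulated hourly energy delta in Wh (NOT the project data):
# about 20% zeros plus log-normal positive values.
n <- 1000
R_h <- ifelse(runif(n) < 0.2, 0, rlnorm(n, meanlog = 6, sdlog = 1))

stats_A1 <- c(
  total_kWh    = sum(R_h) / 1000,
  mean_Wh      = mean(R_h),
  median_Wh    = median(R_h),
  max_Wh       = max(R_h),
  sd_Wh        = sd(R_h),
  pct_zero     = 100 * mean(R_h == 0),
  pct_positive = 100 * mean(R_h > 0)
)
round(stats_A1, 2)

# Empirical distribution of the positive values only.
hist(R_h[R_h > 0], breaks = 40,
     main = "Positive hourly energy delta (simulated)", xlab = "Wh")
```

A mean well above the median, as in this simulated sample, is a quick numerical hint of right skewness that the histogram can then confirm visually.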

3.2 Task A.2

Aggregate \(R_h\) by month and by hour. Compute the total monthly energy delta, the monthly average hourly energy delta, the monthly average solar exposure and the percentage of positive daylight hours. Then compute the average energy delta for each pair month-hour. In which month is the total energy delta maximum? At which hour is the average energy delta maximum? Plot the monthly hourly profiles.
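The aggregations in this task could be sketched as follows with base R. The data frame `toy` is simulated and its column names only mimic those of the hourly dataset; monthly solar exposure would be handled in the same way as the energy delta.

```r
set.seed(2)

# Simulated hourly daylight data (NOT the project data).
toy <- data.frame(
  Month = sample(1:12, 500, replace = TRUE),
  Hour  = sample(7:18, 500, replace = TRUE)
)
toy$energy_delta_Wh <- pmax(0, rnorm(500, mean = 800, sd = 600))

# Monthly summary: total, average and share of positive hours.
monthly <- aggregate(energy_delta_Wh ~ Month, data = toy,
                     FUN = function(x) c(total = sum(x), mean = mean(x),
                                         pct_positive = 100 * mean(x > 0)))
monthly <- do.call(data.frame, monthly)  # flatten the matrix column

# Average energy delta for each month-hour pair.
profile <- aggregate(energy_delta_Wh ~ Month + Hour, data = toy, FUN = mean)

# Month with the largest total and hour with the largest average.
best_month <- monthly$Month[which.max(monthly$energy_delta_Wh.total)]
hourly_avg <- tapply(toy$energy_delta_Wh, toy$Hour, mean)
best_hour  <- as.integer(names(hourly_avg)[which.max(hourly_avg)])
```

The `profile` table has one row per month-hour pair and can be passed directly to a plotting function, with `Hour` on the horizontal axis and one line per `Month`.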

3.3 Task A.3

Group the data by cloud cover and daylight position. Use four cloud-cover groups: 0-25%, 25-50%, 50-75% and 75-100%. Use three daylight-position groups: morning, central day and afternoon, defined from the ratio between elapsed sunlight time and day length. Compute the mean energy delta, the median energy delta and the percentage of positive observations for each group. Which condition produces the highest average energy delta? Which condition produces the lowest one? Comment on the result (max 150 words).
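A minimal sketch of the grouping step, using `cut()` on simulated data, is given below. The cut points 1/3 and 2/3 for the daylight position are an assumption made for the example, since the text does not fix them, and the data frame `toy` is not the project data.

```r
set.seed(3)

# Simulated hourly data (NOT the project data).
toy <- data.frame(
  clouds_all     = runif(400, 0, 100),   # cloud cover in %
  sunlight_ratio = runif(400, 0, 1)      # elapsed sunlight time / day length
)
toy$energy_delta_Wh <- pmax(0, 1500 * sin(pi * toy$sunlight_ratio) *
                                 (1 - 0.6 * toy$clouds_all / 100) +
                                 rnorm(400, sd = 150))

# Four cloud-cover groups; include.lowest keeps exact zeros in the first bin.
toy$cloud_group <- cut(toy$clouds_all, breaks = c(0, 25, 50, 75, 100),
                       labels = c("0-25%", "25-50%", "50-75%", "75-100%"),
                       include.lowest = TRUE)

# Three daylight-position groups (cut points 1/3 and 2/3 are assumed).
toy$day_part <- cut(toy$sunlight_ratio, breaks = c(0, 1/3, 2/3, 1),
                    labels = c("morning", "central day", "afternoon"),
                    include.lowest = TRUE)

# Mean, median and share of positive observations for each group.
summary_A3 <- aggregate(energy_delta_Wh ~ cloud_group + day_part, data = toy,
                        FUN = function(x) c(mean = mean(x), median = median(x),
                                            pct_positive = 100 * mean(x > 0)))
summary_A3 <- do.call(data.frame, summary_A3)

# Groups sorted from the highest to the lowest average energy delta.
summary_A3[order(-summary_A3$energy_delta_Wh.mean), ]
```

On the real data it is worth checking that no observation falls outside the break points, because `cut()` returns `NA` for such values.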

4 Part B: Renewable generation and weather

We now focus on the statistical link between renewable conditions and hourly energy delta. The variable solar_exposure_Wh_m2 is used as the main renewable intensity variable because it converts GHI from power density into hourly solar energy density. The objective is to understand how much of the variation in \(R_h\) is explained by solar exposure, weather and daylight conditions.

4.1 Task B.1

Compute the correlation between \(R_h\) and each weather variable: solar_exposure_Wh_m2, GHI_W_m2, temp, pressure, humidity, wind_speed, rain_1h, snow_1h, clouds_all, sunlight_ratio and n_15min. Rank the variables by absolute correlation. Which variable is most associated with energy delta? Plot \(R_h\) against hourly solar exposure.
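The ranking by absolute correlation could be sketched as below. The data are simulated with only a few of the listed regressors, and the coefficients used to generate `energy_delta_Wh` are arbitrary assumptions.

```r
set.seed(4)

# Simulated hourly data with a subset of the regressors (NOT the project data).
n <- 500
toy <- data.frame(
  solar_exposure_Wh_m2 = runif(n, 0, 900),
  temp       = rnorm(n, 12, 8),
  humidity   = runif(n, 30, 100),
  clouds_all = runif(n, 0, 100)
)
toy$energy_delta_Wh <- pmax(0, 2 * toy$solar_exposure_Wh_m2 -
                                 3 * toy$clouds_all + rnorm(n, sd = 200))

# Pearson correlation between the energy delta and each regressor.
regressors <- setdiff(names(toy), "energy_delta_Wh")
cors <- sapply(regressors, function(v) cor(toy$energy_delta_Wh, toy[[v]]))

# Rank by absolute correlation, largest first.
ranking <- data.frame(variable = regressors, correlation = unname(cors))
ranking <- ranking[order(-abs(ranking$correlation)), ]
ranking

plot(toy$solar_exposure_Wh_m2, toy$energy_delta_Wh,
     xlab = "Hourly solar exposure (Wh/m^2)",
     ylab = "Hourly energy delta (Wh)")
```

Ranking on the absolute value keeps strongly negative associations, such as the one expected for cloud cover, near the top of the table.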

4.2 Task B.2

Fit a linear model for \(\log(R_h + 1)\) using the weather variables and seasonal controls for hour and month: \[ \log(R_h + 1) = \beta_0 + \beta_1 x_h + \beta_2 x_h^2 + \boldsymbol{\gamma}'W_h + \text{Hour}_h + \text{Month}_h + \varepsilon_h \text{,} \] where \(x_h = \log(X_h + 1)\) and \(X_h\) is hourly solar exposure in \(Wh/m^2\). The vector \(W_h\) contains temperature, pressure, humidity, wind speed, rain, snow, cloud cover, daylight ratio and the number of 15-minute daylight observations inside the hour. Estimate the model on 80% of the data and compute the root mean squared error on the remaining 20%. Which variables have the expected sign?
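The estimation and out-of-sample evaluation could look roughly like the sketch below. It uses simulated data, a reduced set of controls and a random 80/20 split. All three are assumptions for the example, and a split that respects the time ordering would be an equally defensible choice.

```r
set.seed(5)

# Simulated hourly data (NOT the project data).
n <- 1000
toy <- data.frame(
  solar_exposure_Wh_m2 = runif(n, 0, 900),
  clouds_all = runif(n, 0, 100),
  temp  = rnorm(n, 12, 8),
  Hour  = factor(sample(7:18, n, replace = TRUE)),
  Month = factor(sample(1:12, n, replace = TRUE))
)
toy$log_exposure <- log(toy$solar_exposure_Wh_m2 + 1)
toy$log_delta <- 1 + 0.9 * toy$log_exposure - 0.005 * toy$clouds_all +
  rnorm(n, sd = 0.3)

# Random 80/20 split into estimation and evaluation samples.
idx_train <- sample(seq_len(n), size = floor(0.8 * n))
train <- toy[idx_train, ]
test  <- toy[-idx_train, ]

# Linear model with a quadratic term and hour and month dummies.
fit <- lm(log_delta ~ log_exposure + I(log_exposure^2) + clouds_all + temp +
            Hour + Month, data = train)

# Root mean squared error on the held-out 20%, on the log scale.
pred <- predict(fit, newdata = test)
rmse_log <- sqrt(mean((test$log_delta - pred)^2))
rmse_log
```

The RMSE computed here is on the scale of \(\log(R_h + 1)\). Reporting it in Wh would require transforming the predictions back before computing the errors.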

5 Part C

Suppose we want to understand the impact of renewable conditions on the hourly energy delta. In practice, we ask the following question:

“What happens to the hourly energy delta if solar exposure increases by 1%, holding weather and seasonal controls fixed?”

To capture this impact, we estimate a model with interactions between log solar exposure and the main weather variables: \[ \log(R_h + 1) = \beta_0 + \beta_1 x_h + \beta_2 x_h^2 + \gamma_1 x_h L_h + \gamma_2 x_h C_h + \gamma_3 x_h T_h + \boldsymbol{\delta}'Z_h + \varepsilon_h \text{,} \] where \(x_h=\log(X_h+1)\), \(X_h\) is hourly solar exposure in \(Wh/m^2\), \(L_h\) is the daylight ratio, \(C_h\) is cloud cover, \(T_h\) is temperature and \(Z_h\) contains the remaining controls. Taking the derivative with respect to \(x_h\) we obtain: \[ \partial_{x_h}\log(R_h + 1) = \beta_1 + 2\beta_2x_h + \gamma_1 L_h + \gamma_2 C_h + \gamma_3 T_h \text{.} \tag{1}\]

5.1 Task C.1

Estimate the interaction model above. Report the model statistics and the coefficients directly linked with the renewable impact, i.e. log_exposure, log_exposure^2, log_exposure:sunlight_ratio, log_exposure:clouds_all and log_exposure:temp. Are the interactions economically reasonable?
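A possible layout for the estimation and for the extraction of the relevant coefficients is sketched below. The data are simulated, the controls are reduced to the three interacted variables, and the hour and month dummies are left out for brevity. All of these are simplifying assumptions.

```r
set.seed(7)

# Simulated hourly data (NOT the project data).
n <- 600
toy <- data.frame(
  log_exposure   = log(runif(n, 50, 900) + 1),
  sunlight_ratio = runif(n, 0, 1),
  clouds_all     = runif(n, 0, 100),
  temp           = rnorm(n, 12, 8)
)
toy$log_delta <- 0.5 + 0.9 * toy$log_exposure -
  0.001 * toy$log_exposure * toy$clouds_all + rnorm(n, sd = 0.2)

# Interaction model: quadratic in log exposure plus three interactions.
fit <- lm(log_delta ~ log_exposure + I(log_exposure^2) +
            log_exposure:sunlight_ratio + log_exposure:clouds_all +
            log_exposure:temp + sunlight_ratio + clouds_all + temp,
          data = toy)

# Model statistics.
s <- summary(fit)
model_stats <- c(r_squared = s$r.squared, adj_r_squared = s$adj.r.squared,
                 sigma = s$sigma, n_obs = nobs(fit))
round(model_stats, 4)

# Coefficients linked to the renewable impact: every term containing log_exposure.
coef_table  <- s$coefficients
impact_rows <- grepl("log_exposure", rownames(coef_table), fixed = TRUE)
coef_table[impact_rows, ]
```

Selecting the rows by pattern avoids depending on the exact label that `lm()` gives to each interaction term.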

5.2 Task C.2

Using Equation 1, define a function that computes the marginal effect of log solar exposure on \(\log(R_h+1)\). Then compute the expected percentage change in energy delta after a 10% increase in hourly solar exposure under three situations:

  1. Clear day: high daylight ratio and low cloud cover.
  2. Mixed day: intermediate daylight ratio and intermediate cloud cover.
  3. Cloudy day: low daylight ratio and high cloud cover.

Are renewable shocks equally productive in percentage terms?
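The function requested in this task could be sketched as follows. The coefficient values in `coefs`, the scenario values and the exposure level of 400 \(Wh/m^2\) are hypothetical placeholders, not estimates. The percentage change is a first-order approximation that treats the marginal effect as constant over the shock and assumes \(X_h\) is much larger than 1, so that \(x_h\) moves by about \(\log(1.10)\).

```r
# Hypothetical coefficient values, chosen only for illustration.
# In the project they would come from the model estimated in Task C.1.
coefs <- c(b1 = 0.60, b2 = 0.02, g_L = 0.10, g_C = -0.002, g_T = 0.001)

# Marginal effect of x = log(X + 1) on log(R + 1), following Equation 1.
marginal_effect <- function(x, L, C, temp, b = coefs) {
  unname(b["b1"] + 2 * b["b2"] * x + b["g_L"] * L + b["g_C"] * C +
           b["g_T"] * temp)
}

# Approximate percentage change in (R + 1) after a pct% increase in X.
# Assumption: X is much larger than 1, so x moves by about log(1 + pct/100),
# and the marginal effect is treated as constant over that small move.
pct_change <- function(x, L, C, temp, pct = 10, b = coefs) {
  100 * (exp(marginal_effect(x, L, C, temp, b) * log(1 + pct / 100)) - 1)
}

scenarios <- data.frame(
  scenario = c("clear", "mixed", "cloudy"),
  L = c(0.8, 0.5, 0.2),   # daylight ratio (assumed values)
  C = c(10, 50, 90)       # cloud cover in % (assumed values)
)
x0 <- log(400 + 1)        # assumed hourly solar exposure of 400 Wh/m^2
scenarios$pct_change <- pct_change(x0, scenarios$L, scenarios$C, temp = 15)
scenarios
```

With a marginal effect equal to one the function returns exactly 10%, which is a convenient sanity check. Values above or below one indicate that the same solar shock is more or less productive in percentage terms.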

5.3 Task C.3

Simulate the expected hourly energy delta under 4 renewable-weather scenarios and 4 temperature levels. Consider a representative observation at 12:00 in June of the last available year. Use the estimated model to compute the expected energy delta in Wh.

  1. Current average: average daylight observations in the data.
  2. Moderate renewable gain: 10% higher hourly solar exposure, slightly higher daylight ratio and lower cloud cover.
  3. Strong renewable gain: 25% higher hourly solar exposure, higher daylight ratio and much lower cloud cover.
  4. Cloud constrained: 10% lower hourly solar exposure and higher cloud cover.

What is the expected effect of better renewable conditions on energy delta?
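The scenario grid could be built and evaluated roughly as in the sketch below. The model is fitted on simulated data without hour and month dummies, and the scenario values (baseline exposure, daylight ratios, cloud covers and temperature levels) are invented for the example. The back-transformation `exp(prediction) - 1` is a simple choice that ignores the retransformation bias of log models.

```r
set.seed(6)

# Simulated hourly data and a reduced interaction model (NOT the project data).
n <- 800
toy <- data.frame(
  log_exposure   = log(runif(n, 50, 900) + 1),
  sunlight_ratio = runif(n, 0, 1),
  clouds_all     = runif(n, 0, 100),
  temp           = rnorm(n, 12, 8)
)
toy$log_delta <- 0.5 + 0.9 * toy$log_exposure -
  0.001 * toy$log_exposure * toy$clouds_all + rnorm(n, sd = 0.2)

fit <- lm(log_delta ~ log_exposure + I(log_exposure^2) +
            log_exposure:sunlight_ratio + log_exposure:clouds_all +
            log_exposure:temp + sunlight_ratio + clouds_all + temp,
          data = toy)

# Four renewable-weather scenarios (all values are assumptions).
base_X <- 400  # assumed baseline hourly solar exposure in Wh/m^2
scenarios <- data.frame(
  scenario       = c("current", "moderate gain", "strong gain",
                     "cloud constrained"),
  X              = base_X * c(1, 1.10, 1.25, 0.90),
  sunlight_ratio = c(0.50, 0.55, 0.60, 0.50),
  clouds_all     = c(50, 40, 20, 70)
)

# Four temperature levels; merge() without common columns is a cross join.
temps <- data.frame(temp = c(0, 10, 20, 30))
grid  <- merge(scenarios, temps)   # 4 x 4 = 16 rows
grid$log_exposure <- log(grid$X + 1)

# Predicted energy delta in Wh (naive back-transformation from the log scale).
grid$expected_Wh <- exp(predict(fit, newdata = grid)) - 1
grid[order(grid$scenario, grid$temp), c("scenario", "temp", "expected_Wh")]
```

In the full model the representative observation would also fix the hour and month factors, here 12:00 in June, in the `newdata` passed to `predict()`.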


Citation

BibTeX citation:
@online{sartini2026,
  author = {Sartini, Beniamino},
  date = {2026-06-11},
  url = {https://greenfin.it/projects/project-2026-B.html},
  langid = {en}
}
For attribution, please cite this work as:
Sartini, Beniamino. 2026. June 11, 2026. https://greenfin.it/projects/project-2026-B.html.