Author
Affiliation

Beniamino Sartini

University of Bologna

Published

June 11, 2026

Modified

June 11, 2026

1 Hourly energy demand generation and weather

R packages
library(tidyverse)
library(backports)
library(solarr)
library(latex2exp)

2 Project description

The dataset contains 4 years of electrical consumption, generation, pricing, and weather data for Spain. Consumption and generation data was retrieved from ENTSOE a public portal for Transmission Service Operator (TSO) data. Settlement prices were obtained from the Spanish TSO Red Electric España. Weather data was purchased as part of a personal project from the Open Weather API for the 5 largest cities in Spain. The original datasets can be downloaded from Kaggle. However, the dataset to be used for the project has been already pre-processed. The code used for preprocessing is in this github repository, while the dataset to be used can be downloaded here.

Tip 1: Description of the variables
  • date: datetime index localized to CET.
  • date_day: date without hour.
  • city: reference city. (Valencia. Madrid, Bilbao,Barcelona, Seville)
  • Year: number for the year.
  • Month: number for the month.
  • Day: number for the day in a month.
  • Hour: reference hour of the day.

Energy variables

  • fossil: energy produced with fossil sources: i.e. coal/lignite, natural gas, oil (MW).
  • nuclear: energy produced with nuclear sources (MW).
  • energy_other: other generation (MW).
  • solar: solar generation (MW).
  • biomass: biomass generation (MW).
  • river: hydroeletric generation (MW).
  • wind: wind generation (MW).
  • energy_other_renewable: other renewable generation (MW).
  • energy_waste: wasted generation (MW).
  • pred_solar: forecasted solar generation (MW).
  • pred_demand: forecasted electrical demand by TSO (MW).
  • pred_price: forecasted day ahead electrical price by TSO (Eur/MWh).
  • price: actual electrical price (Eur/MWh).
  • demand: actual electrical demand (Eur/MWh).
  • prod_tot: total electrical production (MW).
  • prod_renewable: total renewable electrical production (MW).

Weather variables

  • temp: mean temperature.
  • temp_min: minimum temperature (degrees).
  • temp_max: maximum temperature (degrees).
  • pressure: pressure in (hPas).
  • humidity: humidity in percentage.
  • wind_speed: wind speed (m/s).
  • wind_deg: wind direction.
  • rain_1h: rain in last hour (mm).
  • rain_3h: rain in last 3 hours (mm).
  • snow_3h: show last 3 hours (mm).
  • clouds_all: cloud cover in percentage.
Import data
dir_data <- "../data"
filepath <- file.path(dir_data, "data.csv")
data <- readr::read_csv(filepath, show_col_types = FALSE, progress = FALSE)
head(data) %>%
  knitr::kable(booktabs = TRUE ,escape = FALSE, align = 'c')%>%
  kableExtra::row_spec(0, color = "white", background = "green")
date date_day Year Month Day Hour Season fossil nuclear energy_other solar biomass river energy_other_renewable energy_waste wind pred_solar pred_demand pred_price demand price prod prod_renewable temp temp_min temp_max pressure humidity wind_speed wind_deg rain_1h rain_3h snow_3h clouds_all
2014-12-31 23:00:00 2014-12-31 2014 12 31 23 Winter 10156 7096 43 49 447 3813 73 196 6378 17 26118 50.10 25385 65.41 27939 10687 -0.6585375 -5.825 8.475 1016.4 82.4 2.0 135.2 0 0 0 0
2015-01-01 00:00:00 2015-01-01 2015 1 1 0 Winter 10437 7096 43 50 449 3587 71 195 5890 16 24934 48.10 24382 64.92 27509 9976 -0.6373000 -5.825 8.475 1016.2 82.4 2.0 135.8 0 0 0 0
2015-01-01 01:00:00 2015-01-01 2015 1 1 1 Winter 9918 7099 43 50 448 3508 73 196 5461 8 23515 47.33 22734 64.48 26484 9467 -1.0508625 -6.964 8.136 1016.8 82.0 2.4 119.0 0 0 0 0
2015-01-01 02:00:00 2015-01-01 2015 1 1 2 Winter 8859 7098 43 50 438 3231 75 191 5238 2 22642 42.27 21286 59.32 24914 8957 -1.0605312 -6.964 8.136 1016.6 82.0 2.4 119.2 0 0 0 0
2015-01-01 03:00:00 2015-01-01 2015 1 1 3 Winter 8313 7097 43 42 428 3499 74 189 4935 9 21785 38.41 20264 56.04 24314 8904 -1.0041000 -6.964 8.136 1016.6 82.0 2.4 118.4 0 0 0 0
2015-01-01 04:00:00 2015-01-01 2015 1 1 4 Winter 7962 7098 43 34 410 3804 74 188 4618 4 21441 35.72 19905 53.63 23926 8866 -1.1260000 -7.708 7.317 1017.4 82.6 2.4 174.8 0 0 0 0

3 Part A: Descriptive analysis

Consider the data of the energy market in Spain and let’s examine the composition of the energy mix. At your disposal you have 6 sources that generate energy, i.e. 4 renewable (columns biomass, solar, river, wind) and 2 non-renewable (columns nuclear, fossil). The column demand represents the amount of demanded electricity in MW for a given day, denoted as \(D_t\). Let’s denote the energy of the \(\ast\)-th source as \[ E^{\boldsymbol{\ast}}_{t} = \begin{cases} E^{b}_{t} \quad \boldsymbol{\ast} \to b = \text{biomass} \\ E^{s}_{t} \quad \boldsymbol{\ast} \to s = \text{solar} \\ E^{h}_{t} \quad \boldsymbol{\ast} \to h = \text{hydroelettric} \\ E^{w}_{t} \quad \boldsymbol{\ast} \to w = \text{wind} \\ E^{n}_{t} \quad \boldsymbol{\ast} \to n = \text{nuclear} \\ E^{f}_{t} \quad \boldsymbol{\ast} \to f = \text{fossil} \\ E^{o}_{t} \quad \boldsymbol{\ast} \to o = \text{others} \\ \end{cases} \tag{1}\] where others is the sum of the column energy_other and energy_other_renewable.

3.1 Task A.1

Let’s decompose the total demand across all the sources by defining share of demand met by source \(\ast\), i.e.  \[ s^{\ast}_{t} = \frac{E^{\ast}_{t}}{D_t} \tag{2}\] where \(E^{\ast}_{t}\) is defined as in Equation 1. Then, let’s compute the average share of demand \(\bar{s}^{\ast}\) between the electricity produced from each source and the total electricity demanded from the market, i.e.  \[ \bar{s}^{\ast} = \frac{1}{n} \sum_{t = 1}^{n} s_{t}^{\ast} \tag{3}\] Then, rank the first 5 sources by the percentage of demand satisfied with respect to the total demand in decreasing order. Plot the results with an histogram. Which is the source of energy that, on average, satisfy most of the electricity demand?

3.2 Task A.2

Aggregate the sources of energy in two groups:

  1. Renewable: computed as the sum of solar, river, wind and biomass.
  2. Non-renewable: computed as the sum of nuclear and fossil.

Then, compute the percentage of energy demand satisfied by each group. What is the percentage of energy satisfied by renewable sources? And by the fossil ones?

Repeat the computations, but filtering the data firstly only for year 2015 and then only for 2018. Are the percentage obtained different? Describe the result and explain if the transition to renewable energy is supported by data (to be supported the percentage of renewable energy should increase while percentage of fossil fuels should decrease.)

3.3 Task A.3

For each month, compute the mean percentage of demand of energy satisfied by the fossil sources. In order to compute \(D^{{\boldsymbol{\ast}}}_m\) for the \(m\)-month, let’s denote with \(n_m\) the number of day in that month, then: \[ D^{{\boldsymbol{\ast}}}_m = \frac{1}{n_m} \sum_{t = 1}^{n_m} \frac{E^{\boldsymbol{\ast}}_{t}}{D_{t}} = \frac{1}{n_m} \sum_{t = 1}^{n_m} D^{{\boldsymbol{\ast}}}_t \tag{4}\]

Compute also the maximum, minimum, median and standard deviation of \(D^{{\boldsymbol{\ast}}}_t\) for each month. Repeat the computation for the others 4 sources of energy, i.e. nuclear, solar, river and wind. Which are the source with the lowest and the highest standard deviation in December? And in June? Comment the results obtained (max 200 words).

3.4 Task A.4

In mean, the demand satisfied by solar energy with respect to total demand is greater or lower with respect to the one satisfied by hydroelectric on 12:00 in March? In July, August and September at the same hour the situation changes? Answer, providing a possible explanation of the results obtained (max 200 words).

3.5 Task A.5

Grouping the data by season and hour (\(4 \times 24\) groups) and by month and hour (\(12 \times 24\) groups) and compute:

  1. Average electricity price.
  2. Average ratio between total production and total demand, i.e. prod/demand.

For both groups, answer to the following questions.

What is the meaning of a ratio between total production and total demand greater than 1? And greater than 1? Following the demand and supply law, what is the expected behavior of the price in such situations? Comment the results (max 200 words).

When the price reaches it’s minimum value? In such situation, does the ratio reach its maximum value? What happen when the prices reaches it’s maximum? Comment the results (max 200 words).

4 Part B: Model daily electricity price

4.1 Task B.1

Let’s aggregate the data on a daily basis by computing the daily average electricity prices.

Daily Aggregation
# Daily prices 
data_prices <- data %>%
  mutate(date = date_day) %>%
  group_by(date, Year, Month, Day) %>%
  summarise(price = mean(price), Season = unique(Season)) %>%
  ungroup() %>%
  mutate(log_price = log(price), 
         Weekday = weekdays(date))

# Prepare the seasonal variables 
# Day of the week 
levels_weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
data_prices$Weekday_ <- factor(data_prices$Weekday, levels = levels_weekday, labels = levels_weekday)
# Months
levels_month <- c("January", "February", "March", "April", 
                  "May", "June", "July", "August", 
                  "September", "October", "November", "December")
data_prices$Month_ <- factor(data_prices$Month, levels = 1:12, labels = levels_month)
# Seasons
levels_season <- c("Winter", "Spring", "Summer", "Autumn")
data_prices$Season_ <- factor(data_prices$Season, levels = levels_season, labels = levels_season)

4.1.1 Task B.1 (a)

Let’s consider only positive prices \(P_t\) for electricity, hence we transform them into the corresponding log-prices \(S_t\), i.e. \[ S_t = \log(P_t) \iff P_t = \exp(S_t) \text{.} \tag{5}\] Then, we can proceed by removing the seasonality from the time series. Let’s define a seasonal function to remove deterministic components from the month, day of the week, i.e.  \[ \bar{S}_t = s_0 + D(t) + M(t) + S(t) \text{,} \] where \(D(t)\), \(M(t)\) and \(S(t)\) stands for the indicator functions \[ \begin{aligned} D(t) = \begin{cases} D_2 \quad \ \text{Tuesday} \\ \vdots \\ D_{7} \quad \text{Sunday} \\ \end{cases} \quad M(t) = \begin{cases} M_2 \quad \ \text{February} \\ \vdots \\ M_{12} \quad \text{December} \\ \end{cases} \quad S(t) = \begin{cases} S_{2} \quad \ \text{Spring} \\ S_{3} \quad \ \text{Summer} \\ S_{4} \quad \text{Autumn} \\ \end{cases} \end{aligned} \tag{6}\] It follows that, by construction, the intercept \(s_0\) is interpreted as the average log price when the day is Monday, the month is January and the season is Spring. Estimate the parameters of the model with Ordinary Least Square, then compute the seasonal price \(\bar{S}_t\) and the deseasonalized time series, i.e.  \[ \tilde{S}_t = S_t - \bar{S}_t \text{.} \]

4.1.2 Task B.1 (b)

Plot the log-prices \(S_t\), the deseasonalized log-prices \(\tilde{S}_t\) and the estimated seasonal function \(\bar{S}_t\) for 2017.

4.1.3 Task B.1 (c)

Plot the autocorrelation function for the deseasonalized log-prices \(\tilde{S}_t\) and for the squared deseasonalized log-prices \(\tilde{S}_t^2\) and their confidence intervals with \(\alpha = 0.05\). Is the time series of \(\tilde{S}_t\) uncorrelated?

4.2 Task B.2

The deseasonalized log-prices presents a strong autocorrelation in mean and variance. Let’s proceed by removing the first and then we will deal with the second. A popular model for electricity prices is the Autoregressive model of order 1 (AR(1)), the correspondent discrete version of an OU-process.

4.2.1 Task B.2 (a)

Test the hypothesis that the deseasonalized log-prices follows a random walk with an appropriate test. For example, you can consider an Augmented Dickey-Fuller Test. Fit with OLS an AR(1) model of the form \[ \tilde{S}_t = \phi \tilde{S}_{t-1} + \varepsilon_t \text{.} \] where we refer to this section for more details. Then, compute the fitted residuals \[ \varepsilon_t = \tilde{S}_t - \phi \tilde{S}_{t-1} \text{,} \tag{7}\] their standard deviation \[ \sigma_{\varepsilon} = \sqrt{\frac{1}{n-1}\sum_{t = 1}^n \varepsilon_t^2} \text{,} \tag{8}\] and the standardized residuals \[ \tilde{\varepsilon}_t = \frac{\varepsilon_t}{\sigma_{\varepsilon}} \text{.} \tag{9}\] Is the estimated model stationary?

4.2.2 Task B.2 (b)

Plot the autocorrelation function of standardized residuals \(\tilde{\varepsilon}_t\), the squared standardized residuals \(\tilde{\varepsilon}_t^2\) and their confidence intervals with \(\alpha = 0.05\). Then, let’s verify if the basic assumptions of the model hold.

  • Assumption 1.: Identically distributed residuals. We can verify this hypothesis with a two-sample Kolmogorov test. (see here)

Hint: For the two-sample Kolmogorov test you can consider the following R function.

Two-sample Kolmogorov Smirnov Test
#' Kolmogorov Smirnov test for a distribution
#'
#' Test against a specific distribution
#'
#' @param x a vector.
#' @param ci p.value for rejection.
#' @param idx_split Index used for splitting the time series. If `missing` will be random sampled.
#' @param min_quantile minimum quantile for the grid of values.
#' @param max_quantile maximum quantile for the grid of values.
#' @param seed random seed for two sample test.
#' @param plot when `TRUE` a plot is returned, otherwise a `tibble`.
#' @return when `plot = TRUE` a plot is returned, otherwise a `tibble`.
ks_test_ts <- function(x, ci = 0.05, idx_split, min_quantile = 0.015, max_quantile = 0.985, seed = 1, plot = FALSE){
  # Random seed
  set.seed(seed)
  # number of observations
  n <- length(x)
  # Random split of the time series
  idx_split <- ifelse(missing(idx_split), sample(n, 1), idx_split)
  x1 <- x[1:idx_split]
  x2 <- x[(idx_split+1):n]
  # Number of elements for each sub sample
  n1 <- length(x1)
  n2 <- length(x2)
  # Grid of values for KS-statistic
  grid <- seq(quantile(x, min_quantile), quantile(x, max_quantile), 0.01)
  # Empiric cdfs
  cdf_n1 <- ecdf(x1)
  cdf_n2 <- ecdf(x2)
  # KS-statistic
  ks_stat <- max(abs(cdf_n1(grid) - cdf_n2(grid)))
  # Rejection level
  # Equivalent: sqrt(-log(ci / 2) * (1 + n2/n1) / (2*n2))
  rejection_lev <- sqrt(-0.5 * log(ci / 2)) * sqrt((n1+n2)/(n1*n2))
  # P-value
  p.value <- 2 * exp(- 2 * n2 / (1 + n1/n2) * ks_stat^2)

  # ========================== Plot ==========================
  if (plot) {
    y_breaks <- seq(0, 1, 0.2)
    y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
    grid_max <- grid[which.max(abs(cdf_n1(grid) - cdf_n2(grid)))]
    plt <- ggplot()+
      geom_ribbon(aes(grid, ymax = cdf_n1(grid), ymin = cdf_n2(grid)),
                  alpha = 0.5, fill = "green") +
      geom_line(aes(grid, cdf_n1(grid)))+
      geom_line(aes(grid, cdf_n2(grid)), color = "red")+
      geom_segment(aes(x = grid_max, xend = grid_max,
                       y = cdf_n1(grid_max), yend = cdf_n2(grid_max)),
                   linetype = "solid", color = "magenta")+
      geom_point(aes(grid_max, cdf_n1(grid_max)), color = "magenta")+
      geom_point(aes(grid_max, cdf_n2(grid_max)), color = "magenta")+
      scale_y_continuous(breaks = y_breaks, labels = y_labels)+
      labs(x = "x", y = "cdf")+
      theme_bw()
    return(plt)
  } else {
    kab <- dplyr::tibble(
      idx_split = idx_split,
      ci = paste0(ci*100, "%"),
      n1 = n1,
      n2 = n2,
      KS = ks_stat,
      p.value = p.value,
      rejection_lev = rejection_lev,
      H0 = ifelse(KS > rejection_lev, "Rejected", "Non-Rejected")
    )
    class(kab) <- c("ks_test_ts", class(kab))
    return(kab)
  }
}
  • Assumption 2.: Uncorrelated residuals. If we do not reject Assumption 1. we can use Box-cox test, however if we reject Assumption 1. the Box-Cox tests are not valid and it is better to rely on a more general approach. In general, we can fit an AR(p) mode on the standardized residuals, i.e.
    \[ \tilde{\varepsilon}_t \sim a_0 + a_1 \tilde{\varepsilon}_{t-1} + \dots + a_p \tilde{\varepsilon}_{t-p} + z_t \text{,} \] and test for the significance of the entire regression with the F-test. More precisely, the F-statistic and its p-value are evaluated under the null hypothesis \[ H_0 = a_1 = \dots = a_p = 0 \implies \tilde{\varepsilon}_t = a_0 + z_t \text{.} \] In other words, under \(H_0\) the standardized residuals are uncorrelated and with expectation equal to \(a_0\). Is the AR model able to remove the autocorrelation in mean? Are the standardized residuals stationary in variance?

4.2.3 Task B.2 (c)

Let’s consider a more general model by fitting an ARMA(2,2). Such process will have the form: \[ \tilde{S}_t = \phi_1 \tilde{S}_{t-1} + \phi_2 \tilde{S}_{t-2} + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \varepsilon_t \text{,} \] and can be fitted with maximum likelihood and by minimizing the sum the squared residuals. We refer to this section for more details. In this case, we prefer to be agnostic about the true distribution of \(\varepsilon_t\) and use the second method. In R you can consider the function arima(). Compute the fitted residuals as \[ \varepsilon_t = \tilde{S}_t - \phi_1 \tilde{S}_{t-1} - \phi_2 \tilde{S}_{t-2} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} \text{.} \tag{10}\] Re-compute the std. deviation as in Equation 8 and the standardized residuals as in Equation 9. Then, repeat the tests of Section 4.2.2 on the new standardized residuals. Is the ARMA model able to remove the autocorrelation in mean? Are the new standardized residuals stationary in variance?

4.3 Task B.3

4.3.1 Task B.3 (a)

A popular approach to model the conditional variance of a time series is a GARCH(p,q) model. For simplicity, let’s consider a GARCH(1,1) model, i.e. \[ \begin{aligned} & {}\varepsilon_t = \sigma_{t} u_t \\ & \sigma_{t}^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \\ & u_{t} \sim \; \text{IID}(0, 1) \end{aligned} \] In general, the parameters of such models are estimated with a Quasi-Maximum Likelihood approach where, as long as the residuals are identically distributed, independent and with mean zero and unitary variance, one can estimate the parameters by assuming a normal distribution for \(u_t\). Moreover, to ensure that the long-term GARCH variance is coherent with the variance estimated on the residuals \(\varepsilon_t\), is it possible to constraint \(\omega\) as follows, i.e. \[ \sigma_{t}^2 = \sigma_{\varepsilon}^2 + \alpha (1 - \sigma_{\varepsilon}^2) \varepsilon_{t-1}^2 + \beta (1 - \sigma_{\varepsilon}^2) \sigma_{t-1}^2 \text{.} \] Compute the long-term GARCH(1,1) variance, i.e. \(\mathbb{E}\{\sigma_t^2\}\) \[ \sigma_{\infty}^2 = \mathbb{E}\{\sigma_t^2\} = \frac{\omega}{1-\alpha-\beta} \text{,} \] and compare it with the variance estimated \(\sigma_{\varepsilon}^2\) estimated in Section 4.2.2. Are the two variances equal? Is the estimated model stationary?

Hint: For the parameter estimation you can consider the R package rugarch in which is implemented the QMLE estimator, the possibility to automatically contraint the variance and many others different GARCH models. For the stationarity condition check the estimated parameters \(\alpha\) and \(\beta\).

4.3.2 Task B.3 (b)

Once the parameters \(\omega\), \(\alpha\) and \(\beta\) are estimated, the series of GARCH variance \(\sigma_t^2\) become available and one can compute the standardized GARCH residuals \[ u_t = \frac{\varepsilon_t}{\sigma_t} \text{.} \] Let’s verify if the assumption of the GARCH model holds. The assumption are three:

  • Assumption 1.: Identically distributed residuals. We can verify this hypothesis with a two-sample Kolmogorov test.

Hint: You can consider the same implementation used in Section 4.2.2.

  • Assumption 2.: Uncorrelated residuals. If we do not reject 1. we can use Box-cox test.

  • Assumption 3.: The assumption of the QMLE estimatior is that the residuals are normally distributed, i.e. \(u_t \sim \mathcal{N}(0,1)\). We can verify this hypothesis with a Kolmogorov test by comparing the empirical distribution of \(u_t\) with the theoric distribution of a standard normal.

Hint: You can consider the following implementation for the Kolmogorov Smirnov Test for a distribution. (See here) for more details)

Kolmogorov Smirnov Test for a Distribution
#' Kolmogorov distribution functions
#'
#' @param x vector of quantiles.
#' @param p vector of probabilities.
#' @param k finite value for approximation of infinite sum.
#' @return A probability, a numeric vector in 0, 1.
#' @examples
#' pks(2, 1000)
#' pks(10, 1000)
pks <- function(x, k = 1000){
  # Infinite sum function
  # k: approximation for infinity
  inf_sum <- function(x, k = 100){
    isum <- 0
    for(i in 1:k){
      isum <- isum + (-1)^(i-1)*exp(-2*(x^2)*(i^2))
    }
    isum <- 2 * isum
    isum
  }
  # Compute probability
  p <- c()
  for(i in 1:length(x)){
    p[i] <- 1 - ifelse(x[i] <= 0, 1, inf_sum(x[i], k = k))
  }
  p[p<0] <- 0
  return(p)
}

#' Kolmogorov quantile functions
#' @param p vector of probabilities.
qks <- function(p, k = 100, upper = 10){
  loss <- function(x, p){
    (pks(x, k = k) - p)^2
  }
  x <- c()
  for(i in 1:length(p)){
    x[i] <- optim(par = 2, method = "Brent", lower = 0, upper = upper, fn = loss, p = p[i])$par
  }
  return(x)
}

#' Kolmogorov Smirnov test for a distribution
#'
#' Test against a specific distribution
#'
#' @param x a vector.
#' @param cdf a function. The theoric distribution to use for comparison.
#' @param ci p.value for rejection.
#' @param min_quantile minimum quantile for the grid of values.
#' @param max_quantile maximum quantile for the grid of values.
#' @param k finite value for approximation of infinite sum.
#' @param plot when `TRUE` a plot is returned, otherwise a `tibble`.
#' @return when `plot = TRUE` a plot is returned, otherwise a `tibble`.
ks_test <- function(x, cdf, ci = 0.05, min_quantile = 0.015, max_quantile = 0.985, k = 1000, plot = FALSE){
  # number of observations
  n <- length(x)
  # Set the interval values for upper and lower band
  grid <- seq(quantile(x, min_quantile), quantile(x, max_quantile), 0.01)
  # Empirical cdf
  cdf_x <- ecdf(x)
  # Compute the KS-statistic
  ks_stat <- sqrt(n)*max(abs(cdf_x(grid) - cdf(grid)))
  # Compute the rejection level
  rejection_lev <- qks(1-ci, k = k)
  # Compute the p-value
  p.value <- 1-pks(ks_stat, k = k)

  # ========================== Plot ==========================
  if (plot) {
    y_breaks <- seq(0, 1, 0.2)
    y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
    grid_max <- grid[which.max(abs(cdf_x(grid) - cdf(grid)))]
    plt <- ggplot()+
      geom_ribbon(aes(grid, ymax = cdf_x(grid), ymin = cdf(grid)),
                  alpha = 0.5, fill = "green") +
      geom_line(aes(grid, cdf_x(grid)))+
      geom_line(aes(grid, cdf(grid)), color = "red")+
      geom_segment(aes(x = grid_max, xend = grid_max,
                       y = cdf_x(grid_max), yend = cdf(grid_max)),
                   linetype = "solid", color = "magenta")+
      geom_point(aes(grid_max, cdf_x(grid_max)), color = "magenta")+
      geom_point(aes(grid_max, cdf(grid_max)), color = "magenta")+
      scale_y_continuous(breaks = y_breaks, labels = y_labels)+
      labs(x = "x", y = "cdf")+
      theme_bw()
    return(plt)
  } else {
    kab <- dplyr::tibble(
      n = n,
      alpha = paste0(format(ci*100, digits = 3), "%"),
      KS = ks_stat,
      rejection_lev = rejection_lev,
      p.value = p.value,
      H0 = ifelse(KS > rejection_lev, "Rejected", "Non-Rejected"))
    class(kab) <- c("ks_test", class(kab))
    return(kab)
  }
}

4.4 Task B.4

Let’s consider a Gaussian mixture distribution for the GARCH residuals, i.e.  \[ u_t \sim B \cdot (\mu_{1} + \sigma_{1} Z_1) + (1-B) \cdot (\mu_{2} + \sigma_{2} Z_2) \]

where \(B \sim \text{Bernoulli}(p)\), while \(Z_1\) and \(Z_2\) are standard normal. All are assumed to be independent. Estimate the parameters \(\mu_{1}\), \(\mu_{2}\), \(\sigma_{1}\), \(\sigma_{2}\) and \(p\) with maximum likelihood (see here for more details). Consider the estimated parameters and give a possible interpretation of the two states, i.e. B = 1, B = 0.

5 Task C

Let’s say that we want to understand the impact on price when demand increases and is satisfied by a specific energy source (e.g., fossil, hydro, wind, etc.). In practice we ask ourselves the following question:

“What happens to the price if demand increases by 1%, and that increase is matched by a 1% increase in, say, fossil output?”

This implies a price sensitivity to marginal supply source. Electricity prices are set by the marginal unit that supplies the last unit of demand. So:

  • If the marginal increase in demand is met by cheap supply (e.g., wind), prices rise little.
  • If it’s met by expensive generation (e.g., fossil), prices increase significantly.

To capture this in a model, we need a model that is able to distinguish which source is meeting the marginal demand.

5.1 Task C.1

Let’s decompose the demand across sources by defining a model of the form \[ S_t = \beta_0 + \beta_1 d_t + \gamma_1 d_t T_t + \sum_{\ast} \gamma_{\ast} \cdot \left[d_t \cdot s_{\ast,t} \right] + \text{controls} + \varepsilon_t \text{,} \] where

  • \(d_t = \log(D_t + 1)\): log electricity demand.
  • \(\beta_1\): baseline elasticity of price to demand.
  • \(T_t\): is the temperature.
  • \(\gamma_{\ast}\): differential impact when demand is met by source \(\ast\).
  • \(S_t = \log(P_t)\): log electricity price at time t from Equation 5.
  • \(s_{\ast,t}\): share of demand met by source \(\ast\) from Equation 2.
  • \(\text{controls}\) variables include the hour of day, weekday and month as in Equation 6.

Note that the use of \(\log(D_t + 1)\) instead of \(\log(D_t)\) is a practical solution when some values of demand might be zero or near-zero, preventing undefined or highly negative values. Adding 1 ensures the transformation is always valid. Taking the derivative of the price with respect to the demand one obtain a decomposition of the total price elasticity to demand at time t, depending on which source is satisfying this demand, i.e.  \[ \partial_{d_t} S_t = \beta_1 + \gamma_1 T_t + \sum_{\ast} \gamma_{\ast} s_{\ast,t} \text{,} \tag{11}\] where each \(\gamma_{\ast}\) is interpreted as the sensitivity to electricity price is to a 1% change in demand on that source of energy.

5.2 Task C.2

Let’s provide an interpretation of the main coefficients of the estimated model. Are the demands shocks equally expensive? Which is the source of energy that impact mostly on the price? Compute the impact of an increase in 1% in demand distinguished by the different sources. What are the policies that we can derive?

5.3 Task C.3

Simulate the price impact of a 5% demand increase under 3 hypothetical supply scenarios. The objective is to quantify the price response to a 5% increase in demand under three custom-defined supply mixes.

  1. Current mix: 37% fossil, 23% nuclear, 20% wind, 15% hydroelettric, 5% solar.

  2. Green mix 1: 20% fossil, 25% nuclear, 25% wind, 15% hydroelettric, 15% solar.

  3. Green mix 2: 10% fossil, 30% nuclear, 20% wind, 20% hydroelettric, 20% solar.

  4. Fully decarbonized current supply mix 0% fossil, 25% nuclear, 25% wind, 25% hydroelettric, 25% solar.

For each mix compute the total elasticity and the expected log-price increase in percentage, i.e.  \[ \Delta S_t = \partial_{d_t} S_t \cdot \log(1 + r) \implies \% \Delta P_t = (e^{\Delta S_t} - 1) \cdot 100 \text{,} \] where \(r\) is the demand shock, in this case 5% and the marginal effect \(\partial_{d_t} S_t\) at a given time t reads as in Equation 11. For each energy mix, let’s assume different levels for temperature, i.e. , 10°, 15°, 20°, 25°, 30°, 35°.

Back to top

Citation

BibTeX citation:
@online{sartini2026,
  author = {Sartini, Beniamino},
  date = {2026-06-11},
  url = {https://greenfin.it/projects/project2026.html},
  langid = {en}
}
For attribution, please cite this work as:
Sartini, Beniamino. 2026. June 11, 2026. https://greenfin.it/projects/project2026.html.