Author
Affiliation

Beniamino Sartini

University of Bologna

Published

June 21, 2025

Modified

June 21, 2025

1 Hourly energy demand generation and weather

Code
library(tidyverse)
library(backports)
library(solarr)
library(latex2exp)

2 Description of the data

The dataset contains 4 years of electrical consumption, generation, pricing, and weather data for Spain. Consumption and generation data was retrieved from ENTSOE a public portal for Transmission Service Operator (TSO) data. Settlement prices were obtained from the Spanish TSO Red Electric España. Weather data was purchased as part of a personal project from the Open Weather API for the 5 largest cities in Spain. The original datasets can be downloaded from Kaggle. However, the dataset to be used for the project has been already pre-processed. The code used for preprocessing is in this github repository, while the dataset to be used can be downloaded here.

Tip 1: Variables description
  • date: datetime index localized to CET.
  • date_day: date without hour.
  • city: reference city. (Valencia. Madrid, Bilbao,Barcelona, Seville)
  • Year: number for the year.
  • Month: number for the month.
  • Day: number for the day in a month.
  • Hour: reference hour of the day.

Energy variables

  • fossil: energy produced with fossil sources: i.e. coal/lignite, natural gas, oil (MW).
  • nuclear: energy produced with nuclear sources (MW).
  • energy_other: other generation (MW).
  • solar: solar generation (MW).
  • biomass: biomass generation (MW).
  • river: hydroeletric generation (MW).
  • wind: wind generation (MW).
  • energy_other_renewable: other renewable generation (MW).
  • energy_waste: wasted generation (MW).
  • pred_solar: forecasted solar generation (MW).
  • pred_demand: forecasted electrical demand by TSO (MW).
  • pred_price: forecasted day ahead electrical price by TSO (Eur/MWh).
  • price: actual electrical price (Eur/MWh).
  • demand: actual electrical demand (Eur/MWh).
  • prod_tot: total electrical production (MW).
  • prod_renewable: total renewable electrical production (MW).

Weather variables

  • temp: mean temperature.
  • temp_min: minimum temperature (degrees).
  • temp_max: maximum temperature (degrees).
  • pressure: pressure in (hPas).
  • humidity: humidity in percentage.
  • wind_speed: wind speed (m/s).
  • wind_deg: wind direction.
  • rain_1h: rain in last hour (mm).
  • rain_3h: rain in last 3 hours (mm).
  • snow_3h: show last 3 hours (mm).
  • clouds_all: cloud cover in percentage.
Import data
dir_data <- "../grenfin-summer-school/data"
filepath <- file.path(dir_data, "data.csv")
data <- readr::read_csv(filepath, show_col_types = FALSE, progress = FALSE)
head(data) %>%
  knitr::kable(booktabs = TRUE ,escape = FALSE, align = 'c')%>%
  kableExtra::row_spec(0, color = "white", background = "green")
date date_day Year Month Day Hour Season fossil nuclear energy_other solar biomass river energy_other_renewable energy_waste wind pred_solar pred_demand pred_price demand price prod prod_renewable temp temp_min temp_max pressure humidity wind_speed wind_deg rain_1h rain_3h snow_3h clouds_all
2014-12-31 23:00:00 2014-12-31 2014 12 31 23 Winter 10156 7096 43 49 447 3813 73 196 6378 17 26118 50.10 25385 65.41 27939 10687 -0.6585375 -5.825 8.475 1016.4 82.4 2.0 135.2 0 0 0 0
2015-01-01 00:00:00 2015-01-01 2015 1 1 0 Winter 10437 7096 43 50 449 3587 71 195 5890 16 24934 48.10 24382 64.92 27509 9976 -0.6373000 -5.825 8.475 1016.2 82.4 2.0 135.8 0 0 0 0
2015-01-01 01:00:00 2015-01-01 2015 1 1 1 Winter 9918 7099 43 50 448 3508 73 196 5461 8 23515 47.33 22734 64.48 26484 9467 -1.0508625 -6.964 8.136 1016.8 82.0 2.4 119.0 0 0 0 0
2015-01-01 02:00:00 2015-01-01 2015 1 1 2 Winter 8859 7098 43 50 438 3231 75 191 5238 2 22642 42.27 21286 59.32 24914 8957 -1.0605312 -6.964 8.136 1016.6 82.0 2.4 119.2 0 0 0 0
2015-01-01 03:00:00 2015-01-01 2015 1 1 3 Winter 8313 7097 43 42 428 3499 74 189 4935 9 21785 38.41 20264 56.04 24314 8904 -1.0041000 -6.964 8.136 1016.6 82.0 2.4 118.4 0 0 0 0
2015-01-01 04:00:00 2015-01-01 2015 1 1 4 Winter 7962 7098 43 34 410 3804 74 188 4618 4 21441 35.72 19905 53.63 23926 8866 -1.1260000 -7.708 7.317 1017.4 82.6 2.4 174.8 0 0 0 0
R packages
library(tidyverse)
library(backports)
library(solarr)
library(latex2exp)

3 Project

3.1 Problem A. Explorative analysis

Consider the data of the energy market in Spain and let’s examine the composition of the energy mix. At your disposal you have 5 sources that generate energy, i.e. 4 renewable (columns biomass, solar, river, wind) and 2 non-renewable (columns nuclear, fossil). The column demand denotes the amount of demanded electricity in MW, instead the column price denotes the actual price in MW/h. For simplicity, let’s denote with Et the energy of the -th source, i.e.  Et={Etbb=biomassEtss=solarEthh=hydroelettricEtww=windEtnn=nuclearEtff=fossil and the total energy demanded at time t as Dt.

3.1.1 Task A.1

Compute the average between the energy produced from each source and the total demand, i.e.
(1)R=1nt=1nEtDt

and normalize them, i.e. R/R. Rank the 5 sources of energy by the percentage of demand satisfied in decreasing order and plot the values obtained with histograms.

3.1.2 Task A.2

Aggregate the sources of energy in two groups:

  1. Renewable: is given by the sum of solar, river, wind and biomass.
  2. Non-renewable: is given by the sum of nuclear and fossil.

Then, compute the percentage of energy demand satisfied by each group. What is the percentage of energy satisfied by renewable sources? And by the fossil ones?

Repeat the computations, but filtering the data firstly only for year 2015 and then only for 2018. Are the percentage obtained different? Describe the result and explain if the transition to renewable energy is supported by data (to be supported the percentage of renewable energy should increase while percentage of fossil fuels should decrease.)

3.1.3 Task A.3

For each month, compute the mean percentage of demand of energy satisfied by the fossil sources. In order to compute Rm for the m-month, let’s denote with nm the number of day in that month, then: (2)Rm=1nmt=1nmEtDt=1nmt=1nmRt

Compute also the maximum, minimum, median and standard deviation of Rt for each month. Repeat the computation for the others 4 sources of energy, i.e. nuclear, solar, river and wind. Which are the source with the lowest and the highest standard deviation in December? And in June? Comment the results obtained (max 200 words).

3.1.4 Task A.4

In mean, the demand satisfied by solar energy with respect to total demand is greater or lower with respect to the one satisfied by hydroelectric on 12:00 in March? In July, August and September at the same hour the situation changes? Answer, providing a possible explanation of the results obtained (max 200 words).

3.1.5 Task A.5

Grouping the dataset firstly by season and hour (4×24 groups) and then by month and hour (12×24 groups) compute:

  1. Average electricity price
  2. Average ratio between total production and total demand, i.e. prod/demand.

For both groups, answer to the following questions.

What is the meaning of a ratio between total production and total demand greater than 1? And greater than 1? Following the demand and supply law, what is the expected behavior of the price in such situations? Comment the results (max 200 words).

When the price reaches it’s minimum value? In such situation, does the ratio reach its maximum value? What happen when the prices reaches it’s maximum? Comment the results (max 200 words).

3.2 Problem B: Model hourly electricity price

3.2.1 Task B.1

Let’s define and estimate a model for the hourly electricity price. First of all, we have to define a seasonal function to remove deterministic patterns during the year and the day. Hence, we take into account a function of the form: P¯t=p0+H(t)+M(t) where H(t) stands for: H(t)={H1 Hour=1H2 Hour=2H23Hour=23 and M(t) for: M(t)={M2 Month=FebruaryM12Month=December It follows that, by construction, p0 captures the Hour=0 and Month=January. Estimate the parameters of the models using OLS. Then, compute the seasonal price P¯t and the deseasonalized time series, i.e.  P~t=PtP¯t Plot P¯t and P~t.

3.2.2 Task B.2

Estimate an autoregressive model (AR(1)) on the deseasonalized time series, i.e. P~t=ϕP~t1+εt Is the estimated model stationary? Comment and justify your answer.

Hint: in order to answer consider the stationarity condition on the parameters.

Then, compute the fitted residuals εt and their standard deviation as: σ¯=Sd{εt}=1n1t=1nεt2, and the standardized residuals as εt¯=εtσ¯. Are εt¯ identically distributed? Answer with a formal statistics test with confidence level α=0.05. (see for example see the two sample Kolmogorov-Smirnov test).

3.2.3 Task B.3

A model for the conditional variance is the GARCH(p,q) model. Let’s consider a GARCH(1,2) model under the normality assumption., i.e.  εt=σtutσt2=ω+α1εt12+β1σt12+β2σt22utN(0,1)

Is the estimated model stationary? Hint: in order to answer consider the stationarity condition on the parameters.

Compute the long-term variance under the GARCH(1,1) model, i.e. E{σt2}, and compare it with the variance estimated in Task B.2 (i.e. σ¯2). Are the two variances similar? Comment the results.

Define the standardized GARCH residuals as: ut=εtσt Are now the standardized residuals identically distributed? Answer using the same test used in Task B.2. Is the normality assumption satisfied on the fitted residuals? Answer with a formal test and a α=0.01.

Hint: see for example see the Kolmogorov-Smirnov test.

3.2.4 Task B.4

Let’s consider a Gaussian mixture model for the GARCH residuals, i.e.  utB(μ1+σ1Z1)+(1B)(μ2+σ2Z2)

where BBernoulli(p), while Z1 and Z2 are standard normal. All are assumed to be independent. Estimate the parameters μ1, μ2, σ1, σ2 and p with maximum likelihood (see here for an example). Consider the estimated parameters and give a possible interpretation of the two states, i.e. B = 1, B = 0.

3.2.5 Task B.5

Let’s consider a monthly Gaussian mixture model for the GARCH residuals ut,m, where the index m=1,,12 denotes the m-month, i.e. ut,mBmZm,1+(1Bm)Zm,2 where all the random variables, i.e.  BmBernoulli(pm)Zm,1=N(μm,1,σm,12)Zm,2=N(μm,2,σm,22) are assumed to be independent. Estimate the parameters for all the months with maximum likelihood.

3.2.6 Task B.6

Let’s simulate hourly prices with the model and compare it with the empirical ones. You are free to choose between the basic Gaussian mixture and the monthly version. Then, consider the following two situations:

  1. Projection A: You are 2017-06-28 (known) and you want to simulate 100 trajectories for the price from 2017-06-29 up to 2017-06-30 (1 day, 24 hours).

  2. Projection B: You are in 2017-12-01 (known) and you want to simulate 100 trajectories for the price from 2017-12-02 up to 2017-12-31 (29 days, 696 hours).

Be careful in simulations to always set a seed for reproducibility.

  • The distribution of the residuals in the Projection A are close to the empirical distribution of the residuals in that month? The situation in Projection B changes? Is the model able to forecast the short term movements of the electricity price? Comment the results.

Hint 1: For a graphical comparison between densities you can plot the empirical histograms of ut for the month m=12 (December) and the histograms of the simulated ut for the same month.

Hint 2: Think about the fact that the aim of our model is to try to reproduce the distribution of the electricity prices and not to forecast it’s short term movements (potential omitted values could improve the model performances).

Back to top

Citation

BibTeX citation:
@online{sartini2025,
  author = {Sartini, Beniamino},
  date = {2025-06-21},
  url = {https://greenfin.it/projects/project2024.html},
  langid = {en}
}
For attribution, please cite this work as:
Sartini, Beniamino. 2025. June 21, 2025. https://greenfin.it/projects/project2024.html.