Stationarity tests

Author: Beniamino Sartini
Affiliation: University of Bologna
Published: May 1, 2024
Modified: June 19, 2024

Setup
library(dplyr)
library(knitr)
library(kableExtra)
library(ggplot2)

1 Dickey–Fuller test

The Dickey–Fuller test tests the null hypothesis that a unit root is present in an autoregressive (AR) model. The alternative hypothesis depends on which version of the test is used, but is usually stationarity or trend-stationarity. Let’s consider an AR(1) model, i.e.
$$x_t = \mu + \delta t + \phi_1 x_{t-1} + u_t, \tag{1}$$
or equivalently, subtracting $x_{t-1}$ from both sides,
$$\Delta x_t = \mu + \delta t + (\phi_1 - 1) x_{t-1} + u_t. \tag{2}$$

Then, the Dickey–Fuller hypotheses are:
$$H_0: \phi_1 = 1 \;\text{(non-stationarity)} \qquad H_1: \phi_1 < 1 \;\text{(stationarity)}$$
The test statistic is obtained as:
$$DF = \frac{\hat{\phi}_1 - 1}{\text{Sd}\{\hat{\phi}_1 - 1\}}$$
However, under the null hypothesis the regressor $x_{t-1}$ is non-stationary, so the statistic does not follow a standard t-distribution and the usual critical values cannot be used. Instead, the statistic DF has its own distribution with tabulated critical values. Common implementations offer three versions of the test: standard ($\mu = 0$, $\delta = 0$), drift ($\delta = 0$), and trend ($\mu$ and $\delta$ both unrestricted).
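As a minimal sketch of the regression behind the statistic, the drift version of the test can be computed in base R by regressing $\Delta x_t$ on $x_{t-1}$ and reading off the t-ratio of the slope. The function name `df_stat`, the simulated series, and the AR coefficient 0.5 are illustrative assumptions; the value -2.86 is the commonly tabulated asymptotic 5% critical value for the version with a constant and no trend.

```r
# Minimal sketch (assumed helper, not part of any package):
# Dickey-Fuller statistic for the drift version, i.e. delta = 0.
set.seed(1)
n <- 500
u <- rnorm(n)

# Two illustrative series built from the same shocks
x_rw <- cumsum(u)                                   # random walk, phi_1 = 1
x_ar <- as.numeric(stats::filter(u, filter = 0.5,   # stationary AR(1), phi_1 = 0.5
                                 method = "recursive"))

df_stat <- function(x) {
  dx    <- diff(x)            # Delta x_t
  x_lag <- x[-length(x)]      # x_{t-1}
  fit   <- lm(dx ~ x_lag)     # the intercept plays the role of mu
  # t-ratio of the coefficient on x_{t-1}, i.e. (phi_1 - 1) / Sd
  summary(fit)$coefficients["x_lag", "t value"]
}

# Asymptotic 5% critical value for the drift version (Dickey-Fuller table)
crit_5pct <- -2.86

c(random_walk = df_stat(x_rw), ar1 = df_stat(x_ar))
```

The null hypothesis of a unit root is rejected when the statistic falls below the critical value, which is expected for the stationary AR(1) series but not, in most samples, for the random walk.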

2 Augmented Dickey–Fuller test

The augmented Dickey–Fuller (ADF) test is a more general version of the Dickey–Fuller test for a general AR(p) model, i.e.
$$\Delta x_t = \mu + \delta t + \gamma x_{t-1} + \sum_{i=1}^{p} \phi_i \Delta x_{t-i} + u_t$$

Then, the augmented Dickey–Fuller hypotheses are:
$$H_0: \gamma = 0 \;\text{(non-stationarity)} \qquad H_1: \gamma < 0 \;\text{(stationarity)}$$
The test statistic is obtained as:
$$ADF = \frac{\hat{\gamma}}{\text{Sd}\{\hat{\gamma}\}}$$
As in the simpler case, the critical values are taken from a specific table for the Dickey–Fuller test.
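The augmented regression can be sketched in base R in the same way, adding a linear trend and $p$ lagged differences as regressors. The helper `adf_stat`, the choice $p = 1$, and the simulated AR(2) series are assumptions made for illustration; the value -3.41 is the commonly tabulated asymptotic 5% critical value for the version with constant and trend. In practice, packaged implementations such as `adf.test()` in tseries or `ur.df()` in urca would normally be preferred.

```r
# Minimal sketch (assumed helper): ADF statistic with constant, trend and p lags.
adf_stat <- function(x, p = 1) {
  dx <- diff(x)
  # embed() aligns Delta x_t (column 1) with its p lags (columns 2..p+1)
  m       <- embed(dx, p + 1)
  dx_t    <- m[, 1]
  dx_lags <- m[, -1, drop = FALSE]
  x_lag   <- x[(p + 1):(length(x) - 1)]   # x_{t-1}, aligned with dx_t
  trend   <- seq_along(dx_t)              # deterministic trend t
  fit <- lm(dx_t ~ trend + x_lag + dx_lags)
  # t-ratio of gamma, the coefficient on x_{t-1}
  summary(fit)$coefficients["x_lag", "t value"]
}

set.seed(1)
u <- rnorm(500)
# Illustrative stationary AR(2): x_t = 0.5 x_{t-1} + 0.2 x_{t-2} + u_t
x_ar2 <- as.numeric(stats::filter(u, filter = c(0.5, 0.2), method = "recursive"))

# Asymptotic 5% critical value for the trend version (Dickey-Fuller table)
crit_5pct_trend <- -3.41

adf_stat(x_ar2, p = 1)
```

For this AR(2) example the sum of the autoregressive coefficients is 0.7, so $\gamma = -0.3$ and the statistic is expected to lie well below the critical value.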

3 Kolmogorov-Smirnov test

The Kolmogorov–Smirnov two-sample test (KS test) can be used to test whether two samples came from the same distribution. Let’s define the empirical distribution function $F_n$ of $n$ independent and identically distributed ordered observations $X_{(i)}$ as
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{(-\infty, x]}(X_{(i)}).$$
The KS statistic quantifies the distance between the empirical distribution functions of the two samples. The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution, i.e.
$$H_0: X \;\text{stationary} \qquad H_1: X \;\text{non-stationary}$$
The test statistic for two samples of sizes $n_1$ and $n_2$ is defined as:
$$KS_{n_1,n_2} = \sup_x |F_{n_1}(x) - F_{n_2}(x)|,$$
and for large samples, the null hypothesis is rejected at level $\alpha$ if:
$$KS_{n_1,n_2} > \sqrt{-\frac{1}{2 n_2} \ln\left(\frac{\alpha}{2}\right) \left(1 + \frac{n_2}{n_1}\right)}.$$

To apply the test in a time series setting, it is possible to randomly split the original series into two sub samples and apply the test above.
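The statistic and the large-sample rejection threshold can be wrapped in a small helper. This is a sketch under stated assumptions: the name `ks_two_sample` is hypothetical, and the supremum is evaluated at the pooled sample points, where the step-function difference attains its maximum, rather than on a fixed grid. The result can be cross-checked against `ks.test()` from the stats package.

```r
# Minimal sketch (assumed helper): two-sample KS statistic and large-sample threshold.
ks_two_sample <- function(x1, x2, alpha = 0.05) {
  n1 <- length(x1)
  n2 <- length(x2)
  # The ecdf difference only changes at observed values, so the pooled
  # sample is enough to locate the supremum
  pooled <- sort(c(x1, x2))
  stat <- max(abs(ecdf(x1)(pooled) - ecdf(x2)(pooled)))
  # Equivalent to sqrt(-log(alpha/2) / (2*n2) * (1 + n2/n1))
  crit <- sqrt(-0.5 * log(alpha / 2)) * sqrt((n1 + n2) / (n1 * n2))
  list(statistic = stat, critical = crit, reject = stat > crit)
}

# Illustrative usage on two samples from the same distribution
set.seed(1)
a <- rnorm(234)
b <- rnorm(266)
res <- ks_two_sample(a, b)
res$statistic
unname(ks.test(a, b)$statistic)  # built-in statistic, for comparison
```

Evaluating at the sample points gives the exact statistic, while a grid gives an approximation whose accuracy depends on the grid step.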

3.1 Example: check stationary

Let’s simulate 500 observations of $X \sim N(0.4, 1)$, then split the series at a random point and compute the Kolmogorov-Smirnov statistic.

KS-test on a stationary time series
# ============== Setups ==============
set.seed(5) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of simulated observations
# ====================================
# Simulated stationary sample 
x <- rnorm(n, 0.4, 1)
# Random split of the time series
# (split point drawn from 1..(n-1) so that both sub samples are non-empty)
idx_split <- sample(n - 1, 1)
x1 <- x[1:idx_split]
x2 <- x[(idx_split+1):n]
# Number of elements for each sub sample 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for KS-statistic
grid <- seq(quantile(x, 0.015), quantile(x, 0.985), 0.01)
# Empiric cdfs 
cdf_1 <- ecdf(x1)
cdf_2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_1(grid) - cdf_2(grid)))
# Rejection level with probability alpha 
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
# ========================== Plot ==========================
y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_1(grid) - cdf_2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_1(grid), ymin = cdf_2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_1(grid)))+
  geom_line(aes(grid, cdf_2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, 
                   y = cdf_1(grid_max), yend = cdf_2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()
Figure 1: Two samples cdfs and KS-statistic (magenta) for a stationary time series.
Table 1: KS test for a stationary time series.

| Index split | $\alpha$ | $n_1$ | $n_2$ | $KS_{n_1,n_2}$ | Critical level | $H_0$ |
|---|---|---|---|---|---|---|
| 234 | 5% | 234 | 266 | 0.07255 | 0.1217 | Not rejected |

The null hypothesis, i.e. that the two samples come from the same distribution, is not rejected at the significance level $\alpha = 5\%$.

3.2 Example: check non-stationary

Let’s now simulate 250 observations as $X_1 \sim N(0, 1)$ and the following 250 as $X_2 \sim N(0.3, 1)$. Then the non-stationary series with a structural break is given by $X = (X_1, X_2)$. As before, we randomly split the series and apply the Kolmogorov-Smirnov test.

KS-test on a non-stationary time series
# ============== Setups ==============
set.seed(2) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of simulated observations
# ====================================
# Simulated non-stationary sample 
x1 <- rnorm(n/2, 0, 1)
x2 <- rnorm(n/2, 0.3, 1)
x <- c(x1, x2)
# Random split of the time series
# (split point drawn from 1..(n-1) so that both sub samples are non-empty)
idx_split <- sample(n - 1, 1)
x1 <- x[1:idx_split]
x2 <- x[(idx_split+1):n]
# Number of elements for each sub sample 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for KS-statistic
grid <- seq(quantile(x, 0.015), quantile(x, 0.985), 0.01)
# Empiric cdfs 
cdf_1 <- ecdf(x1)
cdf_2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_1(grid) - cdf_2(grid)))
# Rejection level  
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
# ========================== Plot ==========================
y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_1(grid) - cdf_2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_1(grid), ymin = cdf_2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_1(grid)))+
  geom_line(aes(grid, cdf_2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, 
                   y = cdf_1(grid_max), yend = cdf_2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()
Figure 2: Two samples cdfs and KS-statistic (magenta) for a non-stationary time series.

The null hypothesis, i.e. that the two samples come from the same distribution, is rejected at the significance level $\alpha = 5\%$, suggesting that the two samples come from different distributions.

Table 2: KS test for a non-stationary time series.

| Index split | $\alpha$ | $n_1$ | $n_2$ | $KS_{n_1,n_2}$ | Critical level | $H_0$ |
|---|---|---|---|---|---|---|
| 162 | 5% | 162 | 338 | 0.1513 | 0.1298 | Rejected |

Citation

BibTeX citation:
@online{sartini2024,
  author = {Sartini, Beniamino},
  title = {Stationarity Tests},
  date = {2024-05-01},
  url = {https://greenfin.it/statistics/tests/stationarity-tests.html},
  langid = {en}
}
For attribution, please cite this work as:
Sartini, Beniamino. 2024. “Stationarity Tests.” May 1, 2024. https://greenfin.it/statistics/tests/stationarity-tests.html.