Stationarity tests

Author

Affiliation

Beniamino Sartini

University of Bologna

Published

May 1, 2024

Modified

June 19, 2024

Setup

library(dplyr)
library(knitr)
library(kableExtra)
library(ggplot2)

1 Dickey–Fuller test

The Dickey–Fuller test tests the null hypothesis that a unit root is present in an autoregressive (AR) model. The alternative hypothesis depends on which version of the test is used, but is usually stationarity or trend-stationarity. Let’s consider an AR(1) model, i.e. $\begin{matrix} (1) & x_{t} = μ + δ t + ϕ_{1} x_{t - 1} + u_{t}, \end{matrix}$ or equivalently, subtracting $x_{t - 1}$ from both sides, $\begin{matrix} (2) & Δ x_{t} = μ + δ t + (ϕ_{1} - 1) x_{t - 1} + u_{t} . \end{matrix}$

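Then, the Dickey–Fuller hypotheses are: $\begin{aligned} H_{0} : ϕ_{1} = 1 \ \text{(non-stationarity)} \\ H_{1} : ϕ_{1} < 1 \ \text{(stationarity)} \end{aligned}$ The test statistic is obtained as: $D F = \frac{\hat{ϕ}_{1} - 1}{\operatorname{Sd}(\hat{ϕ}_{1})},$ where $\hat{ϕ}_{1}$ is the least squares estimate of $ϕ_{1}$ and $\operatorname{Sd}(\hat{ϕ}_{1})$ is its estimated standard deviation. However, under the null hypothesis the regressor $x_{t - 1}$ is non-stationary, so the statistic does not follow the standard t-distribution and the usual critical values cannot be used. Instead, the statistic $D F$ has a specific distribution (the Dickey–Fuller distribution) with its own tabulated critical values. In common implementations there are three versions of the test: standard ( $μ = 0, δ = 0$ ), drift ( $δ = 0$ ), and trend (the full Equation 1).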
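As a rough illustration of the drift version, the statistic can be computed as the t-ratio on $x_{t - 1}$ in a least squares regression of $Δ x_{t}$ on a constant and $x_{t - 1}$. The sketch below is only a minimal example under stated assumptions: the helper name `df_stat`, the seed and the AR coefficient 0.5 are hypothetical choices, and the value -2.86 is the approximate asymptotic 5% Dickey–Fuller critical value for the drift case.

```r
# Minimal sketch (hypothetical helper): Dickey-Fuller statistic, drift version.
# Regress diff(x) on a constant and the lagged level, then take the t-ratio
# of the lagged level.
df_stat <- function(x) {
  dx    <- diff(x)             # Delta x_t for t = 2, ..., n
  x_lag <- x[-length(x)]       # x_{t-1} aligned with dx
  fit   <- lm(dx ~ x_lag)
  unname(summary(fit)$coefficients["x_lag", "t value"])
}

set.seed(1)                    # arbitrary seed, illustration only
n <- 500
u <- rnorm(n)
x_rw <- cumsum(u)              # random walk: unit root (phi_1 = 1)
# Stationary AR(1) with phi_1 = 0.5; stats:: prefix avoids dplyr::filter
x_ar <- as.numeric(stats::filter(u, 0.5, method = "recursive"))

crit_5 <- -2.86                # approximate 5% critical value (drift case)
c(random_walk = df_stat(x_rw), ar1 = df_stat(x_ar))
df_stat(x_ar) < crit_5         # TRUE means the unit root null is rejected
```

The comparison is one-sided: only values of the statistic below the critical value count as evidence against the unit root.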

2 Augmented Dickey–Fuller test

The augmented Dickey–Fuller (ADF) test is a more general version of the Dickey–Fuller test for a general AR(p) model, which can be written as $Δ x_{t} = μ + δ t + γ x_{t - 1} + \sum_{i = 1}^{p - 1} ϕ_{i} Δ x_{t - i} + u_{t} .$

Then, the augmented Dickey–Fuller hypotheses are: $\begin{aligned} H_{0} : γ = 0 \ \text{(non-stationarity)} \\ H_{1} : γ < 0 \ \text{(stationarity)} \end{aligned}$ The test statistic is obtained as: $A D F = \frac{\hat{γ}}{\operatorname{Sd}(\hat{γ})},$ where $\hat{γ}$ is the least squares estimate of $γ$. As in the simpler case, the critical values are taken from the specific tables of the Dickey–Fuller distribution.
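The regression above can be sketched in base R by adding lagged differences to the Dickey–Fuller regression. This is only an illustrative sketch, not a reference implementation: the helper `adf_stat` and its argument `k` (the number of lagged differences) are hypothetical names, and -3.41 is the approximate asymptotic 5% critical value for the trend version. In practice, packages such as `tseries` (`adf.test`) or `urca` (`ur.df`) provide ready-made versions of the test.

```r
# Minimal sketch (hypothetical helper): ADF statistic with constant, trend
# and k lagged differences, i.e. the t-ratio of gamma in the ADF regression.
adf_stat <- function(x, k = 1) {
  stopifnot(k >= 1, length(x) > k + 3)
  n  <- length(x)
  dx <- diff(x)
  m  <- embed(dx, k + 1)            # columns: dx_t, dx_{t-1}, ..., dx_{t-k}
  y       <- m[, 1]                 # Delta x_t
  dx_lags <- m[, -1, drop = FALSE]  # lagged differences
  x_lag   <- x[(k + 1):(n - 1)]     # x_{t-1} aligned with y
  trend   <- seq_along(y)           # deterministic trend term
  fit <- lm(y ~ x_lag + trend + dx_lags)
  unname(summary(fit)$coefficients["x_lag", "t value"])
}

set.seed(1)                         # arbitrary seed, illustration only
u <- rnorm(500)
# Stationary AR(1) with coefficient 0.5 (assumed value for the example)
x_ar <- as.numeric(stats::filter(u, 0.5, method = "recursive"))

crit_5 <- -3.41                     # approximate 5% critical value (trend case)
adf_stat(x_ar, k = 2) < crit_5      # TRUE means the unit root null is rejected
```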

3 Kolmogorov-Smirnov test

The Kolmogorov–Smirnov two-sample test (KS test) can be used to test whether two samples come from the same distribution. Let’s define the empirical distribution function $F_{n}$ of $n$ independent and identically distributed ordered observations $X_{(i)}$ as $F_{n} (x) = \frac{1}{n} \sum_{i = 1}^{n} 1_{(- \infty, x]} (X_{(i)}) .$ The two-sample KS statistic quantifies the distance between the empirical distribution functions of the two samples. The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution, i.e. $\begin{aligned} H_{0} : X \ \text{stationary} \\ H_{1} : X \ \text{non-stationary} \end{aligned}$ The test statistic for two samples of sizes $n_{1}$ and $n_{2}$ is defined as: $K S_{n_{1}, n_{2}} = \sup_{x} | F_{n_{1}} (x) - F_{n_{2}} (x) |,$ and for large samples, the null hypothesis is rejected at level $α$ if: $K S_{n_{1}, n_{2}} > \sqrt{- \frac{1}{2 n_{2}} \ln (\frac{α}{2}) (1 + \frac{n_{2}}{n_{1}})} .$

To apply the test in a time series setting, one can split the original series at a random point into two sub-samples and apply the test above to them.
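The statistic and the large-sample rejection level can be wrapped in a small helper. The sketch below is a minimal example under stated assumptions: `ks_two_sample` is a hypothetical name, and the supremum is evaluated at the pooled observations, where the difference between two step functions attains its maximum.

```r
# Minimal sketch (hypothetical helper): two-sample KS statistic and the
# large-sample critical level at significance level alpha.
ks_two_sample <- function(x1, x2, alpha = 0.05) {
  n1 <- length(x1)
  n2 <- length(x2)
  pooled  <- sort(c(x1, x2))          # jump points of both empirical cdfs
  ks_stat <- max(abs(ecdf(x1)(pooled) - ecdf(x2)(pooled)))
  crit    <- sqrt(-0.5 * log(alpha / 2)) * sqrt((n1 + n2) / (n1 * n2))
  list(statistic = ks_stat, critical = crit, reject = ks_stat > crit)
}

set.seed(123)                         # arbitrary seed, illustration only
x <- rnorm(300)                       # i.i.d. sample, hence stationary
idx_split <- sample(length(x) - 1, 1) # both sub-samples stay non-empty
out <- ks_two_sample(x[1:idx_split], x[(idx_split + 1):length(x)])
str(out)
```

The critical level depends only on the two sample sizes and on $α$, so it can be computed before looking at the data.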

3.1 Example: check stationary

Let’s simulate 500 observations of $X \sim N (0.4, 1)$ , then split the series at a random point and compute the Kolmogorov-Smirnov statistic.

KS-test on a stationary time series

# ============== Setups ==============
set.seed(5) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of simulated observations
# ====================================
# Simulated stationary sample 
x <- rnorm(n, 0.4, 1)
# Random split of the time series (n - 1 keeps the second sub sample non-empty)
idx_split <- sample(n - 1, 1)
x1 <- x[1:idx_split]
x2 <- x[(idx_split+1):n]
# Number of elements for each sub sample 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for KS-statistic
grid <- seq(quantile(x, 0.015), quantile(x, 0.985), 0.01)
# Empiric cdfs 
cdf_1 <- ecdf(x1)
cdf_2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_1(grid) - cdf_2(grid)))
# Rejection level with probability alpha 
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
# ========================== Plot ==========================
y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_1(grid) - cdf_2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_1(grid), ymin = cdf_2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_1(grid)))+
  geom_line(aes(grid, cdf_2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, 
                   y = cdf_1(grid_max), yend = cdf_2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()

Figure 1: Empirical cdfs of the two sub-samples and KS-statistic (magenta) for a stationary time series.

Table 1: KS test for a stationary time series.

Index split	$α$	$n_{1}$	$n_{2}$	$K S_{n_{1}, n_{2}}$	Critical level	$H_{0}$
234	5%	234	266	0.07255	0.1217	Not rejected

In Table 1 the null hypothesis, i.e. the two samples come from the same distribution, is not rejected at the significance level $α = 5 %$ , since the KS statistic is below the critical level.
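As a quick check, the critical level in Table 1 can be reproduced by hand from the large-sample rejection rule, using $α = 0.05$, $n_{1} = 234$ and $n_{2} = 266$ (intermediate values are rounded):

```latex
\begin{aligned}
\sqrt{-\tfrac{1}{2} \ln\left(\tfrac{0.05}{2}\right)} &\approx \sqrt{1.8444} \approx 1.3581, \\
\sqrt{\tfrac{n_{1} + n_{2}}{n_{1} n_{2}}} &= \sqrt{\tfrac{500}{234 \cdot 266}} \approx 0.0896, \\
\text{critical level} &\approx 1.3581 \cdot 0.0896 \approx 0.1217 > 0.07255 = KS_{n_{1}, n_{2}} .
\end{aligned}
```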

3.2 Example: check non-stationary

Let’s now simulate 250 observations as $X_{1} \sim N (0, 1)$ and the following 250 as $X_{2} \sim N (0.3, 1)$ . Then the non-stationary series with a structural break is given by $X = (X_{1}, X_{2})$ . As before, we split the series at a random point and apply the Kolmogorov-Smirnov test.

KS-test on a non-stationary time series

# ============== Setups ==============
set.seed(2) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of simulated observations
# ====================================
# Simulated non-stationary sample 
x1 <- rnorm(n/2, 0, 1)
x2 <- rnorm(n/2, 0.3, 1)
x <- c(x1, x2)
# Random split of the time series (n - 1 keeps the second sub sample non-empty)
idx_split <- sample(n - 1, 1)
x1 <- x[1:idx_split]
x2 <- x[(idx_split+1):n]
# Number of elements for each sub sample 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for KS-statistic
grid <- seq(quantile(x, 0.015), quantile(x, 0.985), 0.01)
# Empiric cdfs 
cdf_1 <- ecdf(x1)
cdf_2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_1(grid) - cdf_2(grid)))
# Rejection level  
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
# ========================== Plot ==========================
y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_1(grid) - cdf_2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_1(grid), ymin = cdf_2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_1(grid)))+
  geom_line(aes(grid, cdf_2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, 
                   y = cdf_1(grid_max), yend = cdf_2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()

Figure 2: Empirical cdfs of the two sub-samples and KS-statistic (magenta) for a non-stationary time series.

In Table 2 the null hypothesis, i.e. the two samples come from the same distribution, is rejected at the significance level $α = 5 %$ , since the KS statistic exceeds the critical level. Hence there is evidence that the two samples come from different distributions.

Table 2: KS test for a non-stationary time series.

Index split	$α$	$n_{1}$	$n_{2}$	$K S_{n_{1}, n_{2}}$	Critical level	$H_{0}$
162	5%	162	338	0.1513	0.1298	Rejected
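As a possible cross-check, base R's `ks.test` can be applied to the same kind of split. It evaluates the statistic at the observations rather than on a grid and reports a p-value instead of a critical level, so its statistic may differ slightly from the grid-based value above. The sketch below is an assumption-laden illustration: it mirrors the simulation settings of this example but is not the code used to build Table 2.

```r
# Minimal sketch: two-sample KS test via stats::ks.test on a series with a
# structural break in the mean (settings mirror the example above).
set.seed(2)
alpha <- 0.05
n <- 500
x <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 0.3, 1))  # mean shifts after t = 250
idx_split <- sample(n - 1, 1)                      # random split point
res <- ks.test(x[1:idx_split], x[(idx_split + 1):n])
res$statistic             # sup distance between the two empirical cdfs
res$p.value < alpha       # TRUE means H0 (same distribution) is rejected
```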

Citation

BibTeX citation:

@online{sartini2024,
  author = {Sartini, Beniamino},
  title = {Stationarity Tests},
  date = {2024-05-01},
  url = {https://greenfin.it/statistics/tests/stationarity-tests.html},
  langid = {en}
}

For attribution, please cite this work as:

Sartini, Beniamino. 2024. “Stationarity Tests.” May 1, 2024. https://greenfin.it/statistics/tests/stationarity-tests.html.