Gaussian mixture
Let’s consider a linear combination of a Bernoulli and two normal random variables, all assumed to be independent, i.e. \[ X = B \cdot Z_1 + (1-B) \cdot Z_2 \text{,} \tag{1}\] or, in compact form, \(X \sim GM(\mu_1, \mu_2, \sigma_1^2,\sigma_2^2, p)\). Formally, \(B \sim \text{Ber}(p)\) is a Bernoulli random variable, while \(Z_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)\) and \(Z_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)\) are two independent Gaussian random variables.
Gaussian mixture simulation
# ================== Setups ==================
library(ggplot2)   # plots
library(latex2exp) # TeX() labels
t_bar <- 5000 # number of steps ahead
# parameters
par <- c(mu1 = -2, mu2 = 2, sd1 = 1, sd2 = 1, p = 0.5)
# ============================================
# Gaussian mixture simulation
N_1 <- rnorm(t_bar, mean = par[1], sd = par[3])
N_2 <- rnorm(t_bar, mean = par[2], sd = par[4])
B <- rbinom(t_bar, 1, prob = par[5])
Xt <- B*N_1 + (1 - B)*N_2
# Empirical pdf and cdf
ker <- density(Xt, from = min(Xt), to = max(Xt))
ker$cdf_emp <- cumsum(ker$y/sum(ker$y))
# Component normal pdfs
ker$pdf_Z1 <- dnorm(ker$x, mean = par[1], sd = par[3])
ker$pdf_Z2 <- dnorm(ker$x, mean = par[2], sd = par[4])
# Mixture pdf and cdf
ker$pdf <- par[5]*ker$pdf_Z1 + (1-par[5])*ker$pdf_Z2
ker$cdf <- cumsum(ker$pdf/sum(ker$pdf))
# =================== Plot ===================
# Plot trajectory
plot_gm <- ggplot() +
  geom_point(aes(1:t_bar, Xt), alpha = exp(-0.00009 * t_bar)) +
  labs(x = "t", y = TeX("$X_t$")) +
  theme_bw()
# Plot pdf
plot_pdf <- ggplot() +
  geom_line(aes(ker$x, ker$y)) +
  geom_line(aes(ker$x, ker$pdf), color = "red") +
  labs(x = NULL, y = "Pdf") +
  theme_bw() +
  coord_flip()
# Plot cdf
plot_cdf <- ggplot() +
  geom_line(aes(ker$x, ker$cdf_emp)) +
  geom_line(aes(ker$x, ker$cdf), color = "red") +
  labs(x = NULL, y = "Cdf") +
  theme_bw() +
  coord_flip()
plot_gm
gridExtra::grid.arrange(plot_pdf, plot_cdf, ncol = 2)
1 Distribution and density
The distribution function of a Gaussian mixture reads explicitly as: \[ F_{X}(x) = p \cdot \Phi\left(\frac{x - \mu_1}{\sigma_1}\right) + (1-p) \cdot \Phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{,} \tag{2}\] where \(\Phi\) is the cumulative distribution function of a standard normal random variable. Taking the derivative with respect to \(x\), it can be easily shown that the density function reads: \[ f_{X}(x) = \frac{p}{\sigma_1} \cdot \phi\left(\frac{x - \mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2} \cdot \phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{,} \tag{3}\] where \(\phi\) is the density function of a standard normal random variable.
dnorm_mix <- function(params) {
  # Parameters
  mu1 <- params[1]
  mu2 <- params[2]
  sd1 <- params[3]
  sd2 <- params[4]
  p <- params[5]
  # Return the mixture density as a function of x (Equation 3)
  function(x, log = FALSE) {
    probs <- p * stats::dnorm(x, mean = mu1, sd = sd1) +
      (1 - p) * stats::dnorm(x, mean = mu2, sd = sd2)
    if (log) {
      probs <- base::log(probs)
    }
    return(probs)
  }
}
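Equation 2 can be coded in the same closure style; the following pnorm_mix is a minimal sketch mirroring dnorm_mix above (the function name is ours and is not part of the original code).
pnorm_mix <- function(params) {
  # Parameters
  mu1 <- params[1]
  mu2 <- params[2]
  sd1 <- params[3]
  sd2 <- params[4]
  p <- params[5]
  # Return the mixture cdf as a function of q (Equation 2)
  function(q) {
    p * stats::pnorm(q, mean = mu1, sd = sd1) +
      (1 - p) * stats::pnorm(q, mean = mu2, sd = sd2)
  }
}
# Example usage: pnorm_mix(par)(0) gives P(X <= 0), here 0.5 by symmetry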
Proof. The distribution function of a Gaussian mixture is defined as: \[ F_{X}(x) = \mathbb{P}(X \le x) = \mathbb{E}\{\mathbb{1}_{X \le x}\} \] Hence, we can rewrite it in terms of the conditional expectation with respect to \(B\), i.e. \[ \begin{aligned} F_{X}(x) & {} = \mathbb{E}\{\mathbb{E}\{\mathbb{1}_{X \le x}|B\}\} = \\ & = \mathbb{E}\{\mathbb{1}_{X \le x}|B = 1\}\mathbb{P}(B = 1) + \mathbb{E}\{\mathbb{1}_{X \le x}|B = 0\}\mathbb{P}(B = 0) = \\ & = p \cdot \mathbb{P}(Z_1 \le x) + (1-p) \cdot \mathbb{P}(Z_2 \le x) \end{aligned} \] Hence, standardizing the normal random variables we obtain \[ F_{X}(x) = p \cdot \Phi\left(\frac{x - \mu_1}{\sigma_1}\right) + (1-p) \cdot \Phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{,} \] where \(\Phi\) denotes the distribution function of a standard normal. Knowing that \(f_X(x) = \frac{dF_{X}(x)}{dx}\) and that \(\phi(x) = \frac{d\Phi(x)}{dx}\), where \(\phi\) is the density function of a standard normal, the chain rule yields the result, i.e. \[ f_{X}(x) = \frac{p}{\sigma_1} \cdot \phi\left(\frac{x - \mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2} \cdot \phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{.} \]
2 Moments
Given that \(Z_1\), \(Z_2\) and \(B\) are independent, the expectation can be computed as: \[ \mathbb{E}\{X\} = p \mu_1 + (1-p) \mu_2 \] The second moment is computed as: \[ \mathbb{E}\{X^2\} = p (\mu_1^2 + \sigma_1^2) + (1-p) (\mu_2^2 + \sigma_2^2) \] Hence, the variance reads: \[ \begin{aligned} \mathbb{V}\{X\} = p(1-p)(\mu_1 - \mu_2)^2 + \sigma_1^2 p + \sigma_2^2 (1-p) \end{aligned} \]
Proof. Given that \(Z_1\), \(Z_2\) and \(B\) are independent, the expectation can be computed as: \[ \begin{aligned} \mathbb{E}\{X\} {} & = \mathbb{E}\{\mathbb{E}\{X|B\}\} = \\ & = \mathbb{E}\{X|B = 1\}\mathbb{P}(B=1) + \mathbb{E}\{X|B = 0\}\mathbb{P}(B=0) = \\ & = p \mathbb{E}\{Z_1\} + (1-p) \mathbb{E}\{Z_2\} = \\ & = p \mu_1 + (1-p) \mu_2 \end{aligned} \] The second moment is computed similarly to the first one, i.e. \[ \begin{aligned} \mathbb{E}\{X^2\} {} & = \mathbb{E}\{\mathbb{E}\{X^2|B\}\} = \\ & = \mathbb{E}\{X^2|B = 1\}\mathbb{P}(B=1) + \mathbb{E}\{X^2|B = 0\}\mathbb{P}(B=0) = \\ & = \mathbb{E}\{B\} \mathbb{E}\{Z_1^2\} + \mathbb{E}\{1-B\} \mathbb{E}\{Z_2^2\} = \\ & = p \mathbb{E}\{Z_1^2\} + (1-p) \mathbb{E}\{Z_2^2\} = \\ & = p (\mu_1^2 + \sigma_1^2) + (1-p) (\mu_2^2 + \sigma_2^2) \end{aligned} \] The variance, by definition, is given by: \[ \mathbb{V}\{X\} = \mathbb{E}\{X^2\} - \mathbb{E}\{X\}^2 \] where the first moment squared is \[ \begin{aligned} \mathbb{E}\{X\}^2 {} & = \left[p \mu_1 + (1-p) \mu_2\right]^2 = \\ & = p^2 \mu_1^2 + (1-p)^2 \mu_2^2 + 2p(1-p)\mu_1\mu_2 \end{aligned} \] Hence the variance, \[ \begin{aligned} \mathbb{V}\{X\} {} & = p (\mu_1^2 + \sigma_1^2) + (1-p) (\mu_2^2 + \sigma_2^2) - p^2 \mu_1^2 - (1-p)^2 \mu_2^2 - 2p(1-p)\mu_1\mu_2 = \\ & = {\color{red}{p \mu_1^2}} + {\color{orange}{p \sigma_1^2}} + {\color{darkgreen}{\mu_2^2}} + {\color{orange}{\sigma_2^2}} - {\color{darkgreen}{p\mu_2^2}} - {\color{orange}{p\sigma_2^2}} - {\color{red}{p^2 \mu_1^2}} - {\color{darkgreen}{(1-p)^2\mu_2^2}} - 2p(1-p)\mu_1 \mu_2 = \\ & = {\color{red}{\mu_1^2 p (1-p)}} + {\color{orange}{p \sigma_1^2 + (1-p)\sigma_2^2}} + {\color{darkgreen}{p(1-p) \mu_2^2}} - 2p(1-p)\mu_1 \mu_2 = \\ & = p(1-p)(\mu_1^2 + \mu_2^2 - 2\mu_1\mu_2) + p \sigma_1^2 + (1-p) \sigma_2^2 = \\ & = p(1-p)(\mu_1 - \mu_2)^2 + p \sigma_1^2 + (1-p) \sigma_2^2 \end{aligned} \]
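As a quick sanity check in R, the closed-form moments can be compared with the sample moments of the simulated series Xt from the setup above (the object names mean_gm and var_gm are ours).
# Closed-form moments implied by par = (mu1, mu2, sd1, sd2, p)
mean_gm <- unname(par["p"] * par["mu1"] + (1 - par["p"]) * par["mu2"])
var_gm  <- unname(par["p"] * (1 - par["p"]) * (par["mu1"] - par["mu2"])^2 +
  par["p"] * par["sd1"]^2 + (1 - par["p"]) * par["sd2"]^2)
# Compare with the sample counterparts
c(theoretical = mean_gm, sample = mean(Xt))
c(theoretical = var_gm, sample = var(Xt))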
3 Maximum likelihood
Maximizing the log-likelihood gives an estimate of the parameters, i.e. \[ \underset{\mu_1, \mu_2, \sigma_1, \sigma_2, p}{\text{argmax}} \left\{\sum_{i = 1}^{t} \log (f_{X}(x_i)) \right\} \text{,} \] or equivalently minimizing the negative log-likelihood, i.e. \[ \underset{\mu_1, \mu_2, \sigma_1, \sigma_2, p}{\text{argmin}} \left\{-\sum_{i = 1}^{t} \log (f_{X}(x_i)) \right\} \text{.} \]
Maximum likelihood for Gaussian mixture
# Initialize parameters (random perturbation of the true values)
init_params <- par * runif(5, 0.3, 1.1)
# Log-likelihood function
log_lik <- function(params, x) {
  # Parameters
  mu1 <- params[1]
  mu2 <- params[2]
  sd1 <- params[3]
  sd2 <- params[4]
  p <- params[5]
  # Ensure that the probability is in (0,1) and the std. deviations are positive
  if (p > 0.99 | p < 0.01 | sd1 < 0 | sd2 < 0) {
    return(NA_real_)
  }
  # Mixture density
  pdf_mix <- dnorm_mix(params)
  # Log-likelihood
  loss <- sum(pdf_mix(x, log = TRUE), na.rm = TRUE)
  return(loss)
}
# Optimal parameters
# fnscale = -1 to maximize (or use the negative log-likelihood)
ml_estimate <- optim(par = init_params, log_lik, x = Xt,
                     control = list(maxit = 500000, fnscale = -1))
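A possible way to assemble the comparison shown below from the fitted object (assuming the ml_estimate and par objects above; the data.frame layout is ours):
# Compare true parameters, ML estimates and bias (estimate - true)
data.frame(
  parameter = names(par),
  true      = unname(par),
  estimate  = unname(ml_estimate$par),
  log_lik   = ml_estimate$value,
  bias      = unname(ml_estimate$par - par)
)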
| Parameter | True | Estimate | Log-lik | Bias |
|---|---|---|---|---|
| \(\mu_1\) | -2.0 | -1.9905803 | -10332.56 | 0.0094197 |
| \(\mu_2\) | 2.0 | 2.0006381 | -10332.56 | 0.0006381 |
| \(\sigma_1\) | 1.0 | 1.0457653 | -10332.56 | 0.0457653 |
| \(\sigma_2\) | 1.0 | 1.0033373 | -10332.56 | 0.0033373 |
| \(p\) | 0.5 | 0.4937459 | -10332.56 | -0.0062541 |
4 Moment matching
Let’s fix the parameters of the first component, namely \(\mu_1\) and \(\sigma_1^2\), and a certain probability \(p\). Then let’s compute the sample estimate of the expectation of \(X_t\), i.e. \[ \mathbb{E}\{X\} = \frac{1}{t}\sum_{i = 1}^{t} x_i = \hat{\mu} \] and the sample variance: \[ \mathbb{V}\{X\} = \frac{1}{t}\sum_{i = 1}^{t} (x_i - \hat{\mu})^2 = \hat{\sigma}^2 \] In order to obtain an estimate of the second component such that the Gaussian mixture moments exactly match the sample estimates, we solve the system for \(\mu_2\) and \(\sigma_2^2\): \[ \begin{cases} \hat{\mu} = p \mu_1 + (1-p) \mu_2 \\ \hat{\sigma}^2 = p(1-p)(\mu_1 - \mu_2)^2 + \sigma_1^2 p + \sigma_2^2 (1-p) \end{cases} \] which leads to a unique solution, i.e. \[ \begin{aligned} {} & \mu_2 = \frac{\hat{\mu} - p \mu_1}{1-p} \\ & \sigma_2^2 = \frac{\hat{\sigma}^2 - p\sigma_1^2 }{1-p} - p (\mu_1 - \mu_2)^2 \end{aligned} \]
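As a sketch in code (the names mu1_fix, sd1_fix, p_fix, mu_hat, sig2_hat, mu2_mm and sig2_mm are ours), fixing the first component and \(p\) at their true values and matching the sample moments of Xt:
# Fix the first component and the mixing probability
mu1_fix <- -2; sd1_fix <- 1; p_fix <- 0.5
# Sample moments of the simulated series
mu_hat   <- mean(Xt)
sig2_hat <- mean((Xt - mu_hat)^2)
# Solve the moment-matching system for the second component
mu2_mm  <- (mu_hat - p_fix * mu1_fix) / (1 - p_fix)
sig2_mm <- (sig2_hat - p_fix * sd1_fix^2) / (1 - p_fix) - p_fix * (mu1_fix - mu2_mm)^2
c(mu2 = mu2_mm, sigma2_squared = sig2_mm)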
5 Moment generating function
The moment generating function of a Gaussian mixture random variable (Equation 1) reads: \[ M_X(t) = p \cdot \exp\left\{\mu_1 t + \frac{t^2 \sigma_1^2}{2}\right\} + (1-p) \cdot \exp\left\{\mu_2 t + \frac{t^2 \sigma_2^2}{2}\right\} \]
Proof. Applying the definition of the moment generating function to a Gaussian mixture and conditioning on \(B\), we obtain: \[ \begin{aligned} M_X(t) & {} = \mathbb{E}\{e^{tX}\} = \mathbb{E}\{\mathbb{E}\{e^{tX}|B\}\} = \\ & = p \cdot \mathbb{E}\{e^{t Z_1}\} + (1-p) \cdot \mathbb{E}\{e^{t Z_2}\} = \\ & = p \cdot M_{Z_1}(t) + (1-p) \cdot M_{Z_2}(t) = \\ & = p \cdot \exp\left\{\mu_1 t + \frac{t^2 \sigma_1^2}{2}\right\} + (1-p) \cdot \exp\left\{\mu_2 t + \frac{t^2 \sigma_2^2}{2}\right\} \end{aligned} \] where \(M_{Z_1}(t)\) and \(M_{Z_2}(t)\) are the moment generating functions of the Gaussian components.
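A quick Monte Carlo check of the formula on the simulated sample (the helper mgf_gm and the grid t_grid are ours):
# Closed-form mgf of the Gaussian mixture
mgf_gm <- function(t, params) {
  mu1 <- unname(params["mu1"]); mu2 <- unname(params["mu2"])
  sd1 <- unname(params["sd1"]); sd2 <- unname(params["sd2"])
  p   <- unname(params["p"])
  p * exp(mu1 * t + t^2 * sd1^2 / 2) + (1 - p) * exp(mu2 * t + t^2 * sd2^2 / 2)
}
# Compare with the empirical mgf of the simulated sample at a few points
t_grid <- c(-0.5, 0.25, 0.5)
rbind(closed_form = sapply(t_grid, mgf_gm, params = par),
      empirical   = sapply(t_grid, function(t) mean(exp(t * Xt))))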
6 Esscher transform
The Esscher transform of a Gaussian mixture random variable reads: \[ E_{\theta}\{f_X(x)\} = \tilde{p} \cdot \mathcal{N}(\mu_1 + \theta \sigma_1^2, \sigma_1^2) + (1-\tilde{p}) \cdot \mathcal{N}(\mu_2 + \theta \sigma_2^2, \sigma_2^2) \text{,} \] where \(\mathcal{N}(\mu, \sigma^2)\) denotes the corresponding normal density and \(\tilde{p}\) is defined as: \[ \tilde{p} = \frac{p M_{Z_1}(\theta)}{p M_{Z_1}(\theta) + (1-p) M_{Z_2}(\theta)} \]
Proof. In general, the Esscher transform of a density function is computed as:
\[
E_{\theta}\{f_X(x)\} = \frac{e^{\theta x} f_X(x)}{M_X(\theta)} = \frac{e^{\theta x} f_X(x)}{\int_{-\infty}^{\infty} e^{\theta x} f_X(x) dx}
\] Substituting the density function of a Gaussian mixture we obtain: \[
E_{\theta}\{f_X(x)\} = \frac{e^{\theta x} (p f_{Z_1}(x) + (1-p) f_{Z_2}(x))}{\int_{-\infty}^{\infty} e^{\theta x} (p f_{Z_1}(x) + (1-p) f_{Z_2}(x)) dx}
\] Let’s examine the first term of the numerator: \[
p e^{\theta x} f_{Z_1}(x) = \frac{p e^{\theta x}}{\sqrt{2\pi} \sigma_1} \exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right\} = \frac{p}{\sqrt{2\pi} \sigma_1} \exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2} + \theta x \right\}
\] In order to rewrite it as the pdf of a normal with mean \((\mu_1 + \theta \sigma_1^2)\), we note that the expansion of the corresponding square reads: \[
(x - \mu_1 - \theta \sigma_1^2)^2 = x^2 - 2 x\mu_1 - 2 \theta x \sigma_1^2 + \mu_1^2 + 2 \mu_1 \theta \sigma_1^2 + \theta^2 \sigma_1^4
\] Hence, let’s add and subtract inside the exponential \(\pm \mu_1 \theta \pm \frac{\theta^2 \sigma_1^2}{2}\), i.e. \[
\begin{aligned}
p e^{\theta x} f_{Z_1}(x) & {} = \frac{p}{\sqrt{2\pi} \sigma_1} \exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2} + \theta x - \mu_1 \theta - \frac{\theta^2 \sigma_1^2}{2} \right\} \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\} = \\
& = \frac{p}{\sqrt{2\pi} \sigma_1} \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\} \exp\left\{-\frac{(x-\mu_1 - \theta \sigma_1^2 )^2}{2\sigma_1^2} \right\}
\end{aligned}
\] Hence, collecting this factor together with the denominator (the mgf of the Gaussian mixture evaluated at \(\theta\)), let’s define the new probability: \[
\tilde{p} = \frac{p \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\}}{p \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\} + (1-p) \exp\left\{\mu_2 \theta + \frac{\theta^2 \sigma_2^2}{2}\right\}}
\] That can be equivalently written in terms of the moment generating functions, i.e. \[
\tilde{p}= \frac{p M_{Z_1}(\theta)}{p M_{Z_1}(\theta) + (1-p) M_{Z_2}(\theta)}
\] Hence, repeating the same steps for the second component, we obtain the Esscher transform of the Gaussian mixture: \[
E_{\theta}\{f_X(x)\} = \tilde{p} \cdot \mathcal{N}(\mu_1 + \theta \sigma_1^2, \sigma_1^2) + (1-\tilde{p}) \cdot \mathcal{N}(\mu_2 + \theta \sigma_2^2, \sigma_2^2)
\]
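As a numerical sketch (the helper esscher_gm and the values of theta and x_grid are ours), the closed form can be checked against the direct definition \(e^{\theta x} f_X(x) / M_X(\theta)\):
# Esscher-transformed density of the Gaussian mixture
esscher_gm <- function(x, theta, params) {
  mu1 <- unname(params["mu1"]); mu2 <- unname(params["mu2"])
  sd1 <- unname(params["sd1"]); sd2 <- unname(params["sd2"])
  p   <- unname(params["p"])
  # Component mgfs evaluated at theta
  M1 <- exp(mu1 * theta + theta^2 * sd1^2 / 2)
  M2 <- exp(mu2 * theta + theta^2 * sd2^2 / 2)
  # Tilted mixing probability
  p_tilde <- p * M1 / (p * M1 + (1 - p) * M2)
  # Mixture of normals with shifted means
  p_tilde * dnorm(x, mean = mu1 + theta * sd1^2, sd = sd1) +
    (1 - p_tilde) * dnorm(x, mean = mu2 + theta * sd2^2, sd = sd2)
}
# Check against the definition e^(theta * x) * f(x) / M_X(theta)
theta <- 0.3
x_grid <- c(-2, 0, 2)
M_theta <- unname(par["p"]) * exp(par["mu1"] * theta + theta^2 * par["sd1"]^2 / 2) +
  unname(1 - par["p"]) * exp(par["mu2"] * theta + theta^2 * par["sd2"]^2 / 2)
direct <- exp(theta * x_grid) * dnorm_mix(par)(x_grid) / M_theta
rbind(closed_form = esscher_gm(x_grid, theta, par), direct = unname(direct))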
Citation
@online{sartini2024,
author = {Sartini, Beniamino},
title = {Gaussian Mixture},
date = {2024-05-01},
url = {https://greenfin.it/statistics/distributions/gaussian-mixture.html},
langid = {en}
}