Gaussian mixture
Let’s consider a linear combination of a Bernoulli and two normal random variables, all assumed to be independent, i.e. \[ X = B \cdot Z_1 + (1-B) \cdot Z_2 \text{,} \tag{1}\] or, in compact form, \(X \sim GM(\mu_1, \mu_2, \sigma_1^2,\sigma_2^2, p)\). Formally, \(B \sim \text{Ber}(p)\) is a Bernoulli random variable, while \(Z_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)\) and \(Z_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)\) are two independent Gaussian random variables.
Gaussian mixture simulation
# ================== Setups ==================
library(ggplot2)   # plots
library(latex2exp) # TeX() labels
t_bar <- 5000 # number of steps ahead
# parameters
par <- c(mu1 = -2, mu2 = 2, sd1 = 1, sd2 = 1, p = 0.5)
# ============================================
# Gaussian mixture simulation
N_1 <- rnorm(t_bar, mean = par[1], sd = par[3])
N_2 <- rnorm(t_bar, mean = par[2], sd = par[4])
B <- rbinom(t_bar, 1, prob = par[5])
Xt <- B*N_1 + (1 - B)*N_2
# Empirical pdf and cdf
ker <- density(Xt, from = min(Xt), to = max(Xt))
ker$cdf_emp <- cumsum(ker$y/sum(ker$y))
# Component normal pdfs
ker$pdf_Z1 <- dnorm(ker$x, mean = par[1], sd = par[3])
ker$pdf_Z2 <- dnorm(ker$x, mean = par[2], sd = par[4])
# Mixture pdf and cdf
ker$pdf <- par[5]*ker$pdf_Z1 + (1-par[5])*ker$pdf_Z2
ker$cdf <- cumsum(ker$pdf/sum(ker$pdf))
# =================== Plot ===================
# Plot trajectory
plot_gm <- ggplot() +
  geom_point(aes(1:t_bar, Xt), alpha = exp(-0.00009 * t_bar)) +
  labs(x = "t", y = TeX("$X_t$")) +
  theme_bw()
# Plot pdf
plot_pdf <- ggplot() +
  geom_line(aes(ker$x, ker$y)) +
  geom_line(aes(ker$x, ker$pdf), color = "red") +
  labs(x = NULL, y = "Pdf") +
  theme_bw() +
  coord_flip()
# Plot cdf
plot_cdf <- ggplot() +
  geom_line(aes(ker$x, ker$cdf_emp)) +
  geom_line(aes(ker$x, ker$cdf), color = "red") +
  labs(x = NULL, y = "Cdf") +
  theme_bw() +
  coord_flip()
plot_gm
gridExtra::grid.arrange(plot_pdf, plot_cdf, ncol = 2)
1 Distribution and density
The distribution function of a Gaussian mixture reads explicitly as: \[ F_{X}(x) = p \cdot \Phi\left(\frac{x - \mu_1}{\sigma_1}\right) + (1-p) \cdot \Phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{,} \tag{2}\] where \(\Phi\) is the cumulative distribution function of a standard normal random variable. Taking the derivative with respect to \(x\), it can be easily shown that the density function reads: \[ f_{X}(x) = \frac{p}{\sigma_1} \cdot \phi\left(\frac{x - \mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2} \cdot \phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{,} \tag{3}\] where \(\phi\) is the density function of a standard normal random variable.
dnorm_mix <- function(params) {
  # Parameters
  mu1 <- params[1]
  mu2 <- params[2]
  sd1 <- params[3]
  sd2 <- params[4]
  p <- params[5]
  # Return the mixture density as a function of x (Equation 3)
  function(x, log = FALSE) {
    probs <- p * stats::dnorm(x, mean = mu1, sd = sd1) +
      (1 - p) * stats::dnorm(x, mean = mu2, sd = sd2)
    if (log) {
      probs <- base::log(probs)
    }
    return(probs)
  }
}
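Equation 2 can be coded in the same closure style; the following pnorm_mix is a minimal sketch mirroring dnorm_mix above (the function name is ours and is not part of the original code).
pnorm_mix <- function(params) {
  # Parameters
  mu1 <- params[1]
  mu2 <- params[2]
  sd1 <- params[3]
  sd2 <- params[4]
  p <- params[5]
  # Return the mixture cdf as a function of q (Equation 2)
  function(q) {
    p * stats::pnorm(q, mean = mu1, sd = sd1) +
      (1 - p) * stats::pnorm(q, mean = mu2, sd = sd2)
  }
}
# Example usage: pnorm_mix(par)(0) gives P(X <= 0), here 0.5 by symmetry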
Proof. The distribution function of a Gaussian mixture is defined as: \[ F_{X}(x) = \mathbb{P}(X \le x) = \mathbb{E}\{\mathbb{1}_{X \le x}\} \] Hence, we can rewrite it in terms of the conditional expectation with respect to \(B\), i.e. \[ \begin{aligned} F_{X}(x) & {} = \mathbb{E}\{\mathbb{E}\{\mathbb{1}_{X \le x}|B\}\} = \\ & = \mathbb{E}\{\mathbb{1}_{X \le x}|B = 1\}\mathbb{P}(B = 1) + \mathbb{E}\{\mathbb{1}_{X \le x}|B = 0\}\mathbb{P}(B = 0) = \\ & = p \cdot \mathbb{P}(Z_1 \le x) + (1-p) \cdot \mathbb{P}(Z_2 \le x) \end{aligned} \] Hence, standardizing the normal random variables we obtain \[ F_{X}(x) = p \cdot \Phi\left(\frac{x - \mu_1}{\sigma_1}\right) + (1-p) \cdot \Phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{,} \] where \(\Phi\) denotes the distribution function of a standard normal. Knowing that \(f_X(x) = \frac{dF_{X}(x)}{dx}\) and that \(\phi(x) = \frac{d\Phi(x)}{dx}\), where \(\phi\) is the density function of a standard normal, the chain rule yields the result, i.e. \[ f_{X}(x) = \frac{p}{\sigma_1} \cdot \phi\left(\frac{x - \mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2} \cdot \phi\left(\frac{x - \mu_2}{\sigma_2}\right) \text{.} \]
2 Moments
Given that \(Z_1\), \(Z_2\) and \(B\) are independent, the expectation can be computed as: \[ \mathbb{E}\{X\} = p \mu_1 + (1-p) \mu_2 \] The second moment is computed as: \[ \mathbb{E}\{X^2\} = p (\mu_1^2 + \sigma_1^2) + (1-p) (\mu_2^2 + \sigma_2^2) \] Hence, the variance reads: \[ \begin{aligned} \mathbb{V}\{X\} = p(1-p)(\mu_1 - \mu_2)^2 + \sigma_1^2 p + \sigma_2^2 (1-p) \end{aligned} \]
Proof. Given that \(Z_1\), \(Z_2\) and \(B\) are independent, the expectation can be computed as: \[ \begin{aligned} \mathbb{E}\{X\} {} & = \mathbb{E}\{\mathbb{E}\{X|B\}\} = \\ & = \mathbb{E}\{X|B = 1\}\mathbb{P}(B=1) + \mathbb{E}\{X|B = 0\}\mathbb{P}(B=0) = \\ & = p \mathbb{E}\{Z_1\} + (1-p) \mathbb{E}\{Z_2\} = \\ & = p \mu_1 + (1-p) \mu_2 \end{aligned} \] The second moment is computed similarly to the first one, i.e. \[ \begin{aligned} \mathbb{E}\{X^2\} {} & = \mathbb{E}\{\mathbb{E}\{X^2|B\}\} = \\ & = \mathbb{E}\{X^2|B = 1\}\mathbb{P}(B=1) + \mathbb{E}\{X^2|B = 0\}\mathbb{P}(B=0) = \\ & = \mathbb{E}\{B\} \mathbb{E}\{Z_1^2\} + \mathbb{E}\{1-B\} \mathbb{E}\{Z_2^2\} = \\ & = p \mathbb{E}\{Z_1^2\} + (1-p) \mathbb{E}\{Z_2^2\} = \\ & = p (\mu_1^2 + \sigma_1^2) + (1-p) (\mu_2^2 + \sigma_2^2) \end{aligned} \] The variance, by definition, is given by: \[ \mathbb{V}\{X\} = \mathbb{E}\{X^2\} - \mathbb{E}\{X\}^2 \] where the first moment squared is \[ \begin{aligned} \mathbb{E}\{X\}^2 {} & = \left[p \mu_1 + (1-p) \mu_2\right]^2 = \\ & = p^2 \mu_1^2 + (1-p)^2 \mu_2^2 + 2p(1-p)\mu_1\mu_2 \end{aligned} \] Hence the variance, \[ \begin{aligned} \mathbb{V}\{X\} {} & = p (\mu_1^2 + \sigma_1^2) + (1-p) (\mu_2^2 + \sigma_2^2) - p^2 \mu_1^2 - (1-p)^2 \mu_2^2 - 2p(1-p)\mu_1\mu_2 = \\ & = {\color{red}{p \mu_1^2}} + {\color{orange}{p \sigma_1^2}} + {\color{darkgreen}{\mu_2^2}} + {\color{orange}{\sigma_2^2}} - {\color{darkgreen}{p\mu_2^2}} - {\color{orange}{p\sigma_2^2}} - {\color{red}{p^2 \mu_1^2}} - {\color{darkgreen}{(1-p)^2\mu_2^2}} - 2p(1-p)\mu_1 \mu_2 = \\ & = {\color{red}{\mu_1^2 p (1-p)}} + {\color{orange}{p \sigma_1^2 + (1-p)\sigma_2^2}} + {\color{darkgreen}{p(1-p) \mu_2^2}} - 2p(1-p)\mu_1 \mu_2 = \\ & = p(1-p)(\mu_1^2 + \mu_2^2 - 2\mu_1\mu_2) + p \sigma_1^2 + (1-p) \sigma_2^2 = \\ & = p(1-p)(\mu_1 - \mu_2)^2 + p \sigma_1^2 + (1-p) \sigma_2^2 \end{aligned} \]
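As a quick sanity check in R, the closed-form moments can be compared with the sample moments of the simulated series Xt from the setup above (the object names mean_gm and var_gm are ours).
# Closed-form moments implied by par = (mu1, mu2, sd1, sd2, p)
mean_gm <- unname(par["p"] * par["mu1"] + (1 - par["p"]) * par["mu2"])
var_gm  <- unname(par["p"] * (1 - par["p"]) * (par["mu1"] - par["mu2"])^2 +
  par["p"] * par["sd1"]^2 + (1 - par["p"]) * par["sd2"]^2)
# Compare with the sample counterparts
c(theoretical = mean_gm, sample = mean(Xt))
c(theoretical = var_gm, sample = var(Xt))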
3 Maximum likelihood
Maximizing the log-likelihood gives an estimate of the parameters, i.e. \[ \underset{\mu_1, \mu_2, \sigma_1, \sigma_2, p}{\text{argmax}} \left\{\sum_{i = 1}^{t} \log (f_{X}(x_i)) \right\} \text{,} \] or equivalently minimizing the negative log-likelihood, i.e. \[ \underset{\mu_1, \mu_2, \sigma_1, \sigma_2, p}{\text{argmin}} \left\{-\sum_{i = 1}^{t} \log (f_{X}(x_i)) \right\} \text{.} \]
Maximum likelihood for Gaussian mixture
# Initialize parameters (random perturbation of the true values)
init_params <- par * runif(5, 0.3, 1.1)
# Log-likelihood function
log_lik <- function(params, x) {
  # Parameters
  mu1 <- params[1]
  mu2 <- params[2]
  sd1 <- params[3]
  sd2 <- params[4]
  p <- params[5]
  # Ensure that the probability is in (0,1) and the std. deviations are positive
  if (p > 0.99 | p < 0.01 | sd1 < 0 | sd2 < 0) {
    return(NA_real_)
  }
  # Mixture density
  pdf_mix <- dnorm_mix(params)
  # Log-likelihood
  loss <- sum(pdf_mix(x, log = TRUE), na.rm = TRUE)
  return(loss)
}
# Optimal parameters
# fnscale = -1 to maximize (or use the negative log-likelihood)
ml_estimate <- optim(par = init_params, log_lik, x = Xt,
                     control = list(maxit = 500000, fnscale = -1))
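A possible way to assemble the comparison shown below from the fitted object (assuming the ml_estimate and par objects above; the data.frame layout is ours):
# Compare true parameters, ML estimates and bias (estimate - true)
data.frame(
  parameter = names(par),
  true      = unname(par),
  estimate  = unname(ml_estimate$par),
  log_lik   = ml_estimate$value,
  bias      = unname(ml_estimate$par - par)
)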
| Parameter | True | Estimate | Log-lik | Bias |
|---|---|---|---|---|
| \(\mu_1\) | -2.0 | -1.9905803 | -10332.56 | 0.0094197 |
| \(\mu_2\) | 2.0 | 2.0006381 | -10332.56 | 0.0006381 |
| \(\sigma_1\) | 1.0 | 1.0457653 | -10332.56 | 0.0457653 |
| \(\sigma_2\) | 1.0 | 1.0033373 | -10332.56 | 0.0033373 |
| \(p\) | 0.5 | 0.4937459 | -10332.56 | -0.0062541 |
4 Moment matching
Let’s fix the parameters of the first component, namely \(\mu_1\) and \(\sigma_1^2\), and a certain probability \(p\). Then let’s compute the sample estimate of the expectation of \(X_t\), i.e. \[ \mathbb{E}\{X\} = \frac{1}{t}\sum_{i = 1}^{t} x_i = \hat{\mu} \] and the sample variance: \[ \mathbb{V}\{X\} = \frac{1}{t}\sum_{i = 1}^{t} (x_i - \hat{\mu})^2 = \hat{\sigma}^2 \] In order to obtain an estimate of the second component such that the Gaussian mixture moments exactly match the sample estimates, we solve the system for \(\mu_2\) and \(\sigma_2^2\): \[ \begin{cases} \hat{\mu} = p \mu_1 + (1-p) \mu_2 \\ \hat{\sigma}^2 = p(1-p)(\mu_1 - \mu_2)^2 + \sigma_1^2 p + \sigma_2^2 (1-p) \end{cases} \] which leads to a unique solution, i.e. \[ \begin{aligned} {} & \mu_2 = \frac{\hat{\mu} - p \mu_1}{1-p} \\ & \sigma_2^2 = \frac{\hat{\sigma}^2 - p\sigma_1^2 }{1-p} - p (\mu_1 - \mu_2)^2 \end{aligned} \]
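As a sketch in code (the names mu1_fix, sd1_fix, p_fix, mu_hat, sig2_hat, mu2_mm and sig2_mm are ours), fixing the first component and \(p\) at their true values and matching the sample moments of Xt:
# Fix the first component and the mixing probability
mu1_fix <- -2; sd1_fix <- 1; p_fix <- 0.5
# Sample moments of the simulated series
mu_hat   <- mean(Xt)
sig2_hat <- mean((Xt - mu_hat)^2)
# Solve the moment-matching system for the second component
mu2_mm  <- (mu_hat - p_fix * mu1_fix) / (1 - p_fix)
sig2_mm <- (sig2_hat - p_fix * sd1_fix^2) / (1 - p_fix) - p_fix * (mu1_fix - mu2_mm)^2
c(mu2 = mu2_mm, sigma2_squared = sig2_mm)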
5 Moment generating function
The moment generating function of a Gaussian mixture random variable (Equation 1) reads: \[ M_X(t) = p \cdot \exp\left\{\mu_1 t + \frac{t^2 \sigma_1^2}{2}\right\} + (1-p) \cdot \exp\left\{\mu_2 t + \frac{t^2 \sigma_2^2}{2}\right\} \]
Proof. Applying the definition of the moment generating function to a Gaussian mixture and conditioning on \(B\), we obtain: \[ \begin{aligned} M_X(t) & {} = \mathbb{E}\{e^{tX}\} = \mathbb{E}\{\mathbb{E}\{e^{tX}|B\}\} = \\ & = p \cdot \mathbb{E}\{e^{t Z_1}\} + (1-p) \cdot \mathbb{E}\{e^{t Z_2}\} = \\ & = p \cdot M_{Z_1}(t) + (1-p) \cdot M_{Z_2}(t) = \\ & = p \cdot \exp\left\{\mu_1 t + \frac{t^2 \sigma_1^2}{2}\right\} + (1-p) \cdot \exp\left\{\mu_2 t + \frac{t^2 \sigma_2^2}{2}\right\} \end{aligned} \] where \(M_{Z_1}(t)\) and \(M_{Z_2}(t)\) are the moment generating functions of the Gaussian components.
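A quick Monte Carlo check of the formula on the simulated sample (the helper mgf_gm and the grid t_grid are ours):
# Closed-form mgf of the Gaussian mixture
mgf_gm <- function(t, params) {
  mu1 <- unname(params["mu1"]); mu2 <- unname(params["mu2"])
  sd1 <- unname(params["sd1"]); sd2 <- unname(params["sd2"])
  p   <- unname(params["p"])
  p * exp(mu1 * t + t^2 * sd1^2 / 2) + (1 - p) * exp(mu2 * t + t^2 * sd2^2 / 2)
}
# Compare with the empirical mgf of the simulated sample at a few points
t_grid <- c(-0.5, 0.25, 0.5)
rbind(closed_form = sapply(t_grid, mgf_gm, params = par),
      empirical   = sapply(t_grid, function(t) mean(exp(t * Xt))))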
6 Esscher transform
The Esscher transform of a Gaussian mixture random variable reads: \[ E_{\theta}\{f_X(x)\} = \tilde{p} \cdot \mathcal{N}(\mu_1 + \theta \sigma_1^2, \sigma_1^2) + (1-\tilde{p}) \cdot \mathcal{N}(\mu_2 + \theta \sigma_2^2, \sigma_2^2) \text{,} \] where \(\mathcal{N}(\mu, \sigma^2)\) denotes the corresponding normal density and \(\tilde{p}\) is defined as: \[ \tilde{p} = \frac{p M_{Z_1}(\theta)}{p M_{Z_1}(\theta) + (1-p) M_{Z_2}(\theta)} \]
Proof. In general, the Esscher transform of a density function is computed as:
\[
E_{\theta}\{f_X(x)\} = \frac{e^{\theta x} f_X(x)}{M_X(\theta)} = \frac{e^{\theta x} f_X(x)}{\int_{-\infty}^{\infty} e^{\theta x} f_X(x) dx}
\] Substituting the density function of a Gaussian mixture we obtain: \[
E_{\theta}\{f_X(x)\} = \frac{e^{\theta x} (p f_{Z_1}(x) + (1-p) f_{Z_2}(x))}{\int_{-\infty}^{\infty} e^{\theta x} (p f_{Z_1}(x) + (1-p) f_{Z_2}(x)) dx}
\] Let’s examine the first term of the numerator: \[
p e^{\theta x} f_{Z_1}(x) = \frac{p e^{\theta x}}{\sqrt{2\pi} \sigma_1} \exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right\} = \frac{p}{\sqrt{2\pi} \sigma_1} \exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2} + \theta x \right\}
\] In order to rewrite it as the pdf of a normal with mean \((\mu_1 + \theta \sigma_1^2)\), we note that the expansion of the corresponding square reads: \[
(x - \mu_1 - \theta \sigma_1^2)^2 = x^2 - 2 x\mu_1 - 2 \theta x \sigma_1^2 + \mu_1^2 + 2 \mu_1 \theta \sigma_1^2 + \theta^2 \sigma_1^4
\] Hence, let’s add and subtract inside the exponential \(\pm \mu_1 \theta \pm \frac{\theta^2 \sigma_1^2}{2}\), i.e. \[
\begin{aligned}
p e^{\theta x} f_{Z_1}(x) & {} = \frac{p}{\sqrt{2\pi} \sigma_1} \exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2} + \theta x - \mu_1 \theta - \frac{\theta^2 \sigma_1^2}{2} \right\} \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\} = \\
& = \frac{p}{\sqrt{2\pi} \sigma_1} \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\} \exp\left\{-\frac{(x-\mu_1 - \theta \sigma_1^2 )^2}{2\sigma_1^2} \right\}
\end{aligned}
\] Hence, collecting this factor together with the denominator (the mgf of the Gaussian mixture evaluated at \(\theta\)), let’s define the new probability: \[
\tilde{p} = \frac{p \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\}}{p \exp\left\{\mu_1 \theta + \frac{\theta^2 \sigma_1^2}{2}\right\} + (1-p) \exp\left\{\mu_2 \theta + \frac{\theta^2 \sigma_2^2}{2}\right\}}
\] That can be equivalently written in terms of the moment generating functions, i.e. \[
\tilde{p}= \frac{p M_{Z_1}(\theta)}{p M_{Z_1}(\theta) + (1-p) M_{Z_2}(\theta)}
\] Hence, repeating the same steps for the second component, we obtain the Esscher transform of the Gaussian mixture: \[
E_{\theta}\{f_X(x)\} = \tilde{p} \cdot \mathcal{N}(\mu_1 + \theta \sigma_1^2, \sigma_1^2) + (1-\tilde{p}) \cdot \mathcal{N}(\mu_2 + \theta \sigma_2^2, \sigma_2^2)
\]
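As a numerical sketch (the helper esscher_gm and the values of theta and x_grid are ours), the closed form can be checked against the direct definition \(e^{\theta x} f_X(x) / M_X(\theta)\):
# Esscher-transformed density of the Gaussian mixture
esscher_gm <- function(x, theta, params) {
  mu1 <- unname(params["mu1"]); mu2 <- unname(params["mu2"])
  sd1 <- unname(params["sd1"]); sd2 <- unname(params["sd2"])
  p   <- unname(params["p"])
  # Component mgfs evaluated at theta
  M1 <- exp(mu1 * theta + theta^2 * sd1^2 / 2)
  M2 <- exp(mu2 * theta + theta^2 * sd2^2 / 2)
  # Tilted mixing probability
  p_tilde <- p * M1 / (p * M1 + (1 - p) * M2)
  # Mixture of normals with shifted means
  p_tilde * dnorm(x, mean = mu1 + theta * sd1^2, sd = sd1) +
    (1 - p_tilde) * dnorm(x, mean = mu2 + theta * sd2^2, sd = sd2)
}
# Check against the definition e^(theta * x) * f(x) / M_X(theta)
theta <- 0.3
x_grid <- c(-2, 0, 2)
M_theta <- unname(par["p"]) * exp(par["mu1"] * theta + theta^2 * par["sd1"]^2 / 2) +
  unname(1 - par["p"]) * exp(par["mu2"] * theta + theta^2 * par["sd2"]^2 / 2)
direct <- exp(theta * x_grid) * dnorm_mix(par)(x_grid) / M_theta
rbind(closed_form = esscher_gm(x_grid, theta, par), direct = unname(direct))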
Citation
@online{sartini2024,
author = {Sartini, Beniamino},
title = {Gaussian Mixture},
date = {2024-05-01},
url = {https://greenfin.it/statistics/distributions/gaussian-mixture.html},
langid = {en}
}