Lecture 5. A) Families of Distributions
Parametric Families of Distributions
When working in statistics, it is often useful to draw conclusions that apply to multiple distributions. We now define classes of distributions, often referred to as families.
Exponential Family
The set of pmfs/pdfs [math]\left\{ f\left(\left.\cdot\right|\theta\right):\theta\in\Theta\right\}[/math] is an exponential family if [math]f\left(\left.x\right|\theta\right)=h\left(x\right)c\left(\theta\right)\exp\left[\sum_{i=1}^{K}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right],\,x\in\mathbb{R},\theta\in\Theta[/math]
where
[math]h:\mathbb{R}\rightarrow\mathbb{R}_{+},c:\Theta\rightarrow\mathbb{R}_{++},\omega_{i}:\Theta\rightarrow\mathbb{R}\,\forall i,t_{i}:\mathbb{R}\rightarrow\mathbb{R}\,\forall i[/math], for some [math]K\geq1[/math].
Normal Distribution
The normal distribution is part of the exponential family, as we now show:
[math]\begin{aligned} f\left(\left.x\right|\mu,\sigma^{2}\right) & =\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right)\\ & =\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{1}{2\sigma^{2}}\left(x^{2}+\mu^{2}-2\mu x\right)\right)\\ & =\underset{h\left(x\right)}{\underbrace{1}}\cdot\underset{c\left(\mu,\sigma^{2}\right)}{\underbrace{\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\mu^{2}}{2\sigma^{2}}\right)}}\underset{\exp\left[\sum_{i=1}^{K}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right]}{\underbrace{\exp\left(-\frac{x^{2}}{2\sigma^{2}}+\frac{\mu}{\sigma^{2}}x\right)}}\end{aligned}[/math]
where
[math]\omega_{1}\left(\mu,\sigma^{2}\right)=-\frac{1}{2\sigma^{2}};t_{1}\left(x\right)=x^{2};\omega_{2}\left(\mu,\sigma^{2}\right)=\frac{\mu}{\sigma^{2}};t_{2}\left(x\right)=x[/math].
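The factorization above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the test points are illustrative assumptions, and it only compares the textbook normal pdf with the product [math]h\left(x\right)c\left(\theta\right)\exp\left[\sum_{i}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right][/math] at a handful of values.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Normal density in its usual form."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_pdf_expfam(x, mu, sigma2):
    """Same density assembled as h(x) * c(theta) * exp(sum_i w_i(theta) * t_i(x))."""
    h = 1.0
    c = math.exp(-mu ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    w = (-1.0 / (2 * sigma2), mu / sigma2)  # omega_1, omega_2
    t = (x ** 2, x)                         # t_1, t_2
    return h * c * math.exp(sum(wi * ti for wi, ti in zip(w, t)))


# Largest absolute discrepancy over a few arbitrary (hypothetical) test points.
max_gap_normal = max(
    abs(normal_pdf(x, mu, s2) - normal_pdf_expfam(x, mu, s2))
    for x in (-2.0, 0.0, 0.5, 3.0)
    for mu in (-1.0, 0.0, 2.0)
    for s2 in (0.5, 1.0, 4.0)
)
```

If the decomposition is right, `max_gap_normal` should be zero up to floating-point rounding.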
Bernoulli Distribution
The Bernoulli also belongs to the exponential family(!):
[math]\begin{aligned} f\left(\left.x\right|p\right) & =\begin{cases} p, & x=1\\ 1-p, & x=0\\ 0, & \text{otherwise} \end{cases}\\ & =\begin{cases} p^{x}\left(1-p\right)^{1-x}, & x\in\left\{ 0,1\right\} \\ 0, & \text{otherwise} \end{cases}\\ & =1\left(x\in\left\{ 0,1\right\} \right)p^{x}\left(1-p\right)^{1-x}\end{aligned}[/math] where [math]1\left(\cdot\right)[/math] is the indicator function.
Factorization yields:
[math]\begin{aligned} f\left(\left.x\right|p\right) & =1\left(x\in\left\{ 0,1\right\} \right)p^{x}\left(1-p\right)^{1-x}\\ & =1\left(x\in\left\{ 0,1\right\} \right)p^{x}\left(1-p\right)\left(1-p\right)^{-x}\\ & =1\left(x\in\left\{ 0,1\right\} \right)\left(1-p\right)\left(\frac{p}{1-p}\right)^{x}\\ & =\underset{h\left(x\right)}{\underbrace{1\left(x\in\left\{ 0,1\right\} \right)}}\underset{c\left(p\right)}{\underbrace{\left(1-p\right)}}\exp\left(\underset{\omega_{1}}{\underbrace{\log\left(\frac{p}{1-p}\right)}}\underset{t_{1}}{\underbrace{x}}\right)\end{aligned}[/math]
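The Bernoulli factorization can be verified in the same spirit. This is a hedged sketch with hypothetical function names; it assumes [math]0\lt p\lt 1[/math], since the log-odds [math]\log\left(\frac{p}{1-p}\right)[/math] is undefined at the endpoints.

```python
import math


def bernoulli_pmf(x, p):
    """Bernoulli pmf written case by case."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


def bernoulli_pmf_expfam(x, p):
    """Same pmf as h(x) * c(p) * exp(w1(p) * t1(x)); assumes 0 < p < 1."""
    h = 1.0 if x in (0, 1) else 0.0   # indicator of the support {0, 1}
    c = 1.0 - p
    w1 = math.log(p / (1.0 - p))      # log-odds
    t1 = x
    return h * c * math.exp(w1 * t1)


# Largest absolute discrepancy, including a point (x = 2) outside the support.
max_gap_bernoulli = max(
    abs(bernoulli_pmf(x, p) - bernoulli_pmf_expfam(x, p))
    for x in (0, 1, 2)
    for p in (0.1, 0.5, 0.9)
)
```

The indicator in `h` is what makes the exponential-family form return zero off the support, matching the case-by-case definition.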
Remarks
- Factor [math]c\left(\theta\right)[/math] is the normalizing constant of the pmf/pdf. It is determined by [math]h[/math], the [math]\omega_{i}[/math] and the [math]t_{i}[/math], since its only role is to ensure that the pmf sums (or the pdf integrates) to 1.
- The support of pmfs/pdfs of members of the exponential family does not depend on [math]\theta[/math], i.e., [math]S_{X}=\left\{ x\in\mathbb{R}:f\left(\left.x\right|\theta\right)\gt 0\right\} =\left\{ x\in\mathbb{R}:h\left(x\right)\gt 0\right\}[/math]. This holds because [math]c\left(\theta\right)[/math] and the exponential term are strictly positive, so [math]f\left(\left.x\right|\theta\right)\gt 0[/math] exactly when [math]h\left(x\right)\gt 0[/math]. For example, the uniform distribution with pdf [math]f_{X}\left(\left.x\right|a,b\right)=\frac{1}{b-a}1\left(a\leq x\leq b\right)[/math] does not belong to the exponential family, since it is impossible to separate [math]a[/math] and [math]b[/math] from [math]x[/math] in the indicator function.
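To illustrate the first remark, the sketch below recovers [math]c\left(p\right)[/math] for the Bernoulli purely from the normalization requirement and compares it with [math]1-p[/math]. The function name is a hypothetical choice, and the code assumes [math]0\lt p\lt 1[/math] and [math]h\left(x\right)=1[/math] on the support [math]\left\{ 0,1\right\}[/math].

```python
import math


def c_from_normalization(p):
    """Recover c(p) as 1 / sum_x h(x) * exp(w1(p) * t1(x)) over the support {0, 1}."""
    w1 = math.log(p / (1.0 - p))                   # omega_1(p), the log-odds
    total = sum(math.exp(w1 * x) for x in (0, 1))  # h(x) = 1 on the support
    return 1.0 / total


# Compare with the closed form c(p) = 1 - p at an arbitrary value of p.
c_gap = abs(c_from_normalization(0.3) - (1.0 - 0.3))
```

Here the sum equals [math]1+\frac{p}{1-p}=\frac{1}{1-p}[/math], so its reciprocal is [math]1-p[/math], which is the [math]c\left(p\right)[/math] found in the factorization.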
Location-Scale Family
A (parametric) family [math]\mathcal{F}[/math] of pdfs is a location-scale family if it is given by
[math]\mathcal{F}=\left\{ \frac{1}{\sigma}f\left(\frac{\cdot-\mu}{\sigma}\right):\mu\in\mathbb{R},\sigma\gt 0\right\}[/math]
where [math]f\left(\cdot\right)[/math] is the standard pdf of the family, [math]\mu[/math] is the location parameter and [math]\sigma[/math] is the scale parameter. The idea is that [math]\frac{1}{\sigma}f\left(\frac{\cdot-\mu}{\sigma}\right)[/math] is the pdf of [math]\mu+\sigma\widetilde{X}[/math] where [math]\widetilde{X}[/math] has pdf [math]f\left(\cdot\right)[/math].
Clearly, the [math]N\left(\mu,\sigma^{2}\right)[/math] pdfs form the location-scale family of [math]N\left(0,1\right)[/math]. Similarly, the [math]U\left(a,b\right)[/math] pdfs form the location-scale family of [math]U\left(0,1\right)[/math], with location [math]\mu=a[/math] and scale [math]\sigma=b-a[/math].
Restricting the construction gives two smaller families: pdfs that differ from the standard pdf only in the location parameter (with [math]\sigma=1[/math]) form the pdf's location family, and pdfs that differ only in the scale parameter (with [math]\mu=0[/math]) form its scale family.
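The location-scale construction can be sketched as a function that takes a standard pdf [math]f[/math] and returns [math]\frac{1}{\sigma}f\left(\frac{\cdot-\mu}{\sigma}\right)[/math]. The helper name `location_scale` and the parameter values are illustrative assumptions, not part of the notes.

```python
import math


def std_normal_pdf(z):
    """Standard pdf of the normal family, N(0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def std_uniform_pdf(z):
    """Standard pdf of the uniform family, U(0, 1)."""
    return 1.0 if 0.0 <= z <= 1.0 else 0.0


def location_scale(f, mu, sigma):
    """Return the pdf x -> f((x - mu) / sigma) / sigma, i.e. the pdf of mu + sigma * X."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    return lambda x: f((x - mu) / sigma) / sigma


normal_2_9 = location_scale(std_normal_pdf, mu=2.0, sigma=3.0)    # pdf of N(2, 9)
uniform_1_5 = location_scale(std_uniform_pdf, mu=1.0, sigma=4.0)  # pdf of U(1, 5)
```

For instance, `uniform_1_5` is constant at [math]\frac{1}{4}[/math] on [math]\left[1,5\right][/math] and zero elsewhere, which is the [math]U\left(1,5\right)[/math] pdf. Passing `sigma=1.0` gives a member of the location family, and `mu=0.0` a member of the scale family.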