Lecture 5. A) Families of Distributions

From Significant Statistics

Parametric Families of Distributions

When working in statistics, it is often useful to draw conclusions that apply to multiple distributions. We now define classes of distributions, often referred to as families.

Exponential Family

The set of pmfs/pdfs [math]\left\{ f\left(\left.\cdot\right|\theta\right):\theta\in\Theta\right\}[/math] is an exponential family if [math]f\left(\left.x\right|\theta\right)=h\left(x\right)c\left(\theta\right)\exp\left[\sum_{i=1}^{K}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right],\,x\in\mathbb{R},\theta\in\Theta[/math]

where

[math]h:\mathbb{R}\rightarrow\mathbb{R}_{+},c:\Theta\rightarrow\mathbb{R}_{++},\omega_{i}:\Theta\rightarrow\mathbb{R}\,\forall i,t_{i}:\mathbb{R}\rightarrow\mathbb{R}\,\forall i[/math], for some [math]K\geq1[/math].

Normal Distribution

The normal distribution is part of the exponential family, as we now show:

[math]\begin{aligned} f\left(\left.x\right|\mu,\sigma^{2}\right) & =\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right)\\ & =\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{1}{2\sigma^{2}}\left(x^{2}+\mu^{2}-2\mu x\right)\right)\\ & =\underset{h\left(x\right)}{\underbrace{1}}\cdot\underset{c\left(\mu,\sigma^{2}\right)}{\underbrace{\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\mu^{2}}{2\sigma^{2}}\right)}}\cdot\underset{\exp\left[\sum_{i=1}^{K}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right]}{\underbrace{\exp\left(-\frac{x^{2}}{2\sigma^{2}}+\frac{\mu}{\sigma^{2}}x\right)}}\end{aligned}[/math]

where

[math]\omega_{1}\left(\mu,\sigma^{2}\right)=-\frac{1}{2\sigma^{2}};t_{1}\left(x\right)=x^{2};\omega_{2}\left(\mu,\sigma^{2}\right)=\frac{\mu}{\sigma^{2}};t_{2}\left(x\right)=x[/math].
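The factorization above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `normal_pdf` and `normal_expfam` and the test values are illustrative assumptions, not part of the lecture.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Textbook N(mu, sigma2) density, written directly."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_expfam(x, mu, sigma2):
    """Same density assembled as h(x) * c(theta) * exp(sum_i w_i(theta) * t_i(x))."""
    h = 1.0                                                   # h(x) = 1
    c = math.exp(-mu ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    w1, t1 = -1.0 / (2 * sigma2), x ** 2                      # omega_1, t_1
    w2, t2 = mu / sigma2, x                                   # omega_2, t_2
    return h * c * math.exp(w1 * t1 + w2 * t2)


# Compare both forms at a few arbitrary points (illustrative values).
for x in (-1.5, 0.0, 0.7, 3.0):
    assert math.isclose(normal_pdf(x, 1.2, 2.5), normal_expfam(x, 1.2, 2.5), rel_tol=1e-9)
```

Here [math]K=2[/math], and the two forms agree up to floating-point rounding.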

Bernoulli Distribution

The Bernoulli also belongs to the exponential family(!):

[math]\begin{aligned} f\left(\left.x\right|p\right) & =\begin{cases} p, & x=1\\ 1-p, & x=0\\ 0, & \text{otherwise} \end{cases}\\ & =\begin{cases} p^{x}\left(1-p\right)^{1-x}, & x\in\left\{ 0,1\right\} \\ 0, & \text{otherwise} \end{cases}\\ & =1\left(x\in\left\{ 0,1\right\} \right)p^{x}\left(1-p\right)^{1-x}\end{aligned}[/math] where [math]1\left(\cdot\right)[/math] is the indicator function.

For [math]p\in\left(0,1\right)[/math], factorization yields:

[math]\begin{aligned} f\left(\left.x\right|p\right) & =1\left(x\in\left\{ 0,1\right\} \right)p^{x}\left(1-p\right)^{1-x}\\ & =1\left(x\in\left\{ 0,1\right\} \right)p^{x}\left(1-p\right)\left(1-p\right)^{-x}\\ & =1\left(x\in\left\{ 0,1\right\} \right)\left(1-p\right)\left(\frac{p}{1-p}\right)^{x}\\ & =\underset{h\left(x\right)}{\underbrace{1\left(x\in\left\{ 0,1\right\} \right)}}\underset{c\left(p\right)}{\underbrace{\left(1-p\right)}}\exp\left(\underset{\omega_{1}}{\underbrace{\log\left(\frac{p}{1-p}\right)}}\underset{t_{1}}{\underbrace{x}}\right)\end{aligned}[/math]
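As a quick sanity check, the Bernoulli factorization can be sketched in code as follows. The function names are hypothetical, and the sketch assumes [math]0\lt p\lt 1[/math] so that the log-odds term is defined.

```python
import math


def bernoulli_pmf(x, p):
    """Bernoulli pmf written case by case."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


def bernoulli_expfam(x, p):
    """Same pmf as h(x) * c(p) * exp(w1(p) * t1(x)); assumes 0 < p < 1."""
    h = 1.0 if x in (0, 1) else 0.0      # indicator 1(x in {0, 1})
    c = 1.0 - p                          # c(p)
    w1 = math.log(p / (1.0 - p))         # omega_1: the log-odds
    t1 = x                               # t_1(x) = x
    return h * c * math.exp(w1 * t1)


# The two forms agree on and off the support (p = 0.3 is an arbitrary choice).
for x in (0, 1, 2, -1):
    assert math.isclose(bernoulli_pmf(x, 0.3), bernoulli_expfam(x, 0.3), abs_tol=1e-12)
```

The indicator in [math]h\left(x\right)[/math] is what sets the pmf to zero outside [math]\left\{ 0,1\right\}[/math]; the remaining factors are strictly positive.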

Remarks

  • Factor [math]c\left(\theta\right)[/math] is the normalizing constant of the pmf/pdf. Once [math]h[/math], [math]\omega_{i}[/math] and [math]t_{i}[/math] are fixed, it is determined by the requirement that the pmf/pdf sums or integrates to 1.
  • The support of pmfs/pdfs of members of the exponential family does not depend on [math]\theta[/math], i.e., [math]S_{X}=\left\{ x\in\mathbb{R}:f\left(\left.x\right|\theta\right)\gt 0\right\} =\left\{ x\in\mathbb{R}:h\left(x\right)\gt 0\right\}[/math]. This holds because [math]c\left(\theta\right)[/math] and the exponential term are strictly positive, so [math]f\left(\left.x\right|\theta\right)\gt 0[/math] exactly when [math]h\left(x\right)\gt 0[/math]. For example, the uniform distribution with pdf [math]f_{X}\left(\left.x\right|a,b\right)=\frac{1}{b-a}1\left(a\leq x\leq b\right)[/math] does not belong to the exponential family: its support [math]\left[a,b\right][/math] depends on the parameters, so it is impossible to separate [math]a[/math] and [math]b[/math] from [math]x[/math] in the indicator function.
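To illustrate the remark on the normalizing constant, the sketch below recovers [math]c\left(\mu,\sigma^{2}\right)[/math] for the normal distribution by numerically integrating [math]h\left(x\right)\exp\left[\sum_{i}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right][/math] and inverting the result. The midpoint rule, the truncation at 12 standard deviations, and the function names are assumptions made for illustration only.

```python
import math


def c_closed_form(mu, sigma2):
    """c(mu, sigma2) as read off the normal factorization."""
    return math.exp(-mu ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def c_numeric(mu, sigma2, half_width=12.0, n=20000):
    """Approximate 1 / integral of h(x) * exp(w1 * x^2 + w2 * x) with a midpoint rule.

    The integral is truncated to mu +/- half_width standard deviations,
    which is an approximation; the neglected tails are tiny for the normal.
    """
    sigma = math.sqrt(sigma2)
    lo = mu - half_width * sigma
    step = 2 * half_width * sigma / n
    w1, w2 = -1.0 / (2 * sigma2), mu / sigma2
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step                    # midpoint of the k-th cell
        total += math.exp(w1 * x * x + w2 * x) * step  # h(x) = 1
    return 1.0 / total


assert math.isclose(c_numeric(1.2, 2.5), c_closed_form(1.2, 2.5), rel_tol=1e-6)
```

The same recipe applies to any member of the family whenever the integral (or sum) is finite.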

Location-Scale Family

A (parametric) family [math]\mathcal{F}[/math] of pdfs is a location-scale family if it is given by

[math]\mathcal{F}=\left\{ \frac{1}{\sigma}f\left(\frac{\cdot-\mu}{\sigma}\right):\mu\in\mathbb{R},\sigma\gt 0\right\}[/math]

where [math]f\left(\cdot\right)[/math] is the standard pdf of the family, [math]\mu[/math] is the location parameter and [math]\sigma[/math] is the scale parameter. The idea is that [math]\frac{1}{\sigma}f\left(\frac{\cdot-\mu}{\sigma}\right)[/math] is the pdf of [math]\mu+\sigma\widetilde{X}[/math] where [math]\widetilde{X}[/math] has pdf [math]f\left(\cdot\right)[/math].

Clearly, [math]N\left(\mu,\sigma^{2}\right)[/math] pdfs belong to the location-scale family of [math]N\left(0,1\right)[/math]. Similarly, [math]U\left(a,b\right)[/math] pdfs belong to the location-scale family of [math]U\left(0,1\right)[/math], with location [math]\mu=a[/math] and scale [math]\sigma=b-a[/math].
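The construction [math]\frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right)[/math] can be sketched as a small helper that turns a standard pdf into any member of its location-scale family. The names `location_scale`, `std_normal_pdf` and `std_uniform_pdf` are hypothetical, and the parameter values are arbitrary.

```python
import math


def std_normal_pdf(z):
    """Standard pdf of the normal family, N(0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def std_uniform_pdf(z):
    """Standard pdf of the uniform family, U(0, 1)."""
    return 1.0 if 0.0 <= z <= 1.0 else 0.0


def location_scale(f, mu, sigma):
    """Return the pdf x -> f((x - mu) / sigma) / sigma, i.e. the pdf of mu + sigma * X."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    return lambda x: f((x - mu) / sigma) / sigma


# N(1.2, 4): location 1.2, scale 2 (sigma, not sigma squared).
normal_pdf = location_scale(std_normal_pdf, 1.2, 2.0)
direct = math.exp(-(0.7 - 1.2) ** 2 / (2 * 4.0)) / math.sqrt(2 * math.pi * 4.0)
assert math.isclose(normal_pdf(0.7), direct, rel_tol=1e-12)

# U(2, 5): location a = 2, scale b - a = 3.
uniform_pdf = location_scale(std_uniform_pdf, 2.0, 3.0)
assert math.isclose(uniform_pdf(3.5), 1.0 / 3.0, rel_tol=1e-12)
assert uniform_pdf(5.5) == 0.0
```

Note that the scale parameter passed to the helper is [math]\sigma[/math], not the variance [math]\sigma^{2}[/math].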

Pdfs that differ from the standard pdf only in the location parameter (with [math]\sigma=1[/math]) form the pdf’s location family; those that differ only in the scale parameter (with [math]\mu=0[/math]) form its scale family.