Full Lecture 4


There exist many distributions. You can create your own right now. However, over time, some distributions have revealed themselves as particularly useful, and so it's good to keep track of them. We only discuss univariate distributions in this lecture. We start by presenting a few discrete distributions, and then discuss three continuous distributions.



Bernoulli Distribution

An r.v. [math]X[/math] has a Bernoulli distribution with parameter [math]p\in\left[0,1\right][/math] if [math]X[/math] is discrete with p.m.f. [math]f_{X}\left(x\right)=\begin{cases} p, & if\,x=1\\ 1-p, & if\,x=0\\ 0 & otherwise \end{cases}[/math]

The Bernoulli distribution captures the outcome of a binary experiment. For example, one could throw a biased coin, with probability of Heads equal to [math]p[/math].

Mean

[math]E\left(X\right)=\sum_{x\in S}xf_{X}\left(\left.x\right|p\right)=0\cdot f_{X}\left(\left.0\right|p\right)+1\cdot f_{X}\left(\left.1\right|p\right)=p,[/math] where [math]S=\left\{ x:f_{X}\left(\left.x\right|p\right)\gt 0\right\}[/math] is the support of [math]X[/math].

Variance

[math]Var\left(X\right)=E\left[\left(X-E\left(X\right)\right)^{2}\right]=E\left(X^{2}\right)-E\left(X\right)^{2}=\underset{0^{2}\cdot f_{X}\left(\left.0\right|p\right)+1^{2}\cdot f_{X}\left(\left.1\right|p\right)}{\underbrace{p}}-p^{2}=p\left(1-p\right)[/math]

MGF

[math]\begin{aligned} M_{X}\left(t\right) & =E\left(\exp\left(Xt\right)\right)=\sum_{x\in S}\exp\left(tx\right)f_{X}\left(\left.x\right|p\right)\\ & =\exp\left(t\cdot0\right)f_{X}\left(\left.0\right|p\right)+\exp\left(t\cdot1\right)f_{X}\left(\left.1\right|p\right)\\ & =1-p+\exp\left(t\right)p\end{aligned}[/math]
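
As a quick numerical sanity check (a minimal Python sketch, assuming numpy is available; the values [math]p=0.3[/math] and [math]t=0.5[/math] are arbitrary), we can recompute the mean, variance, and MGF directly from the two-point pmf:

<syntaxhighlight lang="python">
import numpy as np

p = 0.3                                    # arbitrary Bernoulli parameter
support = np.array([0, 1])
pmf = np.array([1 - p, p])                 # f_X(0) = 1 - p, f_X(1) = p

mean = np.sum(support * pmf)               # E(X)
var = np.sum((support - mean) ** 2 * pmf)  # Var(X)

t = 0.5                                    # arbitrary point at which to evaluate the MGF
mgf = np.sum(np.exp(t * support) * pmf)    # E(exp(tX))

print(mean, p)                             # both 0.3
print(var, p * (1 - p))                    # both 0.21
print(mgf, 1 - p + p * np.exp(t))          # identical values
</syntaxhighlight>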



Binomial Distribution

An r.v. [math]X[/math] follows a Binomial distribution with parameters [math]n\in\mathbb{N}[/math], [math]p\in\left[0,1\right][/math] if [math]X[/math] is discrete with pmf

[math]f_{X}\left(\left.x\right|n,p\right)=\begin{cases} \left(\begin{array}{c} n\\ x \end{array}\right)p^{x}\left(1-p\right)^{n-x}, & if\,x\in\left\{ 0,1,...,n\right\} \\ 0, & otherwise \end{cases}[/math]

where [math]\left(\begin{array}{c} n\\ x \end{array}\right)[/math] is called the binomial coefficient, and is defined by [math]\left(\begin{array}{c} n\\ x \end{array}\right)=\frac{n!}{x!\left(n-x\right)!}[/math]

The Binomial Distribution characterizes the number of successes in a binary (Bernoulli) experiment repeated [math]n[/math] times. Parameter [math]n[/math] is the number of trials, [math]p[/math] is the probability of success, and [math]x[/math] is the realized number of successes. If [math]X\sim Bin\left(1,p\right)[/math], then [math]X\sim Ber\left(p\right)[/math].

Mean

[math]E\left(X\right)=np[/math]

Variance

[math]Var\left(X\right)=np\left(1-p\right)[/math]

MGF

[math]M_{X}\left(t\right)=\left(1-p+p\exp\left(t\right)\right)^{n}[/math]

Notice that the expressions above reduce to their Bernoulli analogues when [math]n=1[/math].
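
For instance, the following illustrative sketch (assuming scipy is available; [math]n=10[/math], [math]p=0.3[/math], and [math]t=0.2[/math] are arbitrary choices) confirms all three expressions by brute-force summation over the Binomial support:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3                 # arbitrary Binomial parameters
x = np.arange(n + 1)           # support {0, 1, ..., n}
pmf = binom.pmf(x, n, p)

print(np.sum(x * pmf), n * p)                            # mean: both 3.0
print(np.sum((x - n * p) ** 2 * pmf), n * p * (1 - p))   # variance: both 2.1

t = 0.2
print(np.sum(np.exp(t * x) * pmf),                       # MGF by direct summation
      (1 - p + p * np.exp(t)) ** n)                      # closed-form MGF
</syntaxhighlight>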



Poisson Distribution

An r.v. [math]X[/math] follows a Poisson distribution with parameter [math]\lambda\gt 0[/math], if [math]X[/math] is discrete with pmf [math]f_{X}\left(x\right)=\begin{cases} \exp\left(-\lambda\right)\frac{\lambda^{x}}{x!}, & x\in\mathbb{N}_{0}\\ 0, & otherwise \end{cases}[/math]

The Poisson distribution characterizes the number of arrivals over a unit of time in a process with constant arrival rate [math]\lambda[/math] (the expected number of arrivals per unit of time).

Fun fact: [math]Bin\left(n,p\right)\simeq Pois\left(np\right)[/math] for [math]n[/math] large and [math]np[/math] small.
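
The following small comparison (an illustrative sketch using scipy; the choice [math]n=1000[/math], [math]p=0.002[/math] is arbitrary) shows how close the two pmfs get when [math]n[/math] is large and [math]np[/math] is small:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom, poisson

n, p = 1000, 0.002             # n large, np = 2 small
x = np.arange(11)              # first few support points

print(np.c_[binom.pmf(x, n, p), poisson.pmf(x, n * p)])
# the two columns agree to roughly three decimal places
</syntaxhighlight>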

Mean

[math]E\left(X\right)=\sum_{x=0}^{\infty}xf_{X}\left(x\right)=\sum_{x=0}^{\infty}x\exp\left(-\lambda\right)\frac{\lambda^{x}}{x!}=\sum_{x=1}^{\infty}\exp\left(-\lambda\right)\frac{\lambda^{x}}{\left(x-1\right)!}=\lambda\sum_{x=1}^{\infty}\underset{f_{X}\left(\left.x-1\right|\lambda\right)}{\underbrace{\exp\left(-\lambda\right)\frac{\lambda^{x-1}}{\left(x-1\right)!}}}=\lambda\underset{=1}{\underbrace{\sum_{t=0}^{\infty}f_{X}\left(\left.t\right|\lambda\right)}}=\lambda[/math].

Variance

[math]Var\left(X\right)=\lambda[/math] since [math]\underset{2nd\,factorial\,moment\,of\,X}{\underbrace{E\left(X\left(X-1\right)\right)}}=\sum_{x=0}^{\infty}x\left(x-1\right)\exp\left(-\lambda\right)\frac{\lambda^{x}}{x!}=\sum_{x=2}^{\infty}\exp\left(-\lambda\right)\frac{\lambda^{x}}{\left(x-2\right)!}=\lambda^{2}\sum_{x=2}^{\infty}\exp\left(-\lambda\right)\frac{\lambda^{x-2}}{\left(x-2\right)!}=\lambda^{2}[/math] and [math]Var\left(X\right)=E\left(X^{2}\right)-E\left(X\right)^{2}=E\left(X\left(X-1\right)\right)+E\left(X\right)-E\left(X\right)^{2}=\lambda^{2}+\lambda-\lambda^{2}=\lambda[/math].

MGF

[math]M_{X}\left(t\right)=\exp\left(\lambda\left(\exp\left(t\right)-1\right)\right)[/math]
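
Recall that moments can be read off the MGF by differentiating at [math]t=0[/math]. As an illustrative symbolic check (assuming sympy is available), we can verify that this MGF returns [math]E\left(X\right)=\lambda[/math] and [math]Var\left(X\right)=\lambda[/math]:

<syntaxhighlight lang="python">
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))          # Poisson MGF

m1 = sp.diff(M, t).subs(t, 0)              # E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)           # E(X^2)

print(sp.simplify(m1))                     # lambda
print(sp.simplify(m2 - m1 ** 2))           # lambda, i.e. the variance
</syntaxhighlight>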



Uniform Distribution on [math]\left[a,b\right][/math]

An r.v. [math]X[/math] follows a uniform distribution [math]U\left(a,b\right)[/math] if [math]X[/math] is continuous with pdf [math]f_{X}\left(x\right)=\begin{cases} \frac{1}{b-a}, & x\in\left[a,b\right]\\ 0, & otherwise \end{cases}[/math]

Under the Uniform distribution, all values in [math]\left[a,b\right][/math] are “equally likely.”

Notice that if [math]X\sim U\left(a,b\right)[/math], then [math]X=\left(b-a\right)\widetilde{X}+a[/math] where [math]\widetilde{X}\sim U\left(0,1\right)[/math], and [math]f_{\widetilde{X}}\left(x\right)=1\left(x\in\left[0,1\right]\right)[/math].

Mean

[math]E\left(\widetilde{X}\right)=\int_{0}^{1}xdx=\frac{1}{2}[/math].

So, [math]E\left(X\right)=E\left(\left(b-a\right)\widetilde{X}+a\right)=\left(b-a\right)E\left(\widetilde{X}\right)+a=\frac{a+b}{2}[/math]

Variance

[math]Var\left(\widetilde{X}\right)=E\left(\widetilde{X}^{2}\right)-E\left(\widetilde{X}\right)^{2}=\int_{0}^{1}x^{2}dx-\left(\frac{1}{2}\right)^{2}=\frac{1}{3}-\frac{1}{4}=\frac{1}{12}[/math].

So, [math]Var\left(X\right)=Var\left(\left(b-a\right)\widetilde{X}+a\right)=\left(b-a\right)^{2}Var\left(\widetilde{X}\right)=\frac{\left(b-a\right)^{2}}{12}[/math].

MGF

[math]M_{X}\left(t\right)=\exp\left(at\right)M_{\widetilde{X}}\left(\left(b-a\right)t\right)=...=\frac{\exp\left(bt\right)-\exp\left(at\right)}{\left(b-a\right)t}[/math]
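
As before, a quick numerical check (an illustrative sketch using scipy; [math]a=2[/math], [math]b=5[/math], and [math]t=0.4[/math] are arbitrary) confirms the mean, variance, and MGF by direct integration of the pdf:

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

a, b = 2.0, 5.0                       # arbitrary endpoints
pdf = lambda x: 1.0 / (b - a)         # uniform density on [a, b]

mean, _ = quad(lambda x: x * pdf(x), a, b)
var, _ = quad(lambda x: (x - mean) ** 2 * pdf(x), a, b)

t = 0.4
mgf, _ = quad(lambda x: np.exp(t * x) * pdf(x), a, b)

print(mean, (a + b) / 2)                                     # both 3.5
print(var, (b - a) ** 2 / 12)                                # both 0.75
print(mgf, (np.exp(b * t) - np.exp(a * t)) / ((b - a) * t))  # identical values
</syntaxhighlight>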



Gamma Distribution

An r.v. [math]X[/math] follows a Gamma distribution with parameters [math]\alpha,\beta\gt 0[/math] if [math]X[/math] is continuous with pdf [math]f_{X}\left(x\right)=\begin{cases} \frac{1}{\Gamma\left(\alpha\right)\beta^{\alpha}}x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right), & x\gt 0\\ 0, & otherwise \end{cases}[/math] where [math]\Gamma\left(\alpha\right)[/math] is the gamma function, and is given by [math]\Gamma\left(\alpha\right)=\int_{0}^{\infty}t^{\alpha-1}\exp\left(-t\right)dt,\,\alpha\gt 0[/math]. The Gamma function is a natural extension of the factorial operation, because [math]\Gamma\left(\alpha+1\right)=\alpha\Gamma\left(\alpha\right)[/math] and [math]\Gamma\left(1\right)=1[/math], which implies that [math]\Gamma\left(n\right)=\left(n-1\right)!\,\forall n\in\mathbb{N}.[/math] The Gamma distribution is especially useful for Bayesian estimation, which we will cover later.

This is a good time to describe a common property of pdfs. Notice that over its support, the function [math]\frac{1}{\Gamma\left(\alpha\right)\beta^{\alpha}}x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right)[/math] has some factors that depend on [math]x[/math], and others that do not: [math]f_{X}\left(x\right)=\frac{1}{\Gamma\left(\alpha\right)\beta^{\alpha}}x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right)=\underset{normalizing\,constant}{\underbrace{\frac{1}{\Gamma\left(\alpha\right)\beta^{\alpha}}}}\cdot\underset{kernel\,of\,pdf}{\underbrace{x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right)}}[/math] The normalizing constant does not depend on [math]x[/math]. It is there simply to make sure that the function integrates to one. From this, we immediately learn that [math]\int_{0}^{\infty}x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right)dx=\Gamma\left(\alpha\right)\beta^{\alpha}.[/math]
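
The following illustrative sketch (assuming scipy is available; [math]\alpha=2.5[/math] and [math]\beta=1.7[/math] are arbitrary) verifies this identity by integrating the kernel numerically and comparing the result with [math]\Gamma\left(\alpha\right)\beta^{\alpha}[/math]:

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, beta = 2.5, 1.7                               # arbitrary parameters

kernel = lambda x: x ** (alpha - 1) * np.exp(-x / beta)
integral, _ = quad(kernel, 0, np.inf)                # integrate the kernel over (0, inf)

print(integral, gamma(alpha) * beta ** alpha)        # the two numbers coincide
</syntaxhighlight>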

Mean

Consider first the r.v. [math]\widetilde{X}\sim Gam\left(\alpha,1\right)[/math], and use the fact that [math]X=\beta\widetilde{X}\sim Gam\left(\alpha,\beta\right)[/math]. Then,

[math]E\left(\widetilde{X}\right)=\int_{0}^{\infty}xf_{\widetilde{X}}\left(\left.x\right|\alpha,1\right)dx=\int_{0}^{\infty}x\frac{1}{\Gamma\left(\alpha\right)}x^{\alpha-1}\exp\left(-x\right)dx=\frac{1}{\Gamma\left(\alpha\right)}\frac{\Gamma\left(\alpha+1\right)}{\Gamma\left(\alpha+1\right)}\int_{0}^{\infty}x^{\alpha}\exp\left(-x\right)dx=\frac{\Gamma\left(\alpha+1\right)}{\Gamma\left(\alpha\right)}\underset{\int_{0}^{\infty}f_{\widetilde{X}}\left(\left.x\right|\alpha+1,1\right)dx=1}{\underbrace{\int_{0}^{\infty}\frac{1}{\Gamma\left(\alpha+1\right)}x^{\alpha}\exp\left(-x\right)dx}}=\alpha[/math].

So, [math]E\left(X\right)=E\left(\beta\widetilde{X}\right)=\beta E\left(\widetilde{X}\right)=\alpha\beta[/math].

Variance

[math]Var\left(\widetilde{X}\right)=E\left(\widetilde{X}^{2}\right)-E\left(\widetilde{X}\right)^{2}=...=\alpha[/math], so [math]Var\left(X\right)=\alpha\beta^{2}[/math].

MGF

[math]M_{X}\left(t\right)=\left(1-\beta t\right)^{-\alpha},\,t\lt \frac{1}{\beta}[/math]
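
This expression corresponds to the scale parameterization used in the pdf above (texts that parameterize the Gamma by a rate write the MGF with [math]\beta[/math] replaced by [math]\frac{1}{\beta}[/math]). As an illustrative check (arbitrary [math]\alpha=2.5[/math], [math]\beta=1.7[/math], and [math]t=0.2\lt \frac{1}{\beta}[/math]), we can compare the MGF computed by numerical integration against the closed form:

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, beta = 2.5, 1.7
t = 0.2                                              # must satisfy t < 1/beta

pdf = lambda x: x ** (alpha - 1) * np.exp(-x / beta) / (gamma(alpha) * beta ** alpha)
mgf, _ = quad(lambda x: np.exp(t * x) * pdf(x), 0, np.inf)

print(mgf, (1 - beta * t) ** (-alpha))               # both approximately 2.8
</syntaxhighlight>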



Normal Distribution

Recall: Random variable [math]X[/math] follows a normal distribution [math]N\left(\mu,\sigma^{2}\right)[/math] if it is continuous with pdf [math]f_{X}\left(\left.x\right|\mu,\sigma^{2}\right)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right),x\in\mathbb{R}[/math]

The normal distribution is by far the most important continuous distribution. Its main claim to fame is that it can be shown (as we will, later) that averages of a large number of random variables, under some conditions, are normally distributed. This result is called the central limit theorem.

Note that if [math]X\sim N\left(\mu,\sigma^{2}\right)[/math], [math]X=\mu+\sigma\widetilde{X}[/math] where [math]\widetilde{X}\sim N\left(0,1\right)[/math].

CDF

The cdf of the normal distribution does not admit a closed-form representation. However, we use a shorthand that relies on the [math]N\left(0,1\right)[/math] distribution:

[math]F_{X}\left(x\right)=P\left(X\leq x\right)=P\left(\mu+\sigma\widetilde{X}\leq x\right)=P\left(\widetilde{X}\leq\frac{x-\mu}{\sigma}\right)=\Phi\left(\frac{x-\mu}{\sigma}\right)[/math], where [math]\Phi\left(\cdot\right)[/math] is the standard normal cdf, i.e.,

[math]\Phi\left(x\right)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^{2}}{2}\right)dt,\,x\in\mathbb{R}[/math].
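
In practice, [math]\Phi[/math] is evaluated numerically. The following illustrative sketch (assuming scipy is available; [math]\mu=1[/math], [math]\sigma=2[/math], and [math]x=2.5[/math] are arbitrary) checks the standardization identity [math]F_{X}\left(x\right)=\Phi\left(\frac{x-\mu}{\sigma}\right)[/math]:

<syntaxhighlight lang="python">
from scipy.stats import norm

mu, sigma, x = 1.0, 2.0, 2.5                 # arbitrary values

print(norm.cdf(x, loc=mu, scale=sigma))      # F_X(x) for X ~ N(mu, sigma^2)
print(norm.cdf((x - mu) / sigma))            # Phi((x - mu) / sigma): same number
</syntaxhighlight>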


Dirac delta function

The Dirac delta function is not really a pdf. However, it can sometimes be useful when working with mass points. It is defined as [math]\delta:\mathbb{R}\rightarrow\mathbb{R}\cup\left\{ +\infty\right\}[/math], s.t.,

[math]\delta\left(x\right)=\begin{cases} +\infty, & x=0\\ 0, & otherwise \end{cases}[/math]

and

[math]\int_{-\infty}^{\infty}\delta\left(x\right)dx=1[/math]

It behaves like a valid pdf, except that it is not a proper function (its codomain includes infinity).

Sifting Property

This property is especially useful. It states:

[math]\int_{-\infty}^{\infty}f\left(x\right)\delta\left(x-\alpha\right)dx=f\left(\alpha\right)[/math]

as long as [math]f\left(\cdot\right)[/math] is continuous at [math]\alpha[/math].

Sketch of the Proof

Let

[math]g\left(x\right)=\begin{cases} \frac{1}{2\Delta}, & -\Delta\leq x\leq\Delta\\ 0, & otherwise \end{cases}[/math]

and notice that [math]g\left(x\right)[/math] is a pdf with support [math]\left[-\Delta,\Delta\right][/math].

In addition, notice that [math]\lim_{\Delta\rightarrow0}\,g\left(x\right)=\delta\left(x\right).[/math]

Then,

[math]\begin{aligned} \int_{-\infty}^{\infty}f\left(x\right)\delta\left(x-\alpha\right)dx & =\int_{-\infty}^{\infty}\lim_{\Delta\rightarrow0}\,f\left(x\right)g\left(x-\alpha\right)dx\\ & =\lim_{\Delta\rightarrow0}\int_{-\infty}^{\infty}f\left(x\right)g\left(x-\alpha\right)dx\\ & =\lim_{\Delta\rightarrow0}\,\frac{1}{2\Delta}\int_{-\infty}^{\infty}f\left(x\right)1\left[-\Delta\leq x-\alpha\leq\Delta\right]dx\\ & =\lim_{\Delta\rightarrow0}\,\frac{1}{2\Delta}\int_{\alpha-\Delta}^{\alpha+\Delta}f\left(x\right)dx\\ & =\lim_{\Delta\rightarrow0}\,\frac{F\left(\alpha+\Delta\right)-F\left(\alpha-\Delta\right)}{2\Delta}\\ & =f\left(\alpha\right)\end{aligned}[/math]

Clearly, some conditions are needed for the steps above to be valid. Also, when facing a proper integral, it is possible to show that [math]\int_{\underline{x}}^{\overline{x}}f\left(x\right)\delta\left(x-\alpha\right)dx=f\left(\alpha\right)1\left(\underline{x}\leq\alpha\leq\overline{x}\right).[/math]
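
The proof sketch also lends itself to a numerical illustration (a minimal Python sketch; the choices [math]f\left(x\right)=x^{2}[/math] and [math]\alpha=1[/math] are arbitrary): replacing [math]\delta[/math] with the box kernel [math]g[/math] and letting [math]\Delta[/math] shrink drives the integral toward [math]f\left(\alpha\right)[/math].

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

f = lambda x: x ** 2            # any function continuous at alpha
alpha = 1.0

for delta in [0.5, 0.1, 0.01]:
    # box kernel g(x - alpha): height 1/(2*delta) on [alpha - delta, alpha + delta]
    approx, _ = quad(lambda x: f(x) / (2 * delta), alpha - delta, alpha + delta)
    print(delta, approx)        # tends to f(alpha) = 1 as delta shrinks
</syntaxhighlight>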

Example

Let

[math]Y=\begin{cases} 1, & \text{w.p. }\alpha\\ U\left(0,1\right) & \text{w.p. }1-\alpha \end{cases}[/math]

The distribution of [math]Y[/math] is neither continuous nor discrete. It has a mass point at 1; otherwise, it is a uniform distribution on the [math]\left[0,1\right][/math] support.

The “pdf” of [math]Y[/math] can be written as

[math]f_{Y}\left(y\right)=\alpha\delta\left(y-1\right)+\left(1-\alpha\right)1\left(y\in\left[0,1\right]\right)[/math]

The expectation of [math]Y[/math] can be calculated as:

[math]\begin{aligned} E\left(Y\right) & =\int_{-\infty}^{\infty}y\left(\alpha\delta\left(y-1\right)+\left(1-\alpha\right)1\left(y\in\left[0,1\right]\right)\right)dy\\ & =\alpha\int_{-\infty}^{\infty}y\delta\left(y-1\right)dy+\left(1-\alpha\right)\int_{0}^{1}ydy\\ & =\alpha\cdot1+\left(1-\alpha\right)\left.\frac{y^{2}}{2}\right|_{0}^{1}\\ & =\alpha+\frac{1-\alpha}{2}\\ & =\frac{1+\alpha}{2}\end{aligned}[/math]

The result makes sense: when [math]\alpha[/math] approaches 1, [math]Y[/math] converges to a mass point on [math]1[/math], and [math]E\left(Y\right)=1[/math]. When [math]\alpha[/math] approaches zero, [math]Y[/math] converges to a standard uniform distribution, and [math]E\left(Y\right)=\frac{1}{2}[/math].
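
As a final illustrative check (a Monte Carlo sketch; the mixture weight [math]\alpha=0.3[/math] and the sample size are arbitrary), simulating [math]Y[/math] directly reproduces [math]E\left(Y\right)=\frac{1+\alpha}{2}[/math]:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.3, 1_000_000                    # arbitrary mixture weight and sample size

is_mass_point = rng.random(n) < alpha        # with probability alpha, Y = 1
y = np.where(is_mass_point, 1.0, rng.random(n))

print(y.mean(), (1 + alpha) / 2)             # both approximately 0.65
</syntaxhighlight>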