Full Lecture 3

From Significant Statistics

Expected Value

The expected value of r.v. [math]X[/math], usually written as [math]E\left(X\right)[/math], is defined as

[math]E\left(X\right)=\sum_{x\in\mathbf{R}}x\,f_{X}\left(x\right)[/math] if [math]X[/math] is discrete.

[math]E\left(X\right)=\int_{-\infty}^{+\infty}t\,f_{X}\left(t\right)dt[/math] if [math]X[/math] is continuous.

In general, suppose we would like to calculate [math]E\left(g\left(X\right)\right)[/math] where [math]g\left(\cdot\right)[/math] is a function. Then we would obtain

[math]E\left(g\left(X\right)\right)=\sum_{x\in\mathbf{R}}g\left(x\right)\,f_{X}\left(x\right)[/math] if [math]X[/math] is discrete.

[math]E\left(g\left(X\right)\right)=\int_{-\infty}^{+\infty}g\left(t\right)\,f_{X}\left(t\right)dt[/math] if [math]X[/math] is continuous.

Existence of the Expected Value

It is possible to obtain [math]E\left(X\right)=\infty[/math]. This is surprising if we think of expectations as averages, because no average of finitely many finite numbers is infinite. With an infinite support, however, the expectation is no longer a finite average (this can also happen for discrete r.v.s with countably infinite support, but it is easiest to see in the continuous case). To see this, notice that we already know that some integrals yield infinity rather than a real number.

For example, [math]\int_{1}^{\infty}\frac{1}{x}dx=\infty[/math]. The reason is that, while [math]\frac{1}{x}[/math] is decreasing for [math]x\gt 1[/math], it approaches the x-axis ‘too slowly’, so the area underneath it accumulates without bound.

Suppose instead that r.v. [math]X[/math] has pdf [math]f_{X}\left(x\right)=\frac{1}{x^{2}}[/math], defined on the domain [math]\left[1,\infty\right)[/math]. This function is indeed a pdf, since [math]\int_{1}^{\infty}\frac{1}{x^{2}}dx=1[/math] and the function is positive over its domain. In this case, [math]E\left(X\right)=\int_{1}^{\infty}x\frac{1}{x^{2}}dx=\int_{1}^{\infty}\frac{1}{x}dx=\infty[/math]. We have discovered a r.v. that does not have an expected value (i.e., it is infinite). Intuitively, the expected value exists as long as the pdf approaches zero fast enough.
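As a numerical illustration (mine, not from the lecture), we can check that the truncated expectation [math]\int_{1}^{B}x\frac{1}{x^{2}}dx=\ln B[/math] grows without bound as the truncation point [math]B[/math] increases, which is exactly what [math]E\left(X\right)=\infty[/math] means here:

```python
import math

# Illustration: for the pdf f(x) = 1/x**2 on [1, ∞), the truncated
# expectation ∫_1^B x·f(x) dx = ∫_1^B dx/x = ln(B) grows without bound.
def truncated_mean(B, n=200_000):
    """Midpoint Riemann sum of x * f(x) = 1/x over [1, B]."""
    h = (B - 1) / n
    total = 0.0
    for i in range(n):
        x = 1 + (i + 0.5) * h
        total += h / x
    return total

for B in (10, 1_000, 100_000):
    print(B, truncated_mean(B), math.log(B))
```

The numerical sum tracks [math]\ln B[/math] closely, so no matter how far out we truncate, the "mean" keeps growing.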

You will usually read the statement “if [math]\int_{-\infty}^{+\infty}\left|t\right|\,f_{X}\left(t\right)dt=\infty[/math], then [math]E\left(X\right)[/math] does not exist”, and may wonder about the absolute value. This is just a compact way to cover every case in which the expected value fails to exist. To see this, first suppose [math]X[/math] were always positive. In this case, the absolute value would be redundant, but the statement would remain correct. If [math]X[/math] were always negative with [math]E\left(X\right)=-\infty[/math], then the statement would still apply, because [math]E\left(\left|X\right|\right)=\infty[/math] remains true. Finally, suppose [math]X[/math] spans [math]\left(-\infty,\infty\right)[/math] with [math]\int_{-\infty}^{0}t\,f_{X}\left(t\right)dt=-\infty[/math] and [math]\int_{0}^{\infty}t\,f_{X}\left(t\right)dt=+\infty[/math]. Then the expectation of [math]X[/math] does not exist, as it is indeterminate (an [math]\infty-\infty[/math] form), but the statement [math]\int_{-\infty}^{+\infty}\left|t\right|\,f_{X}\left(t\right)dt=\infty[/math] still holds.

So, [math]\int_{-\infty}^{+\infty}\left|t\right|\,f_{X}\left(t\right)dt=\infty[/math] is an efficient way to summarize the cases that may lead a r.v. to not have an expectation.

Alternative notation

You may sometimes see the statement [math]E\left(X\right)=\int_{-\infty}^{+\infty}t\,dF_{X}\left(t\right)[/math] instead. This notation usually refers to the Lebesgue integral, where [math]F_{X}[/math] plays the role of a 'measure.' We do not cover the distinction here. Notice that if we are comfortable canceling differentials (we'll avoid the long technicalities), we can obtain [math]\int_{-\infty}^{+\infty}t\,dF_{X}\left(t\right)=\int_{-\infty}^{+\infty}t\,\frac{dF_{X}\left(t\right)}{dt}dt[/math], and, by noting that [math]\frac{dF_{X}\left(t\right)}{dt}=f_{X}\left(t\right)[/math], we obtain the familiar expression [math]\int_{-\infty}^{+\infty}t\,f_{X}\left(t\right)dt[/math].

Basic properties of expectations

  • Linearity: [math]E\left(ag\left(X\right)+bh\left(X\right)\right)=aE\left(g\left(X\right)\right)+bE\left(h\left(X\right)\right)[/math]
  • Order-preserving: [math]g\left(x\right)\leq h\left(x\right),\,\forall x\in\mathbb{R}\Rightarrow E\left(g\left(X\right)\right)\leq E\left(h\left(X\right)\right)[/math] (and equality holds if [math]g\left(x\right)=h\left(x\right)[/math] for all [math]x[/math])
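A quick sketch of both properties (the fair-die example and the choices [math]g\left(x\right)=x^{2}[/math], [math]h\left(x\right)=x[/math] are mine, purely for illustration):

```python
from fractions import Fraction

# Illustration: expectations over a fair six-sided die, verifying linearity
# with g(x) = x**2 and h(x) = x (illustrative choices, not from the lecture).
support = range(1, 7)
E = lambda fn: sum(Fraction(1, 6) * fn(x) for x in support)

a, b = 2, -3
lhs = E(lambda x: a * x**2 + b * x)              # E(a·g(X) + b·h(X))
rhs = a * E(lambda x: x**2) + b * E(lambda x: x)  # a·E(g(X)) + b·E(h(X))
print(lhs, rhs)  # both 119/6

# Order preservation: x <= x**2 on this support, so E(X) <= E(X**2).
assert E(lambda x: x) <= E(lambda x: x**2)
```

Exact rational arithmetic (`Fraction`) avoids floating-point noise, so the two sides match exactly.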

Moments of a Random Variable

We have already discussed the expectation of an r.v. [math]X[/math]. It is useful to also consider expectations of specific transformations of [math]X[/math], called moments:

  • The [math]k^{\mbox{th}}[/math] moment of [math]X[/math] is [math]E\left(X^{k}\right)=\mu_{k}^{'}[/math], [math]k\in\mathbb{N}[/math]
  • The expected value of [math]X[/math] can be written as [math]\mu[/math], and also as [math]\mu_{1}^{'}[/math].
  • The [math]k^{\mbox{th}}[/math] central/centered moment of [math]X[/math] is [math]E\left(\left(X-\mu\right)^{k}\right)=\mu_{k}[/math], [math]k\in\mathbb{N}[/math]

Moments and Functions of Moments

  • The variance of [math]X[/math] is [math]\mu_{2}=\sigma^{2}=E\left(\left(X-\mu\right)^{2}\right)[/math]
  • The standard deviation of [math]X[/math] is [math]\sigma=\sqrt{\sigma^{2}}=\sqrt{E\left(\left(X-\mu\right)^{2}\right)}[/math]
  • The skewness of [math]X[/math] is [math]\alpha_{3}=E\left(\left(\frac{X-\mu}{\sigma}\right)^{3}\right)=\frac{E\left(\left(X-\mu\right)^{3}\right)}{Var\left(X\right)^{\frac{3}{2}}}[/math]
  • The kurtosis of [math]X[/math] is [math]\alpha_{4}=E\left(\left(\frac{X-\mu}{\sigma}\right)^{4}\right)=\frac{E\left(\left(X-\mu\right)^{4}\right)}{Var\left(X\right)^{2}}[/math]

Both skewness and kurtosis characterize the shape of the distribution. Both denominators are always positive, and so act as normalizers.

In terms of skewness, [math]\alpha_{3}[/math] measures asymmetry: a distribution with a long left tail (which pulls the mean below the bulk of the mass) shows negative skewness, while a distribution lop-sided with a long right tail shows positive skewness.

As for kurtosis, both the numerator and denominator capture variability, but the numerator weights outliers more. When [math]\alpha_{4}=3[/math], we say the distribution is mesokurtic: its tails carry the same relative mass as those of the normal distribution, and decay at the same rate. When [math]\alpha_{4}\gt 3[/math], the distribution is leptokurtic (heavier tails, with more mass coming from outliers), and when [math]\alpha_{4}\lt 3[/math], we say the distribution is platykurtic (thinner tails; the distribution appears flatter than the Normal).
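To make these formulas concrete, here is a small computation (the die example is mine): a fair six-sided die is symmetric, so its skewness is 0, and it is platykurtic, with kurtosis below 3.

```python
from fractions import Fraction

# Illustration: skewness and kurtosis of a fair six-sided die, plugging its
# central moments into the formulas above.
support = range(1, 7)
E = lambda fn: sum(Fraction(1, 6) * fn(x) for x in support)

mu = E(lambda x: x)               # 7/2
var = E(lambda x: (x - mu) ** 2)  # 35/12
m3 = E(lambda x: (x - mu) ** 3)   # 0 by symmetry
m4 = E(lambda x: (x - mu) ** 4)

skewness = float(m3) / float(var) ** 1.5  # alpha_3
kurtosis = m4 / var**2                    # alpha_4, an exact Fraction
print(skewness, float(kurtosis))
```

The die's kurtosis works out to [math]\frac{303}{175}\approx1.73\lt 3[/math]: a bounded, flat-topped distribution has thin tails relative to the normal.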

Standard Normal Distribution

Suppose [math]X\sim N\left(0,1\right)[/math], so [math]X[/math] is continuous with pdf [math]f_{X}\left(x\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^{2}}{2}\right)[/math].

Useful fact: In the case of the standard normal, [math]xf_{X}\left(x\right)=-\frac{d}{dx}f_{X}\left(x\right)[/math].

Using this fact,

[math]E\left(X\right)=\int_{-\infty}^{\infty}x\,f_{X}\left(x\right)dx=-\int_{-\infty}^{\infty}\frac{d}{dx}f_{X}\left(x\right)dx=\left[-f_{X}\left(x\right)\right]_{-\infty}^{\infty}=0.[/math]

Similarly,

[math]E\left(X^{2}\right)=\int_{-\infty}^{\infty}x^{2}f_{X}\left(x\right)dx=-\int_{-\infty}^{\infty}x\,\frac{d}{dx}f_{X}\left(x\right)dx,[/math]

where integration by parts yields

[math]-\left[x\,f_{X}\left(x\right)\right]_{-\infty}^{\infty}+\int_{-\infty}^{\infty}f_{X}\left(x\right)dx=0+1=1,[/math]

so [math]Var\left(X\right)=E\left(X^{2}\right)-E\left(X\right)^{2}=1.[/math]
Cauchy Distribution

In this case, [math]f_{X}\left(x\right)=\frac{1}{\pi}\frac{1}{1+x^{2}}[/math], and [math]\int_{-\infty}^{\infty}\left|t\right|f_{X}\left(t\right)dt=\infty[/math] (i.e., its first moment does not exist).

Also, notice that because [math]\left|t\right|\leq1+\left|t\right|^{k}[/math] for [math]k\gt 1[/math], we have [math]\int_{-\infty}^{\infty}\left|t\right|^{k}f_{X}\left(t\right)dt\geq\int_{-\infty}^{\infty}\left|t\right|f_{X}\left(t\right)dt-1[/math]. Hence the nonexistence of a moment implies the nonexistence of all higher moments as well.
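We can see the divergence for the Cauchy numerically (my own illustration): the truncated integral [math]\int_{-B}^{B}\left|t\right|f_{X}\left(t\right)dt=\frac{1}{\pi}\ln\left(1+B^{2}\right)[/math], which grows without bound as [math]B\to\infty[/math].

```python
import math

# Illustration: for the Cauchy pdf f(t) = 1/(π(1+t²)), the truncated
# absolute first moment ∫_{-B}^{B} |t| f(t) dt = (1/π)·ln(1+B²) diverges.
def truncated_abs_moment(B, n=200_000):
    """Midpoint Riemann sum of |t| * f(t) over [-B, B]."""
    h = 2 * B / n
    total = 0.0
    for i in range(n):
        t = -B + (i + 0.5) * h
        total += abs(t) / (math.pi * (1 + t * t)) * h
    return total

for B in (10, 1_000):
    print(B, truncated_abs_moment(B), math.log(1 + B * B) / math.pi)
```

The numeric sum matches the closed form, and both keep growing with the truncation point, so [math]E\left(\left|X\right|\right)=\infty[/math].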

Some Useful Identities

  • [math]Var\left(X\right)=E\left(X^{2}\right)-E\left(X\right)^{2}[/math].
  • [math]E\left(aX+b\right)=aE\left(X\right)+b[/math].
  • [math]Var\left(aX+b\right)=a^{2}Var\left(X\right)[/math].
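The identities above can be checked exactly on a simple discrete example (the Bernoulli choice and the constants [math]a=2,b=5[/math] are mine):

```python
from fractions import Fraction

# Illustration: X ~ Bernoulli(3/10), so E(X) = 3/10, E(X²) = 3/10.
p = Fraction(3, 10)
E = lambda fn: (1 - p) * fn(0) + p * fn(1)

EX = E(lambda x: x)
var = E(lambda x: x * x) - EX**2  # Var(X) = E(X²) − E(X)²
print(var)  # 21/100

a, b = 2, 5
EY = E(lambda x: a * x + b)                           # should be a·E(X) + b
varY = E(lambda x: (a * x + b) ** 2) - EY**2          # should be a²·Var(X)
print(EY == a * EX + b, varY == a**2 * var)
```

Note that the additive shift [math]b[/math] moves the mean but drops out of the variance entirely, while the scale [math]a[/math] enters the variance squared.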

Normal Distribution

An r.v. [math]X[/math] is normally distributed with mean [math]\mu[/math] and variance [math]\sigma^{2}[/math], denoted as [math]X\sim N\left(\mu,\sigma^{2}\right)[/math], if [math]X[/math] is continuous with pdf [math]f_{X}\left(x\right)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right),x\in\mathbb{R}[/math].

Here’s a helpful fact: If [math]Z\sim N\left(0,1\right)[/math], then [math]X=\mu+\sigma Z\sim N\left(\mu,\sigma^{2}\right)[/math].

From this, it follows that:

  • [math]E\left(X\right)=E\left(\mu+\sigma Z\right)=\mu+\sigma\underset{=0}{\underbrace{E\left(Z\right)}}=\mu.[/math]
  • [math]Var\left(X\right)=Var\left(\mu+\sigma Z\right)=\sigma^{2}\underset{=1}{\underbrace{Var\left(Z\right)}}=\sigma^{2}.[/math]

Moment Generating Function

The moment generating function (m.g.f.) is a function that can be used to calculate moments of a r.v. in a different way than the one we defined before. We’ll talk about its use and intuition after we define it.

The m.g.f. of a r.v. [math]X[/math] is the function [math]M_{X}:\mathbb{R}\rightarrow\mathbb{R}_{+}[/math] given by

[math]M_{X}\left(t\right)=E\left(\exp\left(tX\right)\right).[/math]
First, notice that the m.g.f. is a function of [math]t[/math], not [math]X[/math]. [math]X[/math] is “integrated out” by the expectation operator. (After we calculate the integral over [math]X[/math], it disappears.)

How does the m.g.f. operate? To see this, first open up the expectation to obtain (taking [math]X[/math] continuous)

[math]M_{X}\left(t\right)=\int_{-\infty}^{\infty}\exp\left(tx\right)f_{X}\left(x\right)dx.[/math]
Now, we can take advantage of the fact that the derivative of [math]\exp\left(tx\right)[/math] w.r.t. [math]t[/math] is [math]x\exp\left(tx\right)[/math]. Taking the first derivative of [math]M_{X}\left(t\right)[/math] w.r.t. [math]t[/math] yields [math]\frac{d}{dt}M_{X}\left(t\right)=\frac{d}{dt}\int_{-\infty}^{\infty}\exp\left(tx\right)f_{X}\left(x\right)dx=\int_{-\infty}^{\infty}\frac{\partial}{\partial t}\exp\left(tx\right)f_{X}\left(x\right)dx=\int_{-\infty}^{\infty}x\exp\left(tx\right)f_{X}\left(x\right)dx[/math].

Granted, it’s not clear how this step may have helped. But evaluate this expression at [math]t=0[/math], and we obtain

[math]\int_{-\infty}^{\infty}x\underset{=1}{\underbrace{\exp\left(0x\right)}}f_{X}\left(x\right)dx=\int_{-\infty}^{\infty}xf_{X}\left(x\right)dx[/math], which is the formula for [math]E\left(X\right)[/math].

Suppose now that, rather than one derivative, we took [math]k[/math]. In this case, we will obtain the equality

[math]\frac{d^{k}}{dt^{k}}M_{X}\left(t\right)=\int_{-\infty}^{\infty}x^{k}\exp\left(tx\right)f_{X}\left(x\right)dx[/math]. If we evaluate this function at [math]t=0[/math], we obtain the important equality

[math]\left.\frac{d^{k}}{dt^{k}}M_{X}\left(t\right)\right|_{t=0}=\int_{-\infty}^{\infty}x^{k}f_{X}\left(x\right)dx=E\left(X^{k}\right)=\mu_{k}^{'}.[/math]
We have established the main result of the m.g.f.: by taking [math]k[/math] derivatives and then evaluating at zero, we obtain the k-th moment of [math]X[/math]. One important use of the m.g.f. (which we will rely on later) is that if two random variables have the same m.g.f. (i.e., [math]M_{X}\left(t\right)=M_{Y}\left(t\right)[/math]), then they have the same distribution, i.e., [math]F_{X}\left(x\right)=F_{Y}\left(x\right)[/math]. Certain conditions are necessary. For starters, notice that we have already assumed that we can differentiate under the integral sign.
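This derivative-at-zero mechanism can be sketched numerically (my own illustration) using [math]M_{X}\left(t\right)=\exp\left(t^{2}/2\right)[/math], the m.g.f. of [math]N\left(0,1\right)[/math] derived later in this lecture: finite differences at [math]t=0[/math] recover the first two moments.

```python
import math

# Sketch: recover moments from an m.g.f. by numerical differentiation at
# t = 0, using M(t) = exp(t²/2), the m.g.f. of N(0,1).
M = lambda t: math.exp(t * t / 2)
h = 1e-4

first = (M(h) - M(-h)) / (2 * h)            # ≈ M'(0)  = E(X)  = 0
second = (M(h) - 2 * M(0) + M(-h)) / h**2   # ≈ M''(0) = E(X²) = 1
print(first, second)
```

The central differences return the mean 0 and second moment 1 of the standard normal, exactly as the [math]t=0[/math] evaluation predicts.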

Facts about the MGF

  • The m.g.f. only applies if the moments of the r.v. exist.
  • A r.v. may have moments, and yet the m.g.f. may yield infinity (notice that we’re taking an integral of an exponential, and so the integral will diverge if the pdf does not approach zero fast enough). The typical example of this is the log-normal distribution.
  • Our proof relied on being able to move the derivative inside of the integral. We have covered the conditions for doing so in the previous lecture.
  • If there exists a neighborhood near [math]t=0[/math] where the m.g.f. is finite, i.e. if [math]\exists h\gt 0:M_{X}\left(t\right)\lt \infty,t\in\left(-h,h\right)[/math], then it is possible to show that the m.g.f. uniquely identifies the c.d.f. of [math]X[/math]. This means that if [math]M_{X}\left(t\right)=M_{Y}\left(t\right)[/math] and this condition is met, then [math]F_{X}\left(x\right)=F_{Y}\left(x\right),\forall x\in\mathbb{R}[/math]. In other words, the m.g.f. can be used to identify the distribution of a random variable uniquely, provided it is finite in a neighborhood of zero. The proof is not trivial.
  • If the m.g.f. is finite in a neighborhood of zero, then all of the moments of the distribution exist.
  • When the m.g.f. does not exist in a neighborhood of zero, we can always use the characteristic function to characterize a distribution. This function is given by [math]C_{X}:\mathbb{R}\rightarrow\mathbb{C}[/math], where [math]C_{X}\left(t\right)=E\left(\exp\left(itX\right)\right)=E\left(\cos\left(tX\right)+i\,\sin\left(tX\right)\right)[/math].

Standard Normal Distribution

Let's calculate the m.g.f. of the standard normal distribution. When

[math]X\sim N\left(0,1\right)[/math], then [math]M_{X}\left(t\right)=\int_{-\infty}^{\infty}\exp\left(tx\right)\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^{2}}{2}\right)dx[/math].

We can rearrange the integrand:

[math]\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left(tx-\frac{x^{2}}{2}\right)dx=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\left(x-t\right)^{2}}{2}+\frac{t^{2}}{2}\right)dx=\exp\left(\frac{t^{2}}{2}\right)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\left(x-t\right)^{2}}{2}\right)dx.[/math]

The first equality follows from completing the square, and the second equality uses the fact that the integral is w.r.t. [math]x[/math], not [math]t[/math], so [math]\exp\left(\frac{t^{2}}{2}\right)[/math] can be pulled out. Note also that the integrand of the last expression is the pdf of the [math]Normal(t,1)[/math] distribution, and so its integral equals 1. Hence, we obtain [math]M_{X}\left(t\right)=\exp\left(\frac{t^{2}}{2}\right)[/math]. We can test that it produces the mean and variance of [math]N\left(0,1\right)[/math].
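As a sanity check (my own numerical illustration), we can integrate [math]\exp\left(tx\right)f_{X}\left(x\right)[/math] on a wide grid and compare against the closed form [math]\exp\left(t^{2}/2\right)[/math]:

```python
import math

# Numerical check: integrate exp(t·x)·φ(x) over a wide grid and compare
# with the closed-form m.g.f. exp(t²/2) of the standard normal.
def mgf_numeric(t, lo=-20.0, hi=20.0, n=200_000):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += math.exp(t * x - x * x / 2) * h
    return total / math.sqrt(2 * math.pi)

for t in (0.0, 0.5, 1.0):
    print(t, mgf_numeric(t), math.exp(t * t / 2))
```

The grid bounds [math]\pm20[/math] are wide enough for moderate [math]t[/math], since the shifted-normal integrand is negligible outside that range.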

MGF of Affine Transformations

A useful fact about the m.g.f.: If [math]X[/math] has m.g.f. [math]M_{X}\left(t\right)[/math], then [math]M_{aX+b}\left(t\right)=\exp\left(bt\right)M_{X}\left(at\right)[/math]. This is easily proved by expanding [math]E\left(\exp\left(\left(aX+b\right)t\right)\right)[/math]. This implies the following for the general normal distribution:

Suppose [math]X\sim N\left(\mu,\sigma^{2}\right)[/math], such that [math]X=\mu+\sigma Z[/math] where [math]Z\sim N\left(0,1\right).[/math]


[math]M_{X}\left(t\right)=M_{\mu+\sigma Z}\left(t\right)=\exp\left(\mu t\right)M_{Z}\left(\sigma t\right)=\exp\left(\mu t\right)\exp\left(\frac{1}{2}\sigma^{2}t^{2}\right)=\exp\left(\mu t+\frac{1}{2}\sigma^{2}t^{2}\right).[/math]

We can now use this m.g.f. to calculate the moments of the general normal distribution [math]N\left(\mu,\sigma^{2}\right).[/math]
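For instance (a sketch with my own choice of [math]\mu=2,\sigma=3[/math]), differentiating [math]M_{X}\left(t\right)=\exp\left(\mu t+\frac{1}{2}\sigma^{2}t^{2}\right)[/math] numerically at [math]t=0[/math] recovers the mean and variance:

```python
import math

# Sketch: differentiate M(t) = exp(μt + σ²t²/2) numerically at t = 0.
# M'(0) = μ and M''(0) = E(X²) = σ² + μ², so Var(X) = M''(0) − M'(0)².
mu, sigma = 2.0, 3.0
M = lambda t: math.exp(mu * t + sigma**2 * t**2 / 2)
h = 1e-4

first = (M(h) - M(-h)) / (2 * h)            # ≈ E(X) = μ
second = (M(h) - 2 * M(0) + M(-h)) / h**2   # ≈ E(X²) = σ² + μ²
print(first, second - first**2)             # ≈ μ and σ²
```

Analytically, [math]M'\left(t\right)=\left(\mu+\sigma^{2}t\right)M\left(t\right)[/math], so [math]M'\left(0\right)=\mu[/math], and [math]M''\left(0\right)=\sigma^{2}+\mu^{2}[/math], matching the numerical values.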