
Lecture 2

Correspondence Theorem

Let [math]P_{X}\left(\cdot\right)[/math] and [math]P_{Y}\left(\cdot\right)[/math] be probability functions, defined on [math]\mathcal{B}\left(\mathbf{R}\right)[/math] and let [math]F_{X}\left(\cdot\right)[/math] and [math]F_{Y}\left(\cdot\right)[/math] be associated cdfs. Then,

[math]P_{X}\left(\cdot\right)=P_{Y}\left(\cdot\right)[/math] iff [math]F_{X}\left(\cdot\right)=F_{Y}\left(\cdot\right)[/math].

The correspondence theorem assures us that we can restrict ourselves to cdfs: relying on them is no less general than working with probability functions directly.

CDFs

Function [math]F:\mathbf{R}\rightarrow\left[0,1\right][/math] is a cdf if it satisfies the following conditions:

  • [math]\lim_{x\rightarrow-\infty}F\left(x\right)=0[/math]
  • [math]\lim_{x\rightarrow+\infty}F\left(x\right)=1[/math]
  • [math]F\left(\cdot\right)[/math] is non-decreasing
  • [math]F\left(\cdot\right)[/math] is right-continuous (this can be shown by using probability functions of intervals)
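These conditions can be spot-checked numerically for a candidate function. Below is a minimal Python sketch (ours, not part of the lecture; it uses numpy and writes the standard normal cdf via the error function). It checks the two limits and monotonicity on a finite grid; right-continuity cannot be verified on a grid, so it is omitted.

import math
import numpy as np

def F(x):
    # Standard normal cdf, written via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

grid = np.linspace(-40, 40, 100001)
vals = np.array([F(x) for x in grid])
print(abs(vals[0]) < 1e-12)          # F(x) -> 0 as x -> -infinity
print(abs(vals[-1] - 1.0) < 1e-12)   # F(x) -> 1 as x -> +infinity
print(np.all(np.diff(vals) >= 0))    # F is non-decreasing on the grid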

Nature of RVs

We now define the nature of a random variable:

Random variable [math]X[/math] is discrete if [math]\exists f_{X}:\mathbf{R}\rightarrow\left[0,1\right][/math] s.t. [math]F_{X}\left(x\right)=\sum_{t\leq x}f_{X}\left(t\right),x\in\mathbf{R}[/math]

Function [math]f_{X}[/math] is called the probability mass function (pmf).

Random variable [math]X[/math] is continuous if [math]\exists f_{X}:\mathbf{R}\rightarrow\mathbf{R}_{+}[/math] s.t. [math]F_{X}\left(x\right)=\int_{-\infty}^{x}f_{X}\left(t\right)dt,x\in\mathbf{R}[/math]

Any such [math]f_{X}[/math] is called a probability density function (pdf). Notice that, unlike pmfs, multiple pdfs are consistent with a given cdf: two pdfs that differ only on a set of (Lebesgue) measure zero induce the same cdf.

Another interesting remark is that a continuous random variable takes any specific value with probability zero, i.e., [math]P_{X}\left(\left\{ x\right\} \right)=0,\forall x\in\mathbf{R}[/math].

Examples

Coin tossing

[math]F_{X}\left(x\right)=\begin{cases} 0, & x\lt 0\\ \frac{1}{2}, & 0\leq x\lt 1\\ 1, & x\geq1 \end{cases}[/math] In this case, [math]X[/math] is discrete and [math]F_{X}[/math] is a step function (this always occurs for discrete r.v.s). The probability mass function is equal to [math]f_{X}\left(x\right)=\begin{cases} \frac{1}{2}, & x\in\left\{ 0,1\right\} \\ 0, & otherwise \end{cases}[/math]
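To connect the two objects concretely, here is a minimal Python sketch (ours) that rebuilds this step-function cdf from the pmf by summing over the support:

def pmf(t):
    # Coin-toss pmf: mass 1/2 at 0 and at 1
    return 0.5 if t in (0, 1) else 0.0

def cdf(x):
    # F_X(x) = sum of the pmf over support points t <= x
    return sum(pmf(t) for t in (0, 1) if t <= x)

print(cdf(-0.5), cdf(0), cdf(0.7), cdf(1), cdf(3))  # 0.0 0.5 0.5 1.0 1.0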

Uniform distribution on (0,1)

[math]F_{X}\left(x\right)=\begin{cases} 0, & x\lt 0\\ x, & 0\leq x\lt 1\\ 1, & x\geq1 \end{cases}[/math] where [math]X[/math] is continuous.

Moreover, both [math]f_{X}\left(x\right)=\begin{cases} 1, & x\in\left[0,1\right]\\ 0, & otherwise \end{cases}[/math] and [math]f_{X}\left(x\right)=\begin{cases} 1, & x\in\left(0,1\right)\\ 0, & otherwise \end{cases}[/math] are consistent pdfs.

Normal distribution

A r.v. [math]X[/math] has a standard normal distribution, [math]X\sim N\left(0,1\right)[/math], if it is continuous with pdf [math]f_{X}\left(x\right)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}},x\in\mathbf{R}[/math]

PMFs and PDFs

Notice that pmfs and, in a sense, pdfs ‘add up’ to one. In fact, this property characterizes them: the result holds in both directions. For the pmf,

[math]f:\mathbf{R}\rightarrow\left[0,1\right][/math] is the pmf of a discrete r.v. iff [math]\sum_{x\in\mathbf{R}}f\left(x\right)=1[/math]

And for the pdf,

[math]f:\mathbf{R}\rightarrow\mathbf{R}_{+}[/math] is the pdf of a continuous r.v. iff [math]\int_{-\infty}^{\infty}f\left(x\right)dx=1[/math]
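Both normalization statements can be spot-checked numerically for the examples above. A minimal Python sketch (ours; it uses scipy.integrate.quad for the improper integral): the coin-toss pmf sums to one over its support, and the standard normal pdf integrates to one over the real line.

import numpy as np
from scipy.integrate import quad

# pmf check: the coin-toss pmf sums to one over its support {0, 1}
pmf = {0: 0.5, 1: 0.5}
print(sum(pmf.values()))  # 1.0

# pdf check: the standard normal density integrates to one over R
normal_pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
value, abserr = quad(normal_pdf, -np.inf, np.inf)
print(value)  # ~ 1.0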


It’s clear from the examples above that one can specify the distribution of a random variable by specifying its distribution function or its probability mass/density function. Sometimes, however, it is advantageous to specify the distribution of a random variable through a transformation. For example, suppose [math]Y=X^{2}[/math], where [math]X\sim N\left(0,1\right)[/math]. This takes us to discussing transformations of random variables.



Leibniz Rule

This rule can be useful in a series of domains. It states that

[math]\frac{d}{dx}\int_{a\left(x\right)}^{b\left(x\right)}f\left(x,t\right)dt=\int_{a\left(x\right)}^{b\left(x\right)}\frac{\partial}{\partial x}f\left(x,t\right)dt+f\left(x,b\left(x\right)\right)b'\left(x\right)-f\left(x,a\left(x\right)\right)a'\left(x\right)[/math].

This means that the derivative of an integral can be written as the integral of a derivative, plus terms involving the integrand evaluated at the limits of integration. The case where [math]b\left(x\right)[/math] and [math]a\left(x\right)[/math] are constant follows immediately.

For this rule to apply, we require [math]f\left(x,t\right)[/math] and its partial derivative w.r.t. [math]x[/math] to be continuous, and both limits of integration to be continuously differentiable. We also note that this rule can be derived from the chain rule of differentiation.
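The rule is also easy to verify numerically. The Python sketch below (our illustrative choices: [math]f\left(x,t\right)=xt^{2}[/math], [math]a\left(x\right)=x[/math], [math]b\left(x\right)=x^{2}[/math]; it uses scipy.integrate.quad) compares a finite-difference derivative of the integral with the right-hand side of the formula.

from scipy.integrate import quad

# Illustrative choices: f(x, t) = x * t**2, a(x) = x, b(x) = x**2
f = lambda x, t: x * t**2
df_dx = lambda x, t: t**2               # partial derivative of f w.r.t. x
a, da = (lambda x: x), (lambda x: 1.0)
b, db = (lambda x: x**2), (lambda x: 2 * x)

def I(x):
    # The integral whose derivative we want
    return quad(lambda t: f(x, t), a(x), b(x))[0]

x0, h = 1.5, 1e-5
lhs = (I(x0 + h) - I(x0 - h)) / (2 * h)  # finite-difference d/dx of the integral
rhs = (quad(lambda t: df_dx(x0, t), a(x0), b(x0))[0]
       + f(x0, b(x0)) * db(x0)
       - f(x0, a(x0)) * da(x0))
print(lhs, rhs)  # both should be ~22.078125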

Improper Integrals

We require an additional condition when the limits of integration are infinite. At the crux of this problem is whether [math]\lim_{h\rightarrow0}\int_{0}^{\infty}\frac{f\left(x+h,t\right)-f\left(x,t\right)}{h}dt=\int_{0}^{\infty}\lim_{h\rightarrow0}\frac{f\left(x+h,t\right)-f\left(x,t\right)}{h}dt[/math]. In order to understand why the Leibniz rule can fail in this case, consider the example [math]f\left(x,t\right)=\frac{\sin\left(tx\right)}{t}[/math], plotted below:

[Figure: plot of [math]f\left(x,t\right)=\frac{\sin\left(tx\right)}{t}[/math]]

First, notice that by calculating the expression of interest, [math]\frac{d}{dx}\int_{0}^{\infty}f\left(x,t\right)dt[/math], we learn how the area under [math]f\left(x,t\right)[/math] along the [math]t[/math] axis changes when [math]x[/math] is moved slightly. In order to see this, consider the following plot of [math]\frac{\sin\left(tx\right)}{t}[/math], shown for specific values of [math]t[/math] and [math]x[/math].

[Figure: plot of [math]\frac{\sin\left(tx\right)}{t}[/math] for fixed values of [math]x[/math] (blue curves) and fixed values of [math]t[/math] (orange curves)]

For now, we will focus on the blue lines, along which [math]x[/math] is fixed. We can think of [math]\frac{d}{dx}\int_{0}^{\infty}f\left(x,t\right)dt[/math] as first calculating the area under each of the blue curves, and then calculating how those areas change as a function of [math]x[/math]. If we do this, we obtain [math]\frac{d}{dx}\int_{0}^{\infty}f\left(x,t\right)dt=\frac{d}{dx}\left(\frac{\pi}{2}\text{sign}\left(x\right)\right)[/math], which equals zero for [math]x\neq0[/math] and is infinite at [math]x=0[/math]. This is the correct answer: as [math]x[/math] changes slightly, the area under [math]f\left(x,t\right)[/math] remains constant, except at [math]x=0[/math], where it changes at an infinite rate.

Now, consider the alternative calculation, [math]\int_{0}^{\infty}\frac{\partial}{\partial x}f\left(x,t\right)dt[/math]. In this case, we first calculate how much the function changes with small increments in [math]x[/math] for generic values of [math]t[/math]. For example, we could be calculating the vertical differences in the endpoints of the orange lines of the plot above. Then, we add up these differences along [math]t[/math], by applying the integral.

The integrand of this expression is given by [math]\frac{\partial}{\partial x}f\left(x,t\right)=\frac{\partial}{\partial x}\frac{\sin\left(tx\right)}{t}=\cos\left(tx\right)[/math]. We have learned that the slope of [math]f[/math] along the [math]x[/math]-axis is periodic. The function [math]\cos\left(tx\right)[/math] encodes this information about the slopes, which we represent below through small line segments along [math]x=5[/math]:

[Figure: the slopes [math]\cos\left(tx\right)[/math], drawn as small line segments along [math]x=5[/math]]

A property of the cosine (and of the other elementary trigonometric functions) is that, for a given [math]x\neq0[/math], the running integral [math]\int_{0}^{T}\cos\left(tx\right)dt[/math] is also periodic and does not settle down as [math]T[/math] grows. This is a problem: when we take an integral from zero to infinity, the area under [math]\cos\left(tx\right)[/math] does not converge.

Intuitively, the integral adds up the slopes in the [math]x[/math] direction, which keep rotating. If these slopes stabilized at some point (for example, if they all approached zero when [math]t[/math] was large), then the integral would converge. However, because the slopes keep ‘rotating’ as [math]t[/math] changes, the integral does not converge and the Leibniz rule fails.
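This failure is easy to see numerically. Since [math]\int_{0}^{T}\cos\left(tx\right)dt=\frac{\sin\left(Tx\right)}{x}[/math] in closed form, the partial integrals keep oscillating as [math]T[/math] grows. A minimal Python sketch (ours, with [math]x=5[/math]):

import numpy as np

x = 5.0
for T in (10.0, 100.0, 1000.0, 10000.0):
    partial = np.sin(T * x) / x   # closed form of the integral of cos(tx) on [0, T]
    print(T, partial)
# The partial integrals bounce around inside [-1/x, 1/x] and never settle,
# so the improper integral of cos(tx) does not converge.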



Transformations of random variables

Suppose [math]Y=g\left(X\right)[/math], where [math]g:\mathbf{R}\rightarrow\mathbf{R}[/math] is a function and [math]X[/math] is an r.v. with cdf [math]F_{X}[/math].

Clearly, [math]Y[/math] is also a random variable. Its induced probability function is equal to [math]P_{Y}\left(\cdot\right)=P_{X}\circ g^{-1}[/math]. When [math]X[/math] is discrete, it is usually simple to obtain the distribution of [math]Y[/math]. This becomes more complicated in the continuous case.
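In the discrete case, the pmf of [math]Y[/math] simply collects the mass of [math]X[/math] over the preimage [math]g^{-1}\left(\left\{ y\right\} \right)[/math]. A minimal Python sketch (our example: the coin toss from above, with the illustrative map [math]g\left(x\right)=1-2x[/math]):

from collections import defaultdict

f_X = {0: 0.5, 1: 0.5}          # coin-toss pmf from the earlier example
g = lambda x: 1 - 2 * x         # an illustrative transformation

f_Y = defaultdict(float)
for x, p in f_X.items():
    f_Y[g(x)] += p              # accumulate mass over the preimage of each y
print(dict(f_Y))                # {1: 0.5, -1: 0.5}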

We consider the case of strictly monotone transformations here. When a transformation is not strictly monotone, the same procedure applies in a piecewise fashion (i.e., one needs to apply it repeatedly to the different monotone sections of the transformation).

Affine Transformations: CDF

  • Suppose [math]Y=g\left(X\right)=aX+b,a\gt 0,b\in\mathbf{R}[/math].

In order to deduce [math]F_{Y}[/math], we use the probability functions of [math]X[/math] and [math]Y[/math]. Notice first that [math]F_{Y}\left(y\right)=P\left(Y\leq y\right)[/math]. This probability statement can be used to relate the cdf of [math]Y[/math] to the cdf of [math]X[/math].

Specifically, [math]P\left(Y\leq y\right)=P\left(aX+b\leq y\right)=P\left(X\leq\frac{y-b}{a}\right)=F_{X}\left(\frac{y-b}{a}\right)[/math].

This is a very useful result: we have related the cdf of the transformed r.v. [math]Y[/math] to the cdf of the original variable [math]X[/math]. We have learned that the distribution of [math]Y[/math] is given by the distribution of [math]X[/math], evaluated at a transformed value of the function's argument.

  • Now, suppose [math]Y=aX+b[/math] where [math]a\lt 0[/math]. In this case, we obtain

[math]F_{Y}\left(y\right)=P\left(Y\leq y\right)=P\left(aX+b\leq y\right)=P\left(X\geq\frac{y-b}{a}\right)=1-P\left(X\leq\frac{y-b}{a}\right)=1-F_{X}\left(\frac{y-b}{a}\right)[/math],

where the second-to-last equality uses the continuity of [math]X[/math], so that [math]P\left(X=\frac{y-b}{a}\right)=0[/math].
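Both cases can be checked by simulation. The Python sketch below (ours) draws [math]X\sim N\left(0,1\right)[/math], forms [math]Y=aX+b[/math] for [math]a\gt 0[/math] and [math]a\lt 0[/math], and compares the empirical cdf of [math]Y[/math] at a point with the formulas above:

import math
import numpy as np

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal cdf

rng = np.random.default_rng(0)
x = rng.standard_normal(200000)
y0 = 0.5                                  # arbitrary evaluation point

for a, b in ((2.0, 1.0), (-2.0, 1.0)):
    y = a * x + b
    empirical = np.mean(y <= y0)          # empirical cdf of Y at y0
    z = (y0 - b) / a
    theoretical = Phi(z) if a > 0 else 1 - Phi(z)
    print(a, empirical, theoretical)      # ~0.4013 in both cases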

Affine Transformations: PDF

  • Suppose [math]Y=aX+b[/math] with [math]a\gt 0[/math]. We know [math]F_{Y}\left(y\right)=\int_{-\infty}^{\frac{y-b}{a}}f_{X}\left(t\right)dt[/math], and that [math]f_{Y}\left(y\right)=\frac{d}{dy}F_{Y}\left(y\right)[/math]. So, by applying the Leibniz rule, we obtain

[math]f_{Y}\left(y\right)=\frac{d}{dy}F_{Y}\left(y\right)=f_{X}\left(\frac{y-b}{a}\right)\frac{d}{dy}\frac{y-b}{a}=f_{X}\left(\frac{y-b}{a}\right)\frac{1}{a}[/math].

  • If on the other hand [math]a\lt 0[/math], we would have [math]F_{Y}\left(y\right)=1-\int_{-\infty}^{\frac{y-b}{a}}f_{X}\left(t\right)dt[/math], and applying Leibniz rule yields [math]f_{Y}\left(y\right)=-f_{X}\left(\frac{y-b}{a}\right)\frac{1}{a}.[/math]

We can write down both of these cases simultaneously as

[math]f_{Y}\left(y\right)=f_{X}\left(\frac{y-b}{a}\right)\left|\frac{1}{a}\right|[/math], when [math]Y=aX+b[/math].
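This formula can also be checked by simulation: a histogram density of draws of [math]Y[/math] should match [math]f_{X}\left(\frac{y-b}{a}\right)\left|\frac{1}{a}\right|[/math]. A minimal Python sketch (ours, with [math]a=-2[/math], [math]b=1[/math]):

import numpy as np

a, b = -2.0, 1.0
rng = np.random.default_rng(1)
y = a * rng.standard_normal(500000) + b

f_X = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)  # standard normal pdf
f_Y = lambda t: f_X((t - b) / a) * abs(1 / a)           # change-of-variables formula

hist, edges = np.histogram(y, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - f_Y(centers))))  # small, up to sampling noise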

In general, as long as the transformation [math]Y=g\left(X\right)[/math] is strictly monotonic, we have [math]f_{Y}\left(y\right)=f_{X}\left(g^{-1}\left(y\right)\right)\left|\frac{d}{dy}g^{-1}\left(y\right)\right|[/math]. When it is not, one can apply the formula separately to each monotonic region. Notice that the role of [math]g^{-1}\left(y\right)[/math] is to ensure that the result is expressed as a function of the argument of interest, [math]y[/math], rather than [math]x[/math].
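As a quick worked example (ours, not from the lecture): let [math]X\sim N\left(0,1\right)[/math] and [math]Y=g\left(X\right)=e^{X}[/math], which is strictly increasing. Then [math]g^{-1}\left(y\right)=\log y[/math] and [math]\frac{d}{dy}g^{-1}\left(y\right)=\frac{1}{y}[/math], so

[math]f_{Y}\left(y\right)=f_{X}\left(\log y\right)\left|\frac{1}{y}\right|=\frac{1}{y\sqrt{2\pi}}e^{-\frac{\left(\log y\right)^{2}}{2}},y\gt 0[/math],

which is the standard lognormal density.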

There also exists a formula for transformations of multiple random variables. In this case, rather than a single derivative, one uses the absolute value of the determinant of the Jacobian matrix of the inverse transformation.
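For reference, in the [math]n[/math]-dimensional case with an invertible, continuously differentiable [math]g:\mathbf{R}^{n}\rightarrow\mathbf{R}^{n}[/math], the formula reads

[math]f_{Y}\left(y\right)=f_{X}\left(g^{-1}\left(y\right)\right)\left|\det J_{g^{-1}}\left(y\right)\right|[/math],

where [math]J_{g^{-1}}[/math] denotes the Jacobian matrix of the inverse transformation.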