Full Lecture 6


Multiple Random Variables (cont.)

Marginal CDF, PMF/PDF

If [math]\left(X,Y\right)'[/math] is a bivariate random vector, then the cdf of [math]X[/math] alone is called the marginal cdf of [math]X[/math], and likewise the cdf of [math]Y[/math] alone is called the marginal cdf of [math]Y[/math].

For example, the marginal cdf of [math]X[/math] can be obtained via:

[math]F_{X}\left(x\right)=\lim_{y\rightarrow\infty}F_{X,Y}\left(x,y\right),\,\forall x\in\mathbb{R}[/math].

Notice that knowledge of [math]F_{X,Y}\left(x,y\right)[/math] implies knowledge of the marginal distributions. The converse does not hold in general: the marginals pin down the joint distribution only under additional assumptions, for example when [math]X[/math] and [math]Y[/math] are independent, in which case [math]F_{X,Y}\left(x,y\right)=F_{X}\left(x\right)F_{Y}\left(y\right)[/math].

We can also obtain the marginal pmf/pdf in the following way.

  • If [math]\left(X,Y\right)'[/math] is discrete, then

[math]f_{X}\left(x\right)=\sum_{y\in\mathbb{R}}f_{X,Y}\left(x,y\right),\,x\in\mathbb{R}[/math].

  • If [math]\left(X,Y\right)'[/math] is continuous, then

[math]f_{X}\left(x\right)=\int_{-\infty}^{\infty}f_{X,Y}\left(x,y\right)dy,\,x\in\mathbb{R}[/math].
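For concreteness, here is a minimal numerical sketch of the discrete case, using an invented [math]3\times2[/math] joint pmf; the marginals are obtained by summing the table across rows or columns.

 import numpy as np

 # Hypothetical joint pmf of (X, Y): rows index values of X, columns values of Y.
 x_vals = np.array([0, 1, 2])
 y_vals = np.array([0, 1])
 f_xy = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])   # entries sum to 1

 # Marginal pmfs: sum the joint pmf over the other variable.
 f_x = f_xy.sum(axis=1)   # f_X(x) = sum_y f_{X,Y}(x, y)
 f_y = f_xy.sum(axis=0)   # f_Y(y) = sum_x f_{X,Y}(x, y)

 print("f_X:", f_x)   # -> 0.30, 0.45, 0.25
 print("f_Y:", f_y)   # -> 0.45, 0.55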

Independence

Two random variables [math]X[/math] and [math]Y[/math] are independent if [math]F_{X,Y}\left(x,y\right)=F_{X}\left(x\right)F_{Y}\left(y\right),\forall\left(x,y\right)'\in\mathbb{R}^{2}.[/math] Equivalently, two random variables [math]X[/math] and [math]Y[/math] are independent if [math]f_{X,Y}\left(x,y\right)=f_{X}\left(x\right)f_{Y}\left(y\right),\forall\left(x,y\right)'\in\mathbb{R}^{2}.[/math]
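A quick sketch of what the product condition looks like numerically for a discrete pair. The joint pmf below is constructed (by design) to factor into its marginals, so the check returns True; a joint table that does not factor would return False.

 import numpy as np

 # Hypothetical joint pmf that factors: X and Y are independent here by construction.
 f_x = np.array([0.2, 0.5, 0.3])
 f_y = np.array([0.4, 0.6])
 f_xy = np.outer(f_x, f_y)          # f_{X,Y}(x, y) = f_X(x) f_Y(y)

 # Independence holds iff the joint equals the product of its own marginals.
 marg_x = f_xy.sum(axis=1)
 marg_y = f_xy.sum(axis=0)
 independent = np.allclose(f_xy, np.outer(marg_x, marg_y))
 print(independent)                 # True for this constructed example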


Conditional PMF/PDF

The conditional pmf/pdf of [math]X[/math] given [math]Y=y[/math], [math]f_{\left.X\right|Y}\left(x,y\right)[/math], is given by

[math]f_{\left.X\right|Y}\left(x,y\right)=\frac{f_{X,Y}\left(x,y\right)}{f_{Y}\left(y\right)}[/math]

If [math]f_{Y}\left(y\right)\gt 0[/math], then all of the properties of pmfs/pdfs apply to the conditional pmf/pdf.

This conditional function is a pmf/pdf in its own right: it is nonnegative (and bounded above by one in the pmf case) and sums, or integrates, to one over [math]x[/math].

The interpretation is intuitive. If [math]\left(X,Y\right)'[/math] is discrete and [math]f_{Y}\left(y\right)\gt 0[/math], then [math]f_{\left.X\right|Y}\left(x,y\right)=\frac{f_{X,Y}\left(x,y\right)}{f_{Y}\left(y\right)}=\frac{P\left(X=x,Y=y\right)}{P\left(Y=y\right)}=P\left(\left.X=x\right|Y=y\right)[/math] i.e., [math]f_{\left.X\right|Y}\left(x,y\right)[/math] is the conditional probability of [math]X=x[/math] given that [math]Y=y[/math].
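In code, the discrete conditional pmf is just the relevant column of the joint pmf table, renormalized by the corresponding marginal probability. The joint pmf below is again an arbitrary example.

 import numpy as np

 # Hypothetical joint pmf; rows are x in {0, 1, 2}, columns are y in {0, 1}.
 f_xy = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
 f_y = f_xy.sum(axis=0)

 # Conditional pmf of X given Y = y: divide the y-column by the marginal f_Y(y).
 y_index = 1                        # condition on the event Y = 1
 f_x_given_y = f_xy[:, y_index] / f_y[y_index]

 print(f_x_given_y)                 # a pmf in its own right
 print(f_x_given_y.sum())           # sums to (numerically) one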

Because the value of a pdf does not correspond to a probability, the interpretation is trickier in the continuous case. If we would like to describe the density of [math]X[/math] when [math]Y=5[/math], for example, then [math]f_{\left.X\right|Y}\left(x,5\right)[/math] gives us the intended object. However, a problem may arise: how can we speak of the pdf [math]f_{\left.X\right|Y}\left(x,5\right)[/math] given that [math]P\left(Y=5\right)=0[/math] when [math]Y[/math] is continuous? A related issue is that one can obtain different conditional pdfs [math]f_{\left.X\right|Y}\left(x,y\right)[/math], depending on the method used in the calculation. We return to this point below, where we briefly mention the Borel paradox.

Conditioning on Sets

We can also condition on [math]Y[/math] lying in an interval. For example,

[math]f_{\left.X\right|Y}\left(x,y\in\left[\underline{y},\overline{y}\right]\right)=\frac{\int_{\underline{y}}^{\overline{y}}f_{X,Y}\left(x,y\right)dy}{P\left(Y\in\left[\underline{y},\overline{y}\right]\right)}=\frac{\int_{\underline{y}}^{\overline{y}}f_{X,Y}\left(x,y\right)dy}{\int_{-\infty}^{\infty}\int_{\underline{y}}^{\overline{y}}f_{X,Y}\left(x,y\right)dydx}[/math].
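The sketch below evaluates this formula by numerical integration for an assumed joint density, [math]f_{X,Y}\left(x,y\right)=x+y[/math] on the unit square (chosen only because it integrates to one and is easy to work with).

 import numpy as np

 # Hypothetical joint pdf on [0, 1]^2: f(x, y) = x + y (it integrates to 1).
 def f_joint(x, y):
     return x + y

 # Condition on the event Y in [0.25, 0.75], using midpoint-rule integration.
 nx, ny = 400, 400
 dx, dy = 1.0 / nx, 0.5 / ny
 x_mid = (np.arange(nx) + 0.5) * dx                 # midpoints of [0, 1]
 y_mid = 0.25 + (np.arange(ny) + 0.5) * dy          # midpoints of [0.25, 0.75]
 X, Y = np.meshgrid(x_mid, y_mid, indexing="ij")

 numerator = f_joint(X, Y).sum(axis=1) * dy         # int_{0.25}^{0.75} f(x, y) dy
 denominator = numerator.sum() * dx                 # P(Y in [0.25, 0.75])
 f_x_given_interval = numerator / denominator

 print(denominator)                                 # ~0.5 for this density
 print((f_x_given_interval * dx).sum())             # ~1: a proper conditional pdf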

Borel Paradox

A resolution to the problem of conditioning on a measure-zero event in the continuous case is to always condition on subsets of the range of [math]Y[/math] that have strictly positive probability, and then take the limit as those subsets shrink to the point [math]y[/math].

For example, let [math]\overline{y}_{n}^{1}\gt y\gt \underline{y}_{n}^{1}[/math] and [math]\overline{y}_{n}^{2}\gt y\gt \underline{y}_{n}^{2}[/math] be two different sequences of upper and lower bounds such that [math]\lim_{n\rightarrow\infty}\overline{y}_{n}^{1}=\lim_{n\rightarrow\infty}\underline{y}_{n}^{1}=\lim_{n\rightarrow\infty}\overline{y}_{n}^{2}=\lim_{n\rightarrow\infty}\underline{y}_{n}^{2}=y[/math], with [math]P\left(Y\in\left[\underline{y}_{n}^{1},\overline{y}_{n}^{1}\right]\right)\gt 0[/math] and [math]P\left(Y\in\left[\underline{y}_{n}^{2},\overline{y}_{n}^{2}\right]\right)\gt 0[/math] for all [math]n\in\mathbb{N}[/math].

Then, one can calculate [math]\lim_{n\rightarrow\infty}f_{\left.X\right|Y}\left(x,y\in\left[\underline{y}_{n}^{1},\overline{y}_{n}^{1}\right]\right)[/math] and [math]\lim_{n\rightarrow\infty}f_{\left.X\right|Y}\left(x,y\in\left[\underline{y}_{n}^{2},\overline{y}_{n}^{2}\right]\right)[/math]. Each limit is well defined, but the two sequences of sets may deliver different answers; this is the essence of the Borel paradox.

For more information on the Borel paradox, see Proschan and Presnell (1998), this Wikipedia page, and this blog post for an intuitive explanation.


Some Conditional Moments

  • The conditional mean of [math]X[/math] given [math]Y[/math] is [math]E_{\left.X\right|Y}\left(\left.X\right|Y\right)[/math].

Implicitly, we mean that [math]Y=y[/math], but here we omit the specific value [math]y[/math].

  • The conditional variance of [math]X[/math] given [math]Y[/math] is

[math]Var_{\left.X\right|Y}\left(\left.X\right|Y\right)=E\left[\left.\left(X-E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right)^{2}\right|Y\right]=E_{\left.X\right|Y}\left(\left.X^{2}\right|Y\right)-\left[E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right]^{2}[/math].

  • The expected value of [math]X[/math] given [math]Y[/math] is defined as

[math]E_{\left.X\right|Y}\left(\left.X\right|Y=y\right)=\begin{cases} \sum_{s\in\mathbb{R}}s\cdot f_{X|Y}\left(s,y\right), & \text{if }\left(X,Y\right)'\text{ is discrete}\\ \int_{-\infty}^{\infty}s\cdot f_{X|Y}\left(s,y\right)ds, & \text{if }\left(X,Y\right)'\text{ is continuous} \end{cases}[/math]
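A short sketch for the discrete branch of this definition, computing [math]E\left(\left.X\right|Y=y\right)[/math] for each value of [math]y[/math] from an invented joint pmf:

 import numpy as np

 # Hypothetical joint pmf; rows are x in {0, 1, 2}, columns are y in {0, 1}.
 x_vals = np.array([0.0, 1.0, 2.0])
 f_xy = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
 f_y = f_xy.sum(axis=0)

 # E(X | Y = y) = sum_x x * f_{X|Y}(x, y) for each value of y.
 cond_mean = (x_vals[:, None] * f_xy / f_y).sum(axis=0)
 print(cond_mean)        # conditional mean of X at y = 0 and at y = 1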


Law of Iterated Expectations

If [math]\left(X,Y\right)'[/math] is a random vector, then [math]E_{X}\left(X\right)=E_{Y}\left[E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right][/math], provided the relevant expectations exist. The intuition is that, in order to calculate the expectation of [math]X[/math], we can first calculate the expectation of [math]X[/math] at each value of [math]Y[/math], and then average those conditional expectations across the distribution of [math]Y[/math].
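The identity is easy to verify numerically in the discrete case. The sketch below computes [math]E\left(X\right)[/math] directly from the marginal of [math]X[/math] and again by averaging the conditional means over the distribution of [math]Y[/math], using the same style of invented joint pmf as above.

 import numpy as np

 # Hypothetical joint pmf; rows are x in {0, 1, 2}, columns are y in {0, 1}.
 x_vals = np.array([0.0, 1.0, 2.0])
 f_xy = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
 f_x = f_xy.sum(axis=1)
 f_y = f_xy.sum(axis=0)

 # Left-hand side: E(X) computed directly from the marginal of X.
 e_x_direct = (x_vals * f_x).sum()

 # Right-hand side: average the conditional means E(X|Y=y) over f_Y.
 cond_mean = (x_vals[:, None] * f_xy / f_y).sum(axis=0)
 e_x_iterated = (cond_mean * f_y).sum()

 print(e_x_direct, e_x_iterated)    # both equal E(X) = 0.95 here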

A Remark on Notation

  • Sometimes you will see the notation [math]E\left(\left.X\right|Y\right)[/math] instead of [math]E_{\left.X\right|Y}\left(\left.X\right|Y\right)[/math]. The meaning is the same, but the former is more economical. You may also see [math]E_{x}\left(X\right)[/math], where the subscript denotes the variable being integrated. However, in some cases, you will see an expression of the sort [math]E_{x}\left(g\left(X,Y\right)\right)[/math], where the subscript means that [math]X[/math] is fixed! So, make sure you are aware of the specific notation being used.

Proof of the Law of Iterated Expectations

Here we prove the law of iterated expectations for the continuous case:

[math]\begin{aligned} E\left(X\right) & =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}s\cdot f_{X,Y}\left(s,t\right)dtds\\ & =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}s\cdot\underset{=f_{X,Y}\left(s,t\right)}{\underbrace{f_{X|Y}\left(s,t\right)f_{Y}\left(t\right)}}dtds\\ & =\int_{-\infty}^{\infty}\underset{=E_{\left.X\right|Y}\left(\left.X\right|Y=t\right)}{\underbrace{\int_{-\infty}^{\infty}s\cdot f_{X|Y}\left(s,t\right)ds}}\,f_{Y}\left(t\right)dt\\ & =E_{Y}\left[E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right]\end{aligned}[/math]


Conditional Variance Identity

The conditional variance identity (CVI) is a useful identity (especially later, when we discuss linear regression). It is the decomposition

[math]Var_{X}\left(X\right)=E_{Y}\left[Var_{X|Y}\left(X|Y\right)\right]+Var_{Y}\left[E_{X|Y}\left(X|Y\right)\right][/math].

The interpretation of this equality will become clear when we discuss linear regression.
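Even before that discussion, the identity can be checked numerically. The sketch below computes both sides for an invented discrete joint pmf.

 import numpy as np

 # Hypothetical joint pmf; rows are x in {0, 1, 2}, columns are y in {0, 1}.
 x_vals = np.array([0.0, 1.0, 2.0])
 f_xy = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
 f_x = f_xy.sum(axis=1)
 f_y = f_xy.sum(axis=0)

 # Unconditional mean and variance of X.
 e_x = (x_vals * f_x).sum()
 var_x = ((x_vals - e_x) ** 2 * f_x).sum()

 # Conditional mean and variance of X at each value of y.
 f_x_given_y = f_xy / f_y                                   # each column is a pmf
 cond_mean = (x_vals[:, None] * f_x_given_y).sum(axis=0)
 cond_var = ((x_vals[:, None] - cond_mean) ** 2 * f_x_given_y).sum(axis=0)

 # Conditional variance identity: Var(X) = E[Var(X|Y)] + Var[E(X|Y)].
 rhs = (cond_var * f_y).sum() + ((cond_mean - e_x) ** 2 * f_y).sum()
 print(var_x, rhs)                                          # the two sides agree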


Covariance

The covariance of [math]X[/math] and [math]Y[/math] is [math]Cov\left(X,Y\right)=\sigma_{XY}=E\left[\left(X-\mu_{X}\right)\left(Y-\mu_{Y}\right)\right][/math], where [math]\mu_{X}=E\left(X\right)[/math] and [math]\mu_{Y}=E\left(Y\right)[/math].

Some properties:

  • [math]Cov\left(X,Y\right)=E\left(XY\right)-\mu_{X}\mu_{Y}=E\left(XY\right)-E\left(X\right)E\left(Y\right)[/math].
  • [math]Cov\left(X,X\right)=Var\left(X\right)[/math].
  • [math]Cov\left(X,Y\right)=0[/math] if [math]X[/math] and [math]Y[/math] are independent.
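The first two properties are easy to verify numerically; the sketch below does so for an invented discrete joint pmf.

 import numpy as np

 # Hypothetical joint pmf; rows are x in {0, 1, 2}, columns are y in {0, 1}.
 x_vals = np.array([0.0, 1.0, 2.0])
 y_vals = np.array([0.0, 1.0])
 f_xy = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
 f_x = f_xy.sum(axis=1)
 f_y = f_xy.sum(axis=0)

 e_x = (x_vals * f_x).sum()
 e_y = (y_vals * f_y).sum()
 e_xy = (np.outer(x_vals, y_vals) * f_xy).sum()

 # Covariance from the definition and from the shortcut formula.
 cov_def = ((x_vals[:, None] - e_x) * (y_vals[None, :] - e_y) * f_xy).sum()
 cov_short = e_xy - e_x * e_y
 print(cov_def, cov_short)                       # equal: Cov = E(XY) - E(X)E(Y)

 # Cov(X, X) reduces to Var(X).
 print((x_vals ** 2 * f_x).sum() - e_x ** 2)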

Correlation

The correlation of [math]X[/math] and [math]Y[/math] is [math]Corr\left(X,Y\right)=\rho_{XY}=\frac{Cov\left(X,Y\right)}{\sigma_{X}\sigma_{Y}}=\frac{\sigma_{XY}}{\sigma_{X}\sigma_{Y}}[/math] i.e., the correlation is equal to the covariance, standardized by the product of the standard deviations of the variables.

Some properties:

  • If [math]X[/math] and [math]Y[/math] are independent, then [math]\rho_{XY}=0[/math] .
  • [math]\left|\rho_{XY}\right|\leq1[/math], by the Cauchy-Schwarz inequality (explained next)
  • [math]\left|\rho_{XY}\right|=1[/math] if and only if [math]P\left(Y=aX+b\right)=1[/math] for some [math]a\neq0[/math] and [math]b\in\mathbb{R}[/math].
  • [math]\rho_{XY}[/math] is a measure of linear dependence only; it can be zero even when [math]X[/math] and [math]Y[/math] are strongly (but nonlinearly) dependent (see the sketch below).
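To see the last point concretely, the sketch below simulates [math]Y=X^{2}[/math] with [math]X[/math] symmetric around zero: [math]Y[/math] is a deterministic function of [math]X[/math], yet the correlation is (approximately) zero, while an exactly linear relationship gives correlation one. The simulated distributions are arbitrary choices for illustration.

 import numpy as np

 rng = np.random.default_rng(0)

 # X symmetric around zero; Y = X^2 is a deterministic (nonlinear) function of X.
 x = rng.standard_normal(1_000_000)
 y = x ** 2

 # Sample correlation: close to zero despite perfect (nonlinear) dependence.
 print(np.corrcoef(x, y)[0, 1])

 # By contrast, a linear relationship gives |correlation| = 1.
 print(np.corrcoef(x, 3.0 * x - 2.0)[0, 1])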


Cauchy-Schwarz Inequality

If [math]\left(X,Y\right)'[/math] is a bivariate random vector, then

[math]\left|E\left(XY\right)\right|\leq E\left(\left|XY\right|\right)\leq\sqrt{E\left(X^{2}\right)}\sqrt{E\left(Y^{2}\right)}[/math].

The inequality bounds the cross moment [math]E\left(XY\right)[/math] in terms of the second moments of [math]X[/math] and [math]Y[/math] taken separately.

Generalization: Hölder's Inequality

If [math]\left(X,Y\right)'[/math] is a bivariate random vector, then

[math]\left|E\left(XY\right)\right|\leq E\left(\left|XY\right|\right)\leq\sqrt[p]{E\left(\left|X\right|^{p}\right)}\,\sqrt[q]{E\left(\left|Y\right|^{q}\right)},\,\text{for }p,q\gt 0\text{ with }p^{-1}+q^{-1}=1.[/math]
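As a sanity check, the sketch below estimates both sides of the inequality by sample means on simulated data; setting [math]p=q=2[/math] recovers the Cauchy-Schwarz case. The choice of distributions is arbitrary.

 import numpy as np

 rng = np.random.default_rng(0)
 x = rng.standard_normal(200_000)
 y = rng.exponential(scale=2.0, size=200_000)   # arbitrary second variable

 def holder_sides(x, y, p, q):
     """Return (E|XY|, the Hölder bound), both estimated by sample means."""
     lhs = np.mean(np.abs(x * y))
     rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
     return lhs, rhs

 print(holder_sides(x, y, 2, 2))       # Cauchy-Schwarz case: lhs <= rhs
 print(holder_sides(x, y, 3, 1.5))     # a Hölder pair with 1/3 + 2/3 = 1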

Jensen’s Inequality

If [math]X[/math] is an r.v. and [math]g:\mathbb{R}\rightarrow\mathbb{R}[/math] is a convex function, then [math]E\left[g\left(X\right)\right]\geq g\left[E\left(X\right)\right][/math].

This also implies that if [math]g\left(\cdot\right)[/math] is concave, then [math]E\left[g\left(X\right)\right]\leq g\left[E\left(X\right)\right][/math].

Finally, if [math]g\left(\cdot\right)[/math] is linear, then [math]E\left[g\left(X\right)\right]=g\left[E\left(X\right)\right][/math], since a linear function is both convex and concave.

For an example, consider an r.v. [math]X[/math] that can equal [math]0[/math] or [math]8[/math] with equal probability.

In this case, [math]E\left(X\right)^{2}=\left(0.5\times0+0.5\times8\right)^{2}=16[/math] and [math]E\left(X^{2}\right)=\left(0.5\times0^{2}+0.5\times8^{2}\right)=32[/math], so [math]E\left[g\left(X\right)\right]\geq g\left[E\left(X\right)\right][/math] for the convex function [math]g\left(x\right)=x^{2}[/math], as Jensen's inequality requires.
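The same numbers can be reproduced with a few lines of code:

 import numpy as np

 # X equals 0 or 8 with probability 1/2 each; g(x) = x^2 is convex.
 x_vals = np.array([0.0, 8.0])
 probs = np.array([0.5, 0.5])

 g_of_mean = ((x_vals * probs).sum()) ** 2      # g(E(X)) = 16
 mean_of_g = (x_vals ** 2 * probs).sum()        # E(g(X)) = 32
 print(g_of_mean, mean_of_g)                    # 16.0 32.0, so E[g(X)] >= g[E(X)]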

[Figure: graphical depiction of this example.]