# Multiple Random Variables (cont.)

## Marginal CDF, PMF/PDF

If $\left(X,Y\right)'$ is a bivariate random vector, then the cdf of $X$ (and of $Y$) is called the marginal cdf of $X$ $\left(Y\right)$.

For example, the marginal cdf of $X$ can be obtained via:

$F_{X}\left(x\right)=\lim_{y\rightarrow\infty}F_{X,Y}\left(x,y\right),\,\forall x\in\mathbb{R}$.

Notice that knowledge of $F_{X,Y}\left(x,y\right)$ implies knowledge of the marginal distributions. The converse does not hold in general: the marginals alone do not pin down the joint distribution, unless we add an assumption such as independence, in which case the joint is the product of the marginals.

We can also obtain the marginal pmf/pdf in the following way.

• If $\left(X,Y\right)'$ is discrete, then

$f_{X}\left(x\right)=\sum_{y\in\mathbb{R}}f_{X,Y}\left(x,y\right),\,x\in\mathbb{R}$.

• If $\left(X,Y\right)'$ is continuous, then

$f_{X}\left(x\right)=\int_{-\infty}^{\infty}f_{X,Y}\left(x,y\right)dy,\,x\in\mathbb{R}$.
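As a quick numerical sketch (the joint pmf table below is hypothetical, chosen only for illustration), the marginal pmfs in the discrete case are just the row and column sums of the joint pmf:

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index the x-values, columns the y-values.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
assert np.isclose(joint.sum(), 1.0)  # a valid pmf sums to one

# Marginal pmf of X: sum the joint pmf over all y (collapse the columns).
f_X = joint.sum(axis=1)   # [0.40, 0.60]
# Marginal pmf of Y: sum the joint pmf over all x (collapse the rows).
f_Y = joint.sum(axis=0)   # [0.25, 0.45, 0.30]
```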

## Independence

Two random variables $X$ and $Y$ are independent if $F_{X,Y}\left(x,y\right)=F_{X}\left(x\right)F_{Y}\left(y\right),\forall\left(x,y\right)'\in\mathbb{R}^{2}.$ Equivalently, two random variables $X$ and $Y$ are independent if $f_{X,Y}\left(x,y\right)=f_{X}\left(x\right)f_{Y}\left(y\right),\forall\left(x,y\right)'\in\mathbb{R}^{2}.$
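A numerical sketch of the factorization criterion (the pmf tables are hypothetical): a joint pmf built as the outer product of its marginals is independent by construction, while a table with the same marginals but a different interior is not:

```python
import numpy as np

# Independent by construction: joint = outer product of the marginals.
f_X = np.array([0.4, 0.6])
f_Y = np.array([0.25, 0.45, 0.30])
joint_indep = np.outer(f_X, f_Y)
assert np.allclose(joint_indep,
                   np.outer(joint_indep.sum(axis=1), joint_indep.sum(axis=0)))

# Same marginals, different interior: the factorization fails, so X and Y
# are dependent even though the marginal distributions are unchanged.
joint_dep = np.array([[0.20, 0.10, 0.10],
                      [0.05, 0.35, 0.20]])
assert np.allclose(joint_dep.sum(axis=1), f_X)
assert np.allclose(joint_dep.sum(axis=0), f_Y)
factorizes = np.allclose(joint_dep,
                         np.outer(joint_dep.sum(axis=1), joint_dep.sum(axis=0)))
assert not factorizes
```

This also illustrates the earlier remark that the marginals alone do not determine the joint distribution.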

# Conditional PMF/PDF

The conditional pmf/pdf of $X$ given $Y=y$, $f_{\left.X\right|Y}\left(x,y\right)$, is given by

$f_{\left.X\right|Y}\left(x,y\right)=\frac{f_{X,Y}\left(x,y\right)}{f_{Y}\left(y\right)}$

If $f_{Y}\left(y\right)\gt 0$, then all of the properties of pmfs/pdfs apply to the conditional pmf/pdf.

This conditional function is a pmf/pdf in its own right: it sums (or integrates) to one and is nonnegative (and, in the pmf case, bounded above by one).

The interpretation is intuitive. If $\left(X,Y\right)'$ is discrete and $f_{Y}\left(y\right)\gt 0$, then $f_{\left.X\right|Y}\left(x,y\right)=\frac{f_{X,Y}\left(x,y\right)}{f_{Y}\left(y\right)}=\frac{P\left(X=x,Y=y\right)}{P\left(Y=y\right)}=P\left(\left.X=x\right|Y=y\right)$ i.e., $f_{\left.X\right|Y}\left(x,y\right)$ is the conditional probability of $X=x$ given that $Y=y$.
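In the discrete case, the definition amounts to renormalizing each column of the joint pmf table by the corresponding marginal probability of $Y$ (using a hypothetical table for illustration):

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],    # rows: x in {0, 1}
                  [0.15, 0.25, 0.20]])   # columns: y in {0, 1, 2}
f_Y = joint.sum(axis=0)

# f_{X|Y}(x, y) = f_{X,Y}(x, y) / f_Y(y): divide each column by its marginal.
cond = joint / f_Y

# Each column is a pmf in its own right: nonnegative, summing to one.
assert np.all(cond >= 0)
assert np.allclose(cond.sum(axis=0), 1.0)
# e.g. f_{X|Y}(1, 0) = 0.15 / 0.25 = 0.6 = P(X = 1 | Y = 0)
assert np.isclose(cond[1, 0], 0.6)
```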

Because the value of a pdf does not correspond to a probability, the interpretation is trickier in the continuous case. If we would like to describe the density of $X$ when $Y=5$, for example, then $f_{\left.X\right|Y}\left(x,5\right)$ gives us the intended object. However, a problem may arise: how can we ask about the pdf $f_{\left.X\right|Y}\left(x,5\right)$ given that $P\left(Y=5\right)=0$ when $Y$ is continuous? A related issue is that one can obtain different conditional pdfs $f_{\left.X\right|Y}\left(x,y\right)$ depending on the method used in the calculation. In the next section we briefly mention the Borel paradox.

# Conditioning on Sets

We can also condition on an interval over $y$. For example,

$f_{\left.X\right|Y}\left(x,Y\in\left[\underline{y},\overline{y}\right]\right)=\frac{\int_{\underline{y}}^{\overline{y}}f_{X,Y}\left(x,y\right)dy}{P\left(Y\in\left[\underline{y},\overline{y}\right]\right)}=\frac{\int_{\underline{y}}^{\overline{y}}f_{X,Y}\left(x,y\right)dy}{\int_{-\infty}^{\infty}\int_{\underline{y}}^{\overline{y}}f_{X,Y}\left(x,y\right)dydx}$.
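As a numerical sketch, take the hypothetical joint pdf $f_{X,Y}\left(x,y\right)=x+y$ on the unit square (it integrates to one there), for which the conditional pdf given $Y\in\left[0.4,0.6\right]$ works out to $x+\tfrac{1}{2}$; a midpoint-rule approximation of the ratio of integrals recovers it:

```python
import numpy as np

# Hypothetical joint pdf f(x, y) = x + y on the unit square.
def f_joint(x, y):
    return x + y

a, b = 0.4, 0.6                     # condition on the event Y in [a, b]
n = 1000
dy, dx = (b - a) / n, 1.0 / n
ys = a + (np.arange(n) + 0.5) * dy  # midpoint grid over [a, b]
xs = (np.arange(n) + 0.5) * dx      # midpoint grid over [0, 1]

# Numerator: integral of the joint pdf over y in [a, b], for each x.
num = f_joint(xs[:, None], ys[None, :]).sum(axis=1) * dy
# Denominator: P(Y in [a, b]), obtained by integrating the numerator over x.
den = num.sum() * dx

f_cond = num / den                  # conditional pdf of X given Y in [a, b]
# For this density the exact conditional pdf is x + 1/2.
assert np.allclose(f_cond, xs + 0.5)
```

The midpoint rule is exact here because the integrand is linear; for a general density the approximation improves as the grid is refined.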

A resolution for conditioning on a measure-zero event in the continuous case is to always condition on subsets of the range of $Y$ that have strictly positive probability, and then take the limit as those sets shrink to $\left\{ y\right\}$.

For example, let $\overline{y}_{n}^{1}\gt y\gt \underline{y}_{n}^{1}$ and $\overline{y}_{n}^{2}\gt y\gt \underline{y}_{n}^{2}$ be two different sequences of upper and lower bounds such that $\lim_{n\rightarrow\infty}\overline{y}_{n}^{1}=\lim_{n\rightarrow\infty}\underline{y}_{n}^{1}=\lim_{n\rightarrow\infty}\overline{y}_{n}^{2}=\lim_{n\rightarrow\infty}\underline{y}_{n}^{2}=y$, with $P\left(Y\in\left[\underline{y}_{n}^{1},\overline{y}_{n}^{1}\right]\right)\gt 0$ and $P\left(Y\in\left[\underline{y}_{n}^{2},\overline{y}_{n}^{2}\right]\right)\gt 0$ for all $n\in\mathbb{N}$.

Then, one can calculate $\lim_{n\rightarrow\infty}f_{\left.X\right|Y}\left(x,Y\in\left[\underline{y}_{n}^{1},\overline{y}_{n}^{1}\right]\right)$ and $\lim_{n\rightarrow\infty}f_{\left.X\right|Y}\left(x,Y\in\left[\underline{y}_{n}^{2},\overline{y}_{n}^{2}\right]\right)$. Each limit is well defined, but the two limits need not coincide; this dependence on the approximating sequence is the Borel paradox.

# Some Conditional Moments

• The conditional mean of $X$ given $Y$ is $E_{\left.X\right|Y}\left(\left.X\right|Y\right)$.

Implicitly, we mean that $Y=y$, but here we omit the specific value $y$.

• The conditional variance of $X$ given $Y$ is

$Var_{\left.X\right|Y}\left(\left.X\right|Y\right)=E_{\left.X\right|Y}\left[\left.\left(X-E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right)^{2}\right|Y\right]=E_{\left.X\right|Y}\left(\left.X^{2}\right|Y\right)-E_{\left.X\right|Y}\left(\left.X\right|Y\right)^{2}$.

• The expected value of $X$ given $Y$ is defined as

$E_{\left.X\right|Y}\left(\left.X\right|Y=y\right)=\begin{cases} \sum_{s\in\mathbb{R}}s\,f_{X|Y}\left(s,y\right), & \text{if }\left(X,Y\right)'\text{ is discrete}\\ \int_{-\infty}^{\infty}s\,f_{X|Y}\left(s,y\right)ds, & \text{if }\left(X,Y\right)'\text{ is continuous} \end{cases}$
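In the discrete case, the conditional mean is the average of the $x$-values under each column of the conditional pmf table (hypothetical numbers again):

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
x_vals = np.array([0.0, 1.0])            # the values X can take
f_Y = joint.sum(axis=0)

cond = joint / f_Y                       # f_{X|Y}(x, y), one column per y
E_X_given_Y = x_vals @ cond              # E(X | Y = y), one entry per y-value
# e.g. E(X | Y = 0) = 0 * 0.4 + 1 * 0.6 = 0.6
assert np.isclose(E_X_given_Y[0], 0.6)
```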

# Law of Iterated Expectations

If $\left(X,Y\right)'$ is a random vector, then $E_{X}\left(X\right)=E_{Y}\left[E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right]$, provided the relevant expectations exist. The intuition is that, in order to calculate the expectation of $X$, we can first calculate the expectation of $X$ at each value of $Y$, and then average those across the distribution of $Y$.

## A Remark on Notation

• Sometimes you will see the notation $E\left(\left.X\right|Y\right)$ instead of $E_{\left.X\right|Y}\left(\left.X\right|Y\right)$. The meaning is the same, but the former is more economical. You may also see $E_{x}\left(X\right)$, where the subscript denotes the variable being integrated. However, in some cases you will see an expression of the sort $E_{x}\left(g\left(X,Y\right)\right)$, where the subscript means that $X$ is fixed! So, make sure you are aware of the specific notation being used.

## Proof of the Law of Iterated Expectations

Here we prove the law of iterated expectations for the continuous case:

\begin{aligned} E\left(X\right) & =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}s\,f_{X,Y}\left(s,t\right)dtds\\ & =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}s\,\underset{=f_{X,Y}\left(s,t\right)}{\underbrace{f_{X|Y}\left(s,t\right)f_{Y}\left(t\right)}}dtds\\ & =\int_{-\infty}^{\infty}\underset{E_{\left.X\right|Y}\left(\left.X\right|Y=t\right)}{\underbrace{\int_{-\infty}^{\infty}s\,f_{X|Y}\left(s,t\right)ds}}f_{Y}\left(t\right)dt\\ & =E_{Y}\left[E_{\left.X\right|Y}\left(\left.X\right|Y\right)\right]\end{aligned}
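The proof can be mirrored numerically on a hypothetical discrete table: averaging the conditional means over the distribution of $Y$ reproduces $E\left(X\right)$:

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
x_vals = np.array([0.0, 1.0])
f_X, f_Y = joint.sum(axis=1), joint.sum(axis=0)

E_X = x_vals @ f_X                       # direct expectation of X
E_X_given_Y = x_vals @ (joint / f_Y)     # inner expectation, one entry per y
lie = E_X_given_Y @ f_Y                  # outer expectation over Y
assert np.isclose(E_X, lie)              # law of iterated expectations
```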

# Conditional Variance Identity

The CVI is a useful identity (especially for later, when we discuss linear regression). It is the decomposition

$Var_{X}\left(X\right)=E_{Y}\left[Var_{X|Y}\left(X|Y\right)\right]+Var_{Y}\left[E_{X|Y}\left(X|Y\right)\right]$.

The interpretation for this equality will be clear when we discuss linear regression.
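The identity can be verified numerically on a hypothetical discrete table, splitting $Var\left(X\right)$ into a within-group piece (expected conditional variance) and a between-group piece (variance of the conditional means):

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
x_vals = np.array([0.0, 1.0])
f_X, f_Y = joint.sum(axis=1), joint.sum(axis=0)

var_X = x_vals**2 @ f_X - (x_vals @ f_X) ** 2

cond = joint / f_Y
m = x_vals @ cond                        # E(X | Y = y)
v = x_vals**2 @ cond - m**2              # Var(X | Y = y)

within = v @ f_Y                         # E_Y[ Var(X | Y) ]
between = m**2 @ f_Y - (m @ f_Y) ** 2    # Var_Y[ E(X | Y) ]
assert np.isclose(var_X, within + between)
```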

# Covariance

The covariance of $X$ and $Y$ is $Cov\left(X,Y\right)=\sigma_{XY}=E\left[\left(X-\mu_{X}\right)\left(Y-\mu_{Y}\right)\right]$

Some properties:

• $Cov\left(X,Y\right)=E\left(XY\right)-E\left(X\right)E\left(Y\right)$, since $E\left[\left(X-\mu_{X}\right)\left(Y-\mu_{Y}\right)\right]=E\left(XY\right)-\mu_{X}\mu_{Y}$.
• $Cov\left(X,X\right)=Var\left(X\right)$.
• $Cov\left(X,Y\right)=0$ if $X$ and $Y$ are independent.
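A simulation sketch of the first two properties (the data-generating process is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)   # Y depends linearly on X

# Sample analogue of Cov(X, Y) = E(XY) - E(X)E(Y); it matches np.cov
# with bias=True (both divide by the sample size N).
cov_xy = (x * y).mean() - x.mean() * y.mean()
assert np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1])

# Cov(X, X) = Var(X).
assert np.isclose((x * x).mean() - x.mean() ** 2, x.var())
```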

# Correlation

The correlation of $X$ and $Y$ is $Corr\left(X,Y\right)=\rho_{XY}=\frac{Cov\left(X,Y\right)}{\sigma_{X}\sigma_{Y}}=\frac{\sigma_{XY}}{\sigma_{X}\sigma_{Y}}$ i.e., the correlation is equal to the covariance, standardized by the product of the standard deviations of the variables.

Some properties:

• If $X$ and $Y$ are independent, then $\rho_{XY}=0$ .
• $\left|\rho_{XY}\right|\leq1$, by the Cauchy-Schwarz inequality (explained next).
• $\left|\rho_{XY}\right|=1$ if and only if $P\left(Y=aX+b\right)=1$ for some $a\neq0$, $b\in\mathbb{R}$.
• $\rho_{XY}$ is a measure of linear dependence.
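The last bullet is worth a sketch: with $Y=X^{2}$ and $X$ symmetric around zero, $Y$ is completely determined by $X$, yet the correlation is (up to sampling noise) zero, because the dependence is not linear:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = x ** 2                       # perfectly dependent on x, but not linearly

# Cov(X, X^2) = E(X^3) = 0 for a distribution symmetric about zero,
# so the correlation is near zero despite full dependence.
rho = np.corrcoef(x, y)[0, 1]
assert abs(rho) < 0.05
```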

# Cauchy-Schwarz Inequality

If $\left(X,Y\right)'$ is a bivariate random vector, then

$\left|E\left(XY\right)\right|\leq E\left(\left|XY\right|\right)\leq\sqrt{E\left(X^{2}\right)}\sqrt{E\left(Y^{2}\right)}$.
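Both inequalities hold exactly for sample moments as well (they are Cauchy-Schwarz applied to the empirical distribution), so a simulation with arbitrary made-up inputs can only confirm them:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = rng.exponential(size=50_000)

lhs = abs((x * y).mean())                # sample analogue of |E(XY)|
mid = np.abs(x * y).mean()               # sample analogue of E|XY|
rhs = np.sqrt((x ** 2).mean()) * np.sqrt((y ** 2).mean())
assert lhs <= mid <= rhs
```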

The inequality bounds the joint moment $E\left(XY\right)$ using only the marginal second moments of $X$ and $Y$.

# Generalization: Hölder Inequality

If $\left(X,Y\right)'$ is a bivariate random vector, then

$\left|E\left(XY\right)\right|\leq E\left(\left|XY\right|\right)\leq\sqrt[p]{E\left(\left|X\right|^{p}\right)}\,\sqrt[q]{E\left(\left|Y\right|^{q}\right)},\,\text{for }p,q\gt 1\text{ with }p^{-1}+q^{-1}=1.$
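The same empirical check works for Hölder with any pair of conjugate exponents (the exponents and inputs below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
y = rng.normal(size=50_000)

p, q = 3.0, 1.5                          # conjugate exponents: 1/3 + 2/3 = 1
assert np.isclose(1 / p + 1 / q, 1.0)

lhs = np.abs(x * y).mean()               # sample analogue of E|XY|
rhs = (np.abs(x) ** p).mean() ** (1 / p) * (np.abs(y) ** q).mean() ** (1 / q)
assert lhs <= rhs                        # Holder's inequality
```

Setting $p=q=2$ recovers the Cauchy-Schwarz inequality as a special case.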

# Jensen’s Inequality

If $X$ is an r.v. and $g:\mathbb{R}\rightarrow\mathbb{R}$ is a convex function, then $E\left[g\left(X\right)\right]\geq g\left[E\left(X\right)\right]$.

This also implies that if $g\left(\cdot\right)$ is concave, then $E\left[g\left(X\right)\right]\leq g\left[E\left(X\right)\right]$.

Finally, if $g\left(\cdot\right)$ is linear, then $E\left[g\left(X\right)\right]=g\left[E\left(X\right)\right]$, since a linear function is both convex and concave.

For an example, consider an r.v. $X$ that can equal $0$ or $8$ with equal probability.

In this case, $\left[E\left(X\right)\right]^{2}=\left(0.5\times0+0.5\times8\right)^{2}=16$ and $E\left(X^{2}\right)=0.5\times0^{2}+0.5\times8^{2}=32$.
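The arithmetic above can be checked directly:

```python
import numpy as np

# X equals 0 or 8 with probability 1/2 each; g(x) = x^2 is convex.
x_vals = np.array([0.0, 8.0])
p = np.array([0.5, 0.5])

g_of_mean = (x_vals @ p) ** 2        # [E(X)]^2 = 4^2 = 16
mean_of_g = (x_vals ** 2) @ p        # E(X^2) = 32
assert g_of_mean == 16.0
assert mean_of_g == 32.0
assert mean_of_g >= g_of_mean        # Jensen: E[g(X)] >= g(E[X]) for convex g
```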

See the graphical depiction below: