# Asymptotic Properties of OLS

We now allow:

• $X$ to be random (rather than fixed).
• $\varepsilon$ to not necessarily be normally distributed.

In this case, we will need additional assumptions to derive the asymptotic properties of $\widehat{\beta}$:

• $\left\{ y_{i},x_{i}\right\}$ is a random sample.
• Strict Exogeneity: $E\left(\left.\varepsilon_{i}\right|X\right)=0,\,i=1,\dots,N$.
• Homoskedasticity: $E\left(\left.\varepsilon_{i}^{2}\right|X\right)=\sigma^{2},\,i=1,\dots,N$, and $E\left(\left.\varepsilon_{i}\varepsilon_{j}\right|X\right)=0$ for all $i,j=1,\dots,N$ with $i\neq j$.
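A minimal simulation sketch of a data-generating process satisfying these assumptions, using numpy (all names and parameter values below are illustrative choices, not from the text): the regressors are random, the errors are non-normal but mean-zero given $X$, homoskedastic, and i.i.d. across observations.

```python
import numpy as np

# Illustrative DGP: random regressors, non-normal errors with E(eps|X) = 0,
# homoskedastic, i.i.d. observations.
rng = np.random.default_rng(0)
N = 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + 1 regressor
beta = np.array([1.0, 2.0])
eps = rng.uniform(-1.0, 1.0, size=N)   # non-normal, drawn independently of X
y = X @ beta + eps

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Even though the errors are uniform rather than normal, the OLS estimate lands close to the true coefficients at this sample size.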

## Implications of Strict Exogeneity

First, notice that if $E\left(\left.\varepsilon_{i}\right|X\right)=0$, then $E\left(\varepsilon_{i}\right)=0$:

$E\left(\left.\varepsilon_{i}\right|X\right)=0\Rightarrow E\left(E\left(\left.\varepsilon_{i}\right|X\right)\right)=E\left(0\right)\Leftrightarrow E\left(\varepsilon_{i}\right)=0.$

In other words, if the conditional expectation of $\varepsilon_{i}$ given any realization of $X$ is zero, then the unconditional expectation of $\varepsilon_{i}$ must also be zero. Since the conditioning is on the full matrix $X$, this assumption implies that $\varepsilon_{i}$ is uncorrelated with the regressors of its own observation ($x_{i1}$, $x_{i2}$, ...) and of every other observation ($x_{1k}$, $x_{2k}$, ...).

Second, the strict exogeneity assumption implies the orthogonality condition $E\left(x_{jk}\varepsilon_{i}\right)=0,\,\forall\,j,k$. (i.e., no matter how you pick $x$’s by selecting $j$ and $k$, the result is uncorrelated with $\varepsilon_{i}$).

To see this, let $E\left(x_{j}\varepsilon_{i}\right)=\left[\begin{array}{c} E\left(x_{j1}\varepsilon_{i}\right)\\ E\left(x_{j2}\varepsilon_{i}\right)\\ \vdots\\ E\left(x_{jK}\varepsilon_{i}\right) \end{array}\right]$

Then, it follows that

$E\left(x_{jk}\varepsilon_{i}\right)=E\left[E\left(\left.x_{jk}\varepsilon_{i}\right|x_{jk}\right)\right]=E\left[x_{jk}\underset{=0}{\underbrace{E\left(\left.\varepsilon_{i}\right|x_{jk}\right)}}\right]=0,$

where $E\left(\left.\varepsilon_{i}\right|x_{jk}\right)=E\left[\left.E\left(\left.\varepsilon_{i}\right|X\right)\right|x_{jk}\right]=0$ by the law of iterated expectations, since $x_{jk}$ is a component of $X$.
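The orthogonality condition can be checked numerically. In this sketch (distributional choices are ours, for illustration), the error is drawn independently of the regressor, so $E\left(\left.\varepsilon_{i}\right|x_{jk}\right)=0$ holds by construction and the sample cross moment should be near zero:

```python
import numpy as np

# Monte Carlo check of the orthogonality condition E(x_jk * eps_i) = 0.
rng = np.random.default_rng(1)
N = 200_000
x = rng.normal(loc=3.0, size=N)        # regressor with a nonzero mean
eps = rng.standard_t(df=5, size=N)     # heavy-tailed, mean-zero error
cross_moment = np.mean(x * eps)        # sample analogue of E(x * eps)
```

Note that $x$ itself has a nonzero mean; it is the product $x\varepsilon$ whose expectation is zero.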

## Asymptotic Distribution

First, notice that

• $X^{'}X=\sum_{i=1}^{n}x_{i}x_{i}^{'}$.
• $X^{'}\varepsilon=\sum_{i=1}^{n}x_{i}\varepsilon_{i}$.

It is possible to prove that under the assumptions above,

$\sqrt{N}\left(\widehat{\beta}_{OLS}-\beta\right)\overset{d}{\rightarrow}N\left(0,Q^{-1}\sigma^{2}\right)$

where $Q=\text{plim}\,\frac{X^{'}X}{N}$.

This is relatively intuitive given our previous example. Yet, it is extremely useful: As long as we satisfy the assumptions laid out before, we can conduct hypothesis tests for OLS even if the distribution of $\varepsilon$ is unknown (up to some moments).
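The claim can be illustrated by Monte Carlo (a sketch with illustrative choices: one regressor $x\sim N(0,1)$, so $Q=E(x^{2})=1$, and Uniform$(-1,1)$ errors, so $\sigma^{2}=1/3$). The sampling variance of $\sqrt{N}\left(\widehat{\beta}-\beta\right)$ across replications should approach $\sigma^{2}/Q$ even though the errors are not normal:

```python
import numpy as np

# Monte Carlo sketch of the asymptotic distribution with non-normal errors.
rng = np.random.default_rng(2)
N, reps = 500, 2000
beta, Q = 1.5, 1.0                     # x ~ N(0,1) implies Q = E(x^2) = 1
sigma2 = 4.0 / 12.0                    # Var of Uniform(-1, 1)

draws = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=N)
    eps = rng.uniform(-1.0, 1.0, size=N)
    y = beta * x + eps
    beta_hat = (x @ y) / (x @ x)       # scalar OLS without intercept
    draws[r] = np.sqrt(N) * (beta_hat - beta)

mc_variance = draws.var()              # should be close to sigma2 / Q
```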

## Proof

Note that

\begin{aligned} \sqrt{N}\left(\widehat{\beta}-\beta\right) & =\sqrt{N}\left(\beta+\left(X^{'}X\right)^{-1}X^{'}\varepsilon-\beta\right)\\ & =\sqrt{N}\left(\frac{X^{'}X}{N}\right)^{-1}\frac{X^{'}\varepsilon}{N}\\ & =\left(\frac{X^{'}X}{N}\right)^{-1}\frac{1}{\sqrt{N}}X^{'}\varepsilon\end{aligned}

While we will not show this here, we assume that $\frac{X^{'}X}{N}\overset{p}{\rightarrow}Q$, where $Q$ is a matrix. Notice that it is not implausible that $Q$ is a well-defined matrix: as $N$ increases, the size of $X^{'}X$ remains $\left(K\times K\right)$.
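This stabilization is easy to see numerically (an illustrative sketch: with $x_{i}=(1,z_{i})$ and $z_{i}\sim N(0,1)$, the limit is $Q=I_{2}$):

```python
import numpy as np

# Illustration that (X'X)/N settles down as N grows. With x_i = (1, z_i),
# z_i ~ N(0,1), we have Q = plim (X'X)/N = identity (an assumed example).
rng = np.random.default_rng(3)
Q = np.eye(2)

def xtx_over_n(N):
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    return (X.T @ X) / N

# The matrix stays 2 x 2 no matter how large N is; only its entries converge.
err_large_n = np.abs(xtx_over_n(1_000_000) - Q).max()
```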

By a matrix version of Slutsky’s theorem, it follows that

$\left(\frac{X^{'}X}{N}\right)^{-1}\overset{p}{\rightarrow}Q^{-1}$

As for the second factor, since it involves the random errors $\varepsilon$, we should expect it to converge in distribution rather than in probability. Let

$\frac{1}{\sqrt{N}}X^{'}\varepsilon=\sqrt{N}\frac{1}{N}\sum x_{i}\varepsilon_{i}=\sqrt{N}\overline{w}$

where $w_{i}=x_{i}\varepsilon_{i}$. Then,

$E\left(\overline{w}\right)=E\left(\frac{1}{N}\sum x_{i}\varepsilon_{i}\right)=0$

\begin{aligned} Var\left(\overline{w}\right) & =\frac{1}{N^{2}}Var\left(\sum x_{i}\varepsilon_{i}\right)=\frac{1}{N^{2}}\sum_{i=1}^{N}E\left(x_{i}E\left[\left.\varepsilon_{i}\varepsilon_{i}^{'}\right|x_{i}\right]x_{i}^{'}\right)\\ & =\frac{1}{N^{2}}\sigma^{2}\sum_{i=1}^{N}E\left(x_{i}x_{i}^{'}\right)\\ & =\frac{\sigma^{2}}{N}E\left(\frac{X^{'}X}{N}\right)\\ & =\frac{\sigma^{2}}{N}Q,\end{aligned}

where the cross terms vanish because $E\left(\left.\varepsilon_{i}\varepsilon_{j}\right|X\right)=0$ for $i\neq j$, and the last equality uses the fact that, under random sampling, $E\left(x_{i}x_{i}^{'}\right)=Q$.

By the CLT,

$\sqrt{N}\left(\overline{w}-E\left(\overline{w}\right)\right)\overset{d}{\rightarrow}N\left(0,\sigma^{2}Q\right)$

and by Slutsky’s theorem,

\begin{aligned} \sqrt{N}\left(\widehat{\beta}-\beta\right) & =\underset{\overset{p}{\rightarrow}\,Q^{-1}}{\underbrace{\left(\frac{X^{'}X}{N}\right)^{-1}}}\;\underset{\overset{d}{\rightarrow}\,N\left(0,\sigma^{2}Q\right)}{\underbrace{\sqrt{N}\overline{w}}}\\ & \overset{d}{\rightarrow}N\left(0,Q^{-1}\sigma^{2}QQ^{-1}\right)\\ & =N\left(0,Q^{-1}\sigma^{2}\right)\end{aligned}

## Some Remarks

In practice, we use

$\widehat{\sigma^{2}}_{unbiased}=\frac{1}{N-K}\sum_{i=1}^{N}\left(y_{i}-x_{i}^{'}\widehat{\beta}\right)^{2}$

or

$\widehat{\sigma^{2}}_{MLE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-x_{i}^{'}\widehat{\beta}\right)^{2}$

and

$\widehat{Q^{-1}}=\left(\frac{X^{'}X}{N}\right)^{-1}$.
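Putting these pieces together, classical standard errors follow directly from $\widehat{\sigma^{2}}$ and $\widehat{Q^{-1}}$. A sketch (the simulated design, with $\sigma=2$ and $z\sim N(0,1)$, is an illustrative choice):

```python
import numpy as np

# Classical OLS standard errors from sigma2_hat (unbiased, N-K divisor)
# and Q_hat^{-1} = (X'X/N)^{-1}.
rng = np.random.default_rng(4)
N, K = 2000, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([0.5, -1.0])
y = X @ beta + rng.normal(scale=2.0, size=N)   # homoskedastic, sigma^2 = 4

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K)           # unbiased estimator of sigma^2
Q_inv_hat = np.linalg.inv(X.T @ X / N)

# Asymptotic variance of sqrt(N)*(beta_hat - beta) is sigma^2 * Q^{-1};
# the standard error of beta_hat itself divides by a further sqrt(N).
se = np.sqrt(np.diag(sigma2_hat * Q_inv_hat) / N)
```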

## A note on the variance of $\varepsilon_{i}$

In the proof above, we have assumed that $Var\left(\varepsilon_{i}\right)=\sigma^{2}$. However, it is possible that $Var\left(\varepsilon_{i}\right)=\sigma_{i}^{2}$. In this case, the step $\frac{1}{N^{2}}\sum_{i=1}^{N}E\left(x_{i}E\left[\left.\varepsilon_{i}\varepsilon_{i}^{'}\right|x_{i}\right]x_{i}^{'}\right)=\frac{1}{N^{2}}\sigma^{2}\sum_{i=1}^{N}E\left(x_{i}x_{i}^{'}\right)$

does not hold. Nonetheless, it is possible to show that

$\sqrt{N}\left(\widehat{\beta}-\beta\right)\overset{d}{\rightarrow}N\left(0,\Omega\right)$

where

$\Omega=E\left(x_{i}x_{i}^{'}\right)^{-1}Var\left(x_{i}\varepsilon_{i}\right)E\left(x_{i}x_{i}^{'}\right)^{-1},$

and

$\widehat{\Omega}=\left(\frac{X^{'}X}{N}\right)^{-1}\left(\frac{1}{N}\sum x_{i}x_{i}^{'}\widehat{\varepsilon}_{i}^{2}\right)\left(\frac{X^{'}X}{N}\right)^{-1}.$

The estimator above is called the Huber-Eicker-White estimator (or a variation using one or two of these names).

An issue remains: How do we obtain the $\widehat{\varepsilon}_{i}^{2}$ that shows up in the formula above? Usually we need to estimate the beta parameters first, but how can we estimate these without the correct variance-covariance matrix?

It turns out that $\widehat{\beta}_{OLS}$ is consistent even if $\varepsilon_{i}$ is heteroskedastic: it suffices that $\frac{X^{'}\varepsilon}{N}\overset{p}{\rightarrow}0$, which is guaranteed by strict exogeneity (combined with a law of large numbers). So, one can use the OLS residuals to produce the estimator $\widehat{\Omega}$, and then perform valid asymptotic hypothesis tests.
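The two-step procedure can be sketched as follows (the conditional variance function $0.5+|z|$ is an illustrative choice): first compute ordinary OLS estimates, which remain consistent under heteroskedasticity, then plug the residuals into the sandwich formula.

```python
import numpy as np

# Two-step sketch: (1) OLS point estimates and residuals;
# (2) Huber-Eicker-White sandwich estimator of the asymptotic variance.
rng = np.random.default_rng(5)
N = 5000
z = rng.normal(size=N)
X = np.column_stack([np.ones(N), z])
beta = np.array([1.0, 0.5])
eps = rng.normal(size=N) * (0.5 + np.abs(z))   # Var(eps|z) depends on z
y = X @ beta + eps

# Step 1: ordinary OLS
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

# Step 2: sandwich estimator Omega_hat and robust standard errors
Q_inv = np.linalg.inv(X.T @ X / N)
meat = (X * (e_hat ** 2)[:, None]).T @ X / N   # (1/N) sum x_i x_i' e_i^2
Omega_hat = Q_inv @ meat @ Q_inv
se_robust = np.sqrt(np.diag(Omega_hat) / N)

# Classical standard errors for comparison (invalid under heteroskedasticity)
sigma2_hat = e_hat @ e_hat / (N - 2)
se_classical = np.sqrt(np.diag(sigma2_hat * Q_inv) / N)
```

In this design the error variance rises with $|z|$, so the robust standard error on the slope exceeds the (invalid) classical one.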