Lecture 17. C) Asymptotic Properties of OLS
Asymptotic Properties of OLS
We now allow:
- [math]X[/math] to be random (a matrix of random variables).
- [math]\varepsilon[/math] to not necessarily be normally distributed.
In this case, we need additional assumptions in order to derive the properties of [math]\widehat{\beta}[/math] (a simulated example satisfying these assumptions is sketched after the list):
- [math]\left\{ y_{i},x_{i}\right\}[/math] is a random sample.
- Strict Exogeneity: [math]E\left(\left.\varepsilon_{i}\right|X\right)=0,\,i=1..N[/math].
- Homoskedasticity: [math]E\left(\left.\varepsilon_{i}^{2}\right|X\right)=\sigma^{2},\,i=1..N[/math] and [math]E\left(\left.\varepsilon_{i}\varepsilon_{j}\right|X\right)=0,\,\forall i,j=1..N,i\neq j.[/math]
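To make these assumptions concrete, here is a minimal simulated data-generating process (an illustrative sketch, not part of the original derivation; the distributional choices are assumptions): the regressors are random, the errors are non-normal but satisfy strict exogeneity and homoskedasticity, and the observations are drawn i.i.d.

```python
import numpy as np

def simulate_sample(N, beta, seed=0):
    """Draw an i.i.d. sample satisfying the assumptions above: random
    regressors, and errors that are mean-zero given X, homoskedastic,
    and non-normal."""
    rng = np.random.default_rng(seed)
    K = len(beta)
    # Random regressors: a constant plus K-1 lognormal covariates.
    X = np.column_stack([np.ones(N), rng.lognormal(size=(N, K - 1))])
    # Non-normal errors: centered Exponential(1), so E(eps | X) = 0 and
    # Var(eps | X) = 1 for every i (homoskedastic), independent across i.
    eps = rng.exponential(scale=1.0, size=N) - 1.0
    y = X @ beta + eps
    return y, X

# Example: a sample of N = 500 observations with K = 3 regressors.
y, X = simulate_sample(500, beta=np.array([1.0, 2.0, -0.5]))
```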
Implications of Strict Exogeneity
First, notice that if [math]E\left(\left.\varepsilon_{i}\right|X\right)=0[/math], then [math]E\left(\varepsilon\right)=0[/math]:
[math]E\left(\left.\varepsilon_{i}\right|X\right)=0\Rightarrow E\left(E\left(\left.\varepsilon_{i}\right|X\right)\right)=E\left(0\right)\Leftrightarrow E\left(\varepsilon_{i}\right)=0.[/math]
In other words, if the conditional expectation of [math]\varepsilon_{i}[/math] given [math]X[/math] is zero, then the unconditional expectation of [math]\varepsilon_{i}[/math] must also be zero. The assumption also implies that [math]\varepsilon_{i}[/math] is uncorrelated with the regressors of every observation: [math]x_{i1}[/math], [math]x_{i2}[/math], ..., as well as [math]x_{1k}[/math], [math]x_{2k}[/math], etc.
Second, the strict exogeneity assumption implies the orthogonality condition [math]E\left(x_{jk}\varepsilon_{i}\right)=0,\,\forall\,j,k[/math]. (i.e., no matter how you pick [math]x[/math]’s by selecting [math]j[/math] and [math]k[/math], the result is uncorrelated with [math]\varepsilon_{i}[/math]).
To see this, write [math]E\left(x_{j}\varepsilon_{i}\right)=\left[\begin{array}{c} E\left(x_{j1}\varepsilon_{i}\right)\\ E\left(x_{j2}\varepsilon_{i}\right)\\ \vdots\\ E\left(x_{jK}\varepsilon_{i}\right) \end{array}\right][/math]
Then, for each element,
[math]E\left(x_{jk}\varepsilon_{i}\right)=E\left[E\left(\left.x_{jk}\varepsilon_{i}\right|x_{jk}\right)\right]=E\left[x_{jk}\underset{=0}{\underbrace{E\left(\left.\varepsilon_{i}\right|x_{jk}\right)}}\right]=0,[/math]
where the inner conditional expectation is zero because, by the law of iterated expectations, [math]E\left(\left.\varepsilon_{i}\right|x_{jk}\right)=E\left[\left.E\left(\left.\varepsilon_{i}\right|X\right)\right|x_{jk}\right]=0[/math].
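As a quick sanity check (an illustrative sketch with an assumed data-generating process), the orthogonality condition can be verified on simulated data: the sample analogue of [math]E\left(x_{jk}\varepsilon_{i}\right)[/math] should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
x = rng.lognormal(size=N)              # a random regressor
eps = rng.exponential(size=N) - 1.0    # strictly exogenous, non-normal error

# Sample analogue of the orthogonality condition E(x_{jk} * eps_i) = 0:
# the average of x * eps should be close to zero (and shrink as N grows).
print(np.mean(x * eps))
```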
Asymptotic Distribution
First, notice that
- [math]X^{'}X=\sum_{i=1}^{N}x_{i}x_{i}^{'}[/math].
- [math]X^{'}\varepsilon=\sum_{i=1}^{N}x_{i}\varepsilon_{i}[/math].
It is possible to prove that under the assumptions above,
[math]\sqrt{N}\left(\widehat{\beta}_{OLS}-\beta\right)\overset{a}{\sim}N\left(0,Q^{-1}\sigma^{2}\right)[/math]
where [math]Q=\text{plim}\,\frac{X^{'}X}{N}[/math].
This is relatively intuitive given our previous example, yet it is extremely useful: as long as the assumptions laid out above are satisfied, we can conduct hypothesis tests for OLS even if the distribution of [math]\varepsilon[/math] is unknown (up to some moments).
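A small Monte Carlo sketch (with assumed parameter values and an assumed lognormal regressor) illustrates the result: even though the errors are drawn from a centered exponential distribution rather than a normal one, the sampling covariance of [math]\sqrt{N}\left(\widehat{\beta}_{OLS}-\beta\right)[/math] is close to [math]Q^{-1}\sigma^{2}[/math].

```python
import numpy as np

rng = np.random.default_rng(0)
N, R = 1_000, 5_000                    # sample size and Monte Carlo replications
beta = np.array([1.0, 2.0])
sigma2 = 1.0

draws = np.empty((R, 2))
for r in range(R):
    X = np.column_stack([np.ones(N), rng.lognormal(size=N)])
    eps = rng.exponential(size=N) - 1.0            # non-normal, Var = sigma2 = 1
    y = X @ beta + eps
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)      # OLS
    draws[r] = np.sqrt(N) * (b_hat - beta)

# Monte Carlo covariance of sqrt(N)(b_hat - beta) vs. the theoretical Q^{-1} * sigma2,
# approximating Q by X'X/N from the last replication.
Q_hat = X.T @ X / N
print(np.cov(draws, rowvar=False))
print(sigma2 * np.linalg.inv(Q_hat))
```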
Proof
Note that
[math]\begin{aligned} \sqrt{N}\left(\widehat{\beta}-\beta\right) & =\sqrt{N}\left(\beta+\left(X^{'}X\right)^{-1}X^{'}\varepsilon-\beta\right)\\ & =\sqrt{N}\left(X^{'}X\right)^{-1}X^{'}\varepsilon\frac{N}{N}\\ & =\left(\frac{X^{'}X}{N}\right)^{-1}\frac{1}{\sqrt{N}}X^{'}\varepsilon\end{aligned}[/math]
While we will not show this here, we assume that [math]\frac{X^{'}X}{N}\overset{p}{\rightarrow}Q[/math], where [math]Q[/math] is a matrix. Notice that it is not implausible that [math]Q[/math] is a well-defined matrix: as [math]N[/math] increases, the size of [math]X^{'}X[/math] remains [math]\left(K\times K\right)[/math].
By a matrix version of Slutsky’s theorem, it follows that
[math]\left(\frac{X^{'}X}{N}\right)^{-1}\overset{p}{\rightarrow}Q^{-1}[/math]
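As a numerical illustration of why [math]\frac{X^{'}X}{N}[/math] can plausibly settle down to a fixed [math]K\times K[/math] matrix, here is a sketch assuming a lognormal regressor, for which [math]Q[/math] is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
# With a constant and x ~ LogNormal(0, 1): E(x) = e^{1/2}, E(x^2) = e^{2},
# so Q = E(x_i x_i') = [[1, e^{1/2}], [e^{1/2}, e^{2}]].
Q = np.array([[1.0, np.exp(0.5)], [np.exp(0.5), np.exp(2.0)]])

for N in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(N), rng.lognormal(size=N)])
    print(N, np.max(np.abs(X.T @ X / N - Q)))   # largest deviation typically shrinks with N
```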
As for the second factor, because it involves the random term [math]\varepsilon[/math], we expect it to converge in distribution rather than in probability. Write
[math]\frac{1}{\sqrt{N}}X^{'}\varepsilon=\sqrt{N}\frac{1}{N}\sum x_{i}\varepsilon_{i}=\sqrt{N}\overline{w}[/math]
where [math]w_{i}=x_{i}\varepsilon_{i}[/math]. Then,
[math]E\left(\overline{w}\right)=E\left(\frac{1}{N}\sum x_{i}\varepsilon_{i}\right)=0[/math] (by the orthogonality condition above), and
[math]\begin{aligned} Var\left(\overline{w}\right) & =\frac{1}{N^{2}}Var\left(\sum x_{i}\varepsilon_{i}\right)=\frac{1}{N^{2}}\sum_{i=1}^{N}E\left(x_{i}E\left[\left.\varepsilon_{i}\varepsilon_{i}^{'}\right|x_{i}\right]x_{i}^{'}\right)\\ & =\frac{1}{N^{2}}\sigma^{2}\sum_{i=1}^{N}E\left(x_{i}x_{i}^{'}\right)\\ & =\frac{\sigma^{2}}{N}E\left(\frac{X^{'}X}{N}\right)\\ & =\frac{\sigma^{2}}{N}Q.\end{aligned}[/math]
The second equality uses the random-sampling assumption (the [math]w_{i}[/math] are independent with mean zero, so the variance of the sum is the sum of [math]E\left(w_{i}w_{i}^{'}\right)[/math]) together with the law of iterated expectations, and the third uses homoskedasticity.
By the CLT,
[math]\sqrt{N}\left(\overline{w}-E\left(\overline{w}\right)\right)\overset{d}{\rightarrow}N\left(0,\sigma^{2}Q\right)[/math]
and by Slutsky’s theorem,
[math]\begin{aligned} \sqrt{N}\left(\widehat{\beta}-\beta\right) & =\underset{\overset{p}{\rightarrow}\,Q^{-1}}{\underbrace{\left(\frac{X^{'}X}{N}\right)^{-1}}}\;\underset{\overset{d}{\rightarrow}\,N\left(0,\sigma^{2}Q\right)}{\underbrace{\sqrt{N}\overline{w}}}\\ & \overset{a}{\sim}N\left(0,Q^{-1}\sigma^{2}QQ^{-1}\right)\\ & =N\left(0,Q^{-1}\sigma^{2}\right)\end{aligned}[/math]
Some Remarks
In practice, we use
[math]\widehat{\sigma^{2}}_{unbiased}=\frac{1}{N-K}\sum_{i=1}^{N}\left(y_{i}-x_{i}^{'}\widehat{\beta}\right)^{2}[/math]
or
[math]\widehat{\sigma^{2}}_{MLE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-x_{i}^{'}\widehat{\beta}\right)^{2}[/math]
and
[math]\widehat{Q^{-1}}=\left(\frac{X^{'}X}{N}\right)^{-1}[/math].
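In code, these quantities are straightforward to compute. The sketch below (assuming data arrays y and X; names are illustrative) forms the OLS residuals, both variance estimates, and the resulting classical covariance matrix [math]\widehat{\sigma^{2}}\left(X^{'}X\right)^{-1}[/math] of [math]\widehat{\beta}[/math].

```python
import numpy as np

def ols_classical(y, X):
    """OLS estimates with the two variance estimators and classical standard errors."""
    N, K = X.shape
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b_hat
    sigma2_unbiased = resid @ resid / (N - K)    # divides by N - K
    sigma2_mle = resid @ resid / N               # divides by N
    # Classical covariance of b_hat: sigma^2 (X'X)^{-1} = (sigma^2 / N) * Qhat^{-1}.
    cov_beta = sigma2_unbiased * np.linalg.inv(X.T @ X)
    return b_hat, np.sqrt(np.diag(cov_beta)), sigma2_unbiased, sigma2_mle
```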
A note on the variance of [math]\varepsilon_{i}[/math]
In the proof above, we have assumed that [math]Var\left(\left.\varepsilon_{i}\right|X\right)=\sigma^{2}[/math] (homoskedasticity). However, it is possible that [math]Var\left(\left.\varepsilon_{i}\right|X\right)=\sigma_{i}^{2}[/math] (heteroskedasticity). In this case, the step [math]\frac{1}{N^{2}}\sum_{i=1}^{N}E\left(x_{i}E\left[\left.\varepsilon_{i}\varepsilon_{i}^{'}\right|x_{i}\right]x_{i}^{'}\right)=\frac{1}{N^{2}}\sigma^{2}\sum_{i=1}^{N}E\left(x_{i}x_{i}^{'}\right)[/math]
does not hold. Nonetheless, it is possible to show that
[math]\sqrt{N}\left(\widehat{\beta}-\beta\right)\overset{a}{\sim}N\left(0,\Omega\right)[/math]
where
[math]\Omega=E\left(x_{i}x_{i}^{'}\right)^{-1}Var\left(x_{i}\varepsilon_{i}\right)E\left(x_{i}x_{i}^{'}\right)^{-1},[/math]
and
[math]\widehat{\Omega}=\left(\frac{X^{'}X}{N}\right)^{-1}\left(\frac{1}{N}\sum x_{i}x_{i}^{'}\widehat{\varepsilon}_{i}^{2}\right)\left(\frac{X^{'}X}{N}\right)^{-1}.[/math]
The estimator above is called the Huber-Eicker-White estimator (or some variation using one or two of these names).
An issue remains: how do we obtain the [math]\widehat{\varepsilon}_{i}^{2}[/math] that show up in the formula above? To compute residuals we first need an estimate of [math]\beta[/math], but can we estimate [math]\beta[/math] without knowing the correct variance-covariance matrix?
It turns out that [math]\widehat{\beta}_{OLS}[/math] is consistent even if [math]\varepsilon_{i}[/math] is heteroskedastic: it suffices that [math]E\left[\left(X^{'}X\right)^{-1}X^{'}\varepsilon\right]=0[/math], which is guaranteed by strict exogeneity. So one can use the OLS estimates to construct the estimator [math]\widehat{\Omega}[/math], and then perform valid asymptotic hypothesis tests, as in the sketch below.
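Here is a minimal sketch of that two-step procedure (assuming data arrays y and X; variable names are illustrative): first compute the ordinary OLS estimates and residuals, then plug the squared residuals into [math]\widehat{\Omega}[/math] to obtain Huber-Eicker-White (HC0) robust standard errors.

```python
import numpy as np

def ols_white_se(y, X):
    """OLS with Huber-Eicker-White (HC0) heteroskedasticity-robust standard errors."""
    N, K = X.shape
    # Step 1: ordinary OLS, which remains consistent under heteroskedasticity.
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b_hat
    # Step 2: plug the squared residuals into Omega_hat (the sandwich formula).
    XtX_inv = np.linalg.inv(X.T @ X / N)
    meat = (X * resid[:, None] ** 2).T @ X / N        # (1/N) sum_i x_i x_i' e_i^2
    Omega_hat = XtX_inv @ meat @ XtX_inv
    cov_beta = Omega_hat / N                          # Var(b_hat) is approx. Omega / N
    return b_hat, np.sqrt(np.diag(cov_beta))
```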