Lecture 17. C) Asymptotic Properties of OLS
Asymptotic Properties of OLS
We now allow:
- [math]X[/math] to be random (a matrix of random variables).
- [math]\varepsilon[/math] to not necessarily be normally distributed.
In this case, we need additional assumptions in order to derive the properties of [math]\widehat{\beta}[/math] (a simulated example satisfying these assumptions is sketched after the list):
- [math]\left\{ y_{i},x_{i}\right\}[/math] is a random sample.
- Strict Exogeneity: [math]E\left(\left.\varepsilon_{i}\right|X\right)=0,\,i=1..N[/math].
- Homoskedasticity: [math]E\left(\left.\varepsilon_{i}^{2}\right|X\right)=\sigma^{2},\,i=1..N[/math] and [math]E\left(\left.\varepsilon_{i}\varepsilon_{j}\right|X\right)=0,\,\forall i,j=1..N,i\neq j.[/math]
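To make these assumptions concrete, here is a minimal simulated data-generating process (an illustrative sketch, not part of the original derivation; the distributional choices are assumptions): the regressors are random, the errors are non-normal but satisfy strict exogeneity and homoskedasticity, and the observations are drawn i.i.d.

```python
import numpy as np

def simulate_sample(N, beta, seed=0):
    """Draw an i.i.d. sample satisfying the assumptions above: random
    regressors, and errors that are mean-zero given X, homoskedastic,
    and non-normal."""
    rng = np.random.default_rng(seed)
    K = len(beta)
    # Random regressors: a constant plus K-1 lognormal covariates.
    X = np.column_stack([np.ones(N), rng.lognormal(size=(N, K - 1))])
    # Non-normal errors: centered Exponential(1), so E(eps | X) = 0 and
    # Var(eps | X) = 1 for every i (homoskedastic), independent across i.
    eps = rng.exponential(scale=1.0, size=N) - 1.0
    y = X @ beta + eps
    return y, X

# Example: a sample of N = 500 observations with K = 3 regressors.
y, X = simulate_sample(500, beta=np.array([1.0, 2.0, -0.5]))
```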
Implications of Strict Exogeneity
First, notice that if [math]E\left(\left.\varepsilon_{i}\right|X\right)=0[/math], then [math]E\left(\varepsilon\right)=0[/math]:
[math]E\left(\left.\varepsilon_{i}\right|X\right)=0\Rightarrow E\left(E\left(\left.\varepsilon_{i}\right|X\right)\right)=E\left(0\right)\Leftrightarrow E\left(\varepsilon_{i}\right)=0.[/math]
In other words, if the conditional expectation of [math]\varepsilon_{i}[/math] given [math]X[/math] is zero, then the unconditional expectation of [math]\varepsilon_{i}[/math] must also be zero. The assumption also implies that [math]\varepsilon_{i}[/math] is uncorrelated with the regressors of every observation: [math]x_{i1}[/math], [math]x_{i2}[/math], ..., as well as [math]x_{1k}[/math], [math]x_{2k}[/math], etc.
Second, the strict exogeneity assumption implies the orthogonality condition [math]E\left(x_{jk}\varepsilon_{i}\right)=0,\,\forall\,j,k[/math]. (i.e., no matter how you pick [math]x[/math]’s by selecting [math]j[/math] and [math]k[/math], the result is uncorrelated with [math]\varepsilon_{i}[/math]).
To see this, write [math]E\left(x_{j}\varepsilon_{i}\right)=\left[\begin{array}{c} E\left(x_{j1}\varepsilon_{i}\right)\\ E\left(x_{j2}\varepsilon_{i}\right)\\ \vdots\\ E\left(x_{jK}\varepsilon_{i}\right) \end{array}\right][/math]
Then, for each element,
[math]E\left(x_{jk}\varepsilon_{i}\right)=E\left[E\left(\left.x_{jk}\varepsilon_{i}\right|x_{jk}\right)\right]=E\left[x_{jk}\underset{=0}{\underbrace{E\left(\left.\varepsilon_{i}\right|x_{jk}\right)}}\right]=0,[/math]
where the inner conditional expectation is zero because, by the law of iterated expectations, [math]E\left(\left.\varepsilon_{i}\right|x_{jk}\right)=E\left[\left.E\left(\left.\varepsilon_{i}\right|X\right)\right|x_{jk}\right]=0[/math].
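As a quick sanity check (an illustrative sketch with an assumed data-generating process), the orthogonality condition can be verified on simulated data: the sample analogue of [math]E\left(x_{jk}\varepsilon_{i}\right)[/math] should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
x = rng.lognormal(size=N)              # a random regressor
eps = rng.exponential(size=N) - 1.0    # strictly exogenous, non-normal error

# Sample analogue of the orthogonality condition E(x_{jk} * eps_i) = 0:
# the average of x * eps should be close to zero (and shrink as N grows).
print(np.mean(x * eps))
```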
Asymptotic Distribution
First, notice that
- [math]X^{'}X=\sum_{i=1}^{N}x_{i}x_{i}^{'}[/math].
- [math]X^{'}\varepsilon=\sum_{i=1}^{N}x_{i}\varepsilon_{i}[/math].
It is possible to prove that under the assumptions above,
[math]\sqrt{N}\left(\widehat{\beta}_{OLS}-\beta\right)\overset{a}{\sim}N\left(0,Q^{-1}\sigma^{2}\right)[/math]
where [math]Q=\text{plim}\,\frac{X^{'}X}{N}[/math].
This is relatively intuitive given our previous example, yet it is extremely useful: as long as the assumptions laid out above are satisfied, we can conduct hypothesis tests for OLS even if the distribution of [math]\varepsilon[/math] is unknown (up to some moments).
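A small Monte Carlo sketch (with assumed parameter values and an assumed lognormal regressor) illustrates the result: even though the errors are drawn from a centered exponential distribution rather than a normal one, the sampling covariance of [math]\sqrt{N}\left(\widehat{\beta}_{OLS}-\beta\right)[/math] is close to [math]Q^{-1}\sigma^{2}[/math].

```python
import numpy as np

rng = np.random.default_rng(0)
N, R = 1_000, 5_000                    # sample size and Monte Carlo replications
beta = np.array([1.0, 2.0])
sigma2 = 1.0

draws = np.empty((R, 2))
for r in range(R):
    X = np.column_stack([np.ones(N), rng.lognormal(size=N)])
    eps = rng.exponential(size=N) - 1.0            # non-normal, Var = sigma2 = 1
    y = X @ beta + eps
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)      # OLS
    draws[r] = np.sqrt(N) * (b_hat - beta)

# Monte Carlo covariance of sqrt(N)(b_hat - beta) vs. the theoretical Q^{-1} * sigma2,
# approximating Q by X'X/N from the last replication.
Q_hat = X.T @ X / N
print(np.cov(draws, rowvar=False))
print(sigma2 * np.linalg.inv(Q_hat))
```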
Proof
Note that
[math]\begin{aligned} \sqrt{N}\left(\widehat{\beta}-\beta\right) & =\sqrt{N}\left(\beta+\left(X^{'}X\right)^{-1}X^{'}\varepsilon-\beta\right)\\ & =\sqrt{N}\left(X^{'}X\right)^{-1}X^{'}\varepsilon\frac{N}{N}\\ & =\left(\frac{X^{'}X}{N}\right)^{-1}\frac{1}{\sqrt{N}}X^{'}\varepsilon\end{aligned}[/math]
While we will not show this here, we assume that [math]\frac{X^{'}X}{N}\overset{p}{\rightarrow}Q[/math], where [math]Q[/math] is a matrix. Notice that it is not implausible that [math]Q[/math] is a well-defined matrix: as [math]N[/math] increases, the size of [math]X^{'}X[/math] remains [math]\left(K\times K\right)[/math].
By a matrix version of Slutsky’s theorem, it follows that
[math]\left(\frac{X^{'}X}{N}\right)^{-1}\overset{p}{\rightarrow}Q^{-1}[/math]
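As a numerical illustration of why [math]\frac{X^{'}X}{N}[/math] can plausibly settle down to a fixed [math]K\times K[/math] matrix, here is a sketch assuming a lognormal regressor, for which [math]Q[/math] is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
# With a constant and x ~ LogNormal(0, 1): E(x) = e^{1/2}, E(x^2) = e^{2},
# so Q = E(x_i x_i') = [[1, e^{1/2}], [e^{1/2}, e^{2}]].
Q = np.array([[1.0, np.exp(0.5)], [np.exp(0.5), np.exp(2.0)]])

for N in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(N), rng.lognormal(size=N)])
    print(N, np.max(np.abs(X.T @ X / N - Q)))   # largest deviation typically shrinks with N
```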
As for the second factor, because it involves the random term [math]\varepsilon[/math], we expect it to converge in distribution rather than in probability. Write
[math]\frac{1}{\sqrt{N}}X^{'}\varepsilon=\sqrt{N}\frac{1}{N}\sum x_{i}\varepsilon_{i}=\sqrt{N}\overline{w}[/math]
where [math]w_{i}=x_{i}\varepsilon_{i}[/math]. Then,
[math]E\left(\overline{w}\right)=E\left(\frac{1}{N}\sum x_{i}\varepsilon_{i}\right)=0[/math] (by the orthogonality condition above), and
[math]\begin{aligned} Var\left(\overline{w}\right) & =\frac{1}{N^{2}}Var\left(\sum x_{i}\varepsilon_{i}\right)=\frac{1}{N^{2}}\sum_{i=1}^{N}E\left(x_{i}E\left[\left.\varepsilon_{i}\varepsilon_{i}^{'}\right|x_{i}\right]x_{i}^{'}\right)\\ & =\frac{1}{N^{2}}\sigma^{2}\sum_{i=1}^{N}E\left(x_{i}x_{i}^{'}\right)\\ & =\frac{\sigma^{2}}{N}E\left(\frac{X^{'}X}{N}\right)\\ & =\frac{\sigma^{2}}{N}Q.\end{aligned}[/math]
The second equality uses the random-sampling assumption (the [math]w_{i}[/math] are independent with mean zero, so the variance of the sum is the sum of [math]E\left(w_{i}w_{i}^{'}\right)[/math]) together with the law of iterated expectations, and the third uses homoskedasticity.
By the CLT,
[math]\sqrt{N}\left(\overline{w}-E\left(\overline{w}\right)\right)\overset{d}{\rightarrow}N\left(0,\sigma^{2}Q\right)[/math]
and by Slutsky’s theorem,
[math]\begin{aligned} \sqrt{N}\left(\widehat{\beta}-\beta\right) & =\underset{\overset{p}{\rightarrow}\,Q^{-1}}{\underbrace{\left(\frac{X^{'}X}{N}\right)^{-1}}}\;\underset{\overset{d}{\rightarrow}\,N\left(0,\sigma^{2}Q\right)}{\underbrace{\sqrt{N}\overline{w}}}\\ & \overset{a}{\sim}N\left(0,Q^{-1}\sigma^{2}QQ^{-1}\right)\\ & =N\left(0,Q^{-1}\sigma^{2}\right)\end{aligned}[/math]
Some Remarks
In practice, we use
[math]\widehat{\sigma^{2}}_{unbiased}=\frac{1}{N-K}\sum_{i=1}^{N}\left(y_{i}-x_{i}^{'}\widehat{\beta}\right)^{2}[/math]
or
[math]\widehat{\sigma^{2}}_{MLE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-x_{i}^{'}\widehat{\beta}\right)^{2}[/math]
and
[math]\widehat{Q^{-1}}=\left(\frac{X^{'}X}{N}\right)^{-1}[/math].
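In code, these quantities are straightforward to compute. The sketch below (assuming data arrays y and X; names are illustrative) forms the OLS residuals, both variance estimates, and the resulting classical covariance matrix [math]\widehat{\sigma^{2}}\left(X^{'}X\right)^{-1}[/math] of [math]\widehat{\beta}[/math].

```python
import numpy as np

def ols_classical(y, X):
    """OLS estimates with the two variance estimators and classical standard errors."""
    N, K = X.shape
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b_hat
    sigma2_unbiased = resid @ resid / (N - K)    # divides by N - K
    sigma2_mle = resid @ resid / N               # divides by N
    # Classical covariance of b_hat: sigma^2 (X'X)^{-1} = (sigma^2 / N) * Qhat^{-1}.
    cov_beta = sigma2_unbiased * np.linalg.inv(X.T @ X)
    return b_hat, np.sqrt(np.diag(cov_beta)), sigma2_unbiased, sigma2_mle
```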
A note on the variance of [math]\varepsilon_{i}[/math]
In the proof above, we have assumed that [math]Var\left(\left.\varepsilon_{i}\right|X\right)=\sigma^{2}[/math] (homoskedasticity). However, it is possible that [math]Var\left(\left.\varepsilon_{i}\right|X\right)=\sigma_{i}^{2}[/math] (heteroskedasticity). In this case, the step [math]\frac{1}{N^{2}}\sum_{i=1}^{N}E\left(x_{i}E\left[\left.\varepsilon_{i}\varepsilon_{i}^{'}\right|x_{i}\right]x_{i}^{'}\right)=\frac{1}{N^{2}}\sigma^{2}\sum_{i=1}^{N}E\left(x_{i}x_{i}^{'}\right)[/math]
does not hold. Nonetheless, it is possible to show that
[math]\sqrt{N}\left(\widehat{\beta}-\beta\right)\overset{a}{\sim}N\left(0,\Omega\right)[/math]
where
[math]\Omega=E\left(x_{i}x_{i}^{'}\right)^{-1}Var\left(x_{i}\varepsilon_{i}\right)E\left(x_{i}x_{i}^{'}\right)^{-1},[/math]
and
[math]\widehat{\Omega}=\left(\frac{X^{'}X}{N}\right)^{-1}\left(\frac{1}{N}\sum x_{i}x_{i}^{'}\widehat{\varepsilon}_{i}^{2}\right)\left(\frac{X^{'}X}{N}\right)^{-1}.[/math]
The estimator above is called the Huber-Eicker-White estimator (or some variation using one or two of these names).
An issue remains: how do we obtain the [math]\widehat{\varepsilon}_{i}^{2}[/math] that show up in the formula above? To compute residuals we first need an estimate of [math]\beta[/math], but can we estimate [math]\beta[/math] without knowing the correct variance-covariance matrix?
It turns out that [math]\widehat{\beta}_{OLS}[/math] is consistent even if [math]\varepsilon_{i}[/math] is heteroskedastic: it suffices that [math]E\left[\left(X^{'}X\right)^{-1}X^{'}\varepsilon\right]=0[/math], which is guaranteed by strict exogeneity. So one can use the OLS estimates to construct the estimator [math]\widehat{\Omega}[/math], and then perform valid asymptotic hypothesis tests, as in the sketch below.
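Here is a minimal sketch of that two-step procedure (assuming data arrays y and X; variable names are illustrative): first compute the ordinary OLS estimates and residuals, then plug the squared residuals into [math]\widehat{\Omega}[/math] to obtain Huber-Eicker-White (HC0) robust standard errors.

```python
import numpy as np

def ols_white_se(y, X):
    """OLS with Huber-Eicker-White (HC0) heteroskedasticity-robust standard errors."""
    N, K = X.shape
    # Step 1: ordinary OLS, which remains consistent under heteroskedasticity.
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b_hat
    # Step 2: plug the squared residuals into Omega_hat (the sandwich formula).
    XtX_inv = np.linalg.inv(X.T @ X / N)
    meat = (X * resid[:, None] ** 2).T @ X / N        # (1/N) sum_i x_i x_i' e_i^2
    Omega_hat = XtX_inv @ meat @ XtX_inv
    cov_beta = Omega_hat / N                          # Var(b_hat) is approx. Omega / N
    return b_hat, np.sqrt(np.diag(cov_beta))
```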