Gauss-Markov Theorem

The Gauss–Markov theorem is an important finite-sample result for the OLS estimator: it relies neither on asymptotics nor on a normality assumption. It states that, in the linear regression model (which does include the homoskedasticity assumption), $\widehat{\beta}_{OLS}$ is the best linear unbiased estimator (BLUE) of $\beta$, i.e., the minimum-variance estimator among all linear unbiased estimators.

The proof is not hard. We consider the case where $X$ is fixed (i.e., all expressions are conditioned on $X$; $\varepsilon$ is the random variable).

Proof

Let

\begin{aligned} \widehat{\beta}_{OLS} & =\left(X^{'}X\right)^{-1}X^{'}y\\ \widetilde{\beta} & =Cy\end{aligned}

where $\widetilde{\beta}$ is some alternative linear estimator, defined by some matrix $C$ with dimensions $\left(K\times N\right)$, so that $\widetilde{\beta}=Cy$ is a $K$-vector.

For $\widetilde{\beta}$ to be unbiased, we require that

\begin{aligned} & E_{\beta}\left(\widetilde{\beta}\right)=\beta\\ \Leftrightarrow & E_{\beta}\left(Cy\right)=\beta\\ \Leftrightarrow & E_{\beta}\left(C\left(X\beta+\varepsilon\right)\right)=\beta\\ \Leftrightarrow & CX\beta=\beta\end{aligned}

because $E_{\beta}\left(\varepsilon\right)=0$. Notice that for the last equation to hold for every possible value of $\beta$, we require $CX=I$.
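As a quick sanity check (not part of the proof), the unbiasedness condition $CX=I$ can be verified numerically for a concrete alternative linear estimator. A standard example is weighted least squares, whose $C$ matrix is $\left(X^{'}WX\right)^{-1}X^{'}W$ for some positive-definite weight matrix $W$; the specific dimensions and weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 3
X = rng.normal(size=(N, K))

# A concrete alternative linear estimator: weighted least squares with an
# arbitrary (hypothetical) positive-definite diagonal weight matrix W.
W = np.diag(rng.uniform(0.5, 2.0, size=N))

# C has dimensions (K x N): it maps y to a K-vector of coefficients.
# np.linalg.solve(X'WX, X'W) computes (X'WX)^{-1} X'W without an explicit inverse.
C = np.linalg.solve(X.T @ W @ X, X.T @ W)

# Unbiasedness requires CX = I_K.
print(np.allclose(C @ X, np.eye(K)))  # True
```

Any choice of positive-definite $W$ gives $CX=\left(X^{'}WX\right)^{-1}X^{'}WX=I$, so the WLS estimator is linear and unbiased, and the theorem applies to it.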

Let us now calculate the variance of $\widetilde{\beta}$:

$Var_{\beta}\left(\widetilde{\beta}\right)=Var_{\beta}\left(Cy\right)=Var_{\beta}\left(C\varepsilon\right)=C\,Var_{\beta}\left(\varepsilon\right)C^{'}=CC^{'}\sigma^{2}$

where the second equality holds because $X\beta$ is nonrandom, and the last one uses homoskedasticity, $Var_{\beta}\left(\varepsilon\right)=\sigma^{2}I_{N}$.

Now, define $D$ as the difference between the “slopes” of $\widetilde{\beta}$ and $\widehat{\beta}_{OLS}$, s.t. $D=C-\left(X^{'}X\right)^{-1}X^{'}$. Using this definition, we can rewrite $Var_{\beta}\left(\widetilde{\beta}\right)$ as

\begin{aligned} Var_{\beta}\left(\widetilde{\beta}\right) & =CC^{'}\sigma^{2}\\ & =\left(D+\left(X^{'}X\right)^{-1}X^{'}\right)\left(D+\left(X^{'}X\right)^{-1}X^{'}\right)^{'}\sigma^{2}\\ & =DD^{'}\sigma^{2}+\underset{=0}{\underbrace{DX\left(X^{'}X\right)^{-1}\sigma^{2}+\left(X^{'}X\right)^{-1}X^{'}D^{'}\sigma^{2}}}+\underset{Var\left(\widehat{\beta}_{OLS}\right)}{\underbrace{\left(X^{'}X\right)^{-1}\sigma^{2}}}\end{aligned}

The last term equals the variance of the OLS estimator. The second and third terms each equal zero, because

$\left.\begin{array}{c} CX=I\\ CX=DX+I \end{array}\right\} \Rightarrow DX=0$
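The implication $DX=0$ can also be checked numerically. The sketch below reuses the hypothetical WLS estimator as the alternative; its slope matrix $C$ and the OLS slope matrix differ by a $D$ that annihilates $X$ exactly (up to floating-point rounding).

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3
X = rng.normal(size=(N, K))

# OLS "slope" matrix (X'X)^{-1} X' and a WLS alternative (hypothetical weights).
A = np.linalg.solve(X.T @ X, X.T)
W = np.diag(rng.uniform(0.5, 2.0, size=N))
C = np.linalg.solve(X.T @ W @ X, X.T @ W)

# D is the difference of the two slope matrices; unbiasedness forces DX = 0.
D = C - A
print(np.allclose(D @ X, np.zeros((K, K))))  # True
```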

where the first equation is an implication of unbiasedness, and the second one follows from postmultiplying the definition of $D$ by $X$.

Hence, we have learned that

$Var_{\beta}\left(\widetilde{\beta}\right)=Var\left(\widehat{\beta}_{OLS}\right)+DD^{'}\sigma^{2}.$

Because $DD^{'}$ is positive semidefinite by construction, $Var_{\beta}\left(\widetilde{\beta}\right)\geq Var\left(\widehat{\beta}_{OLS}\right)$ in the matrix sense: the difference $\sigma^{2}DD^{'}$ is positive semidefinite, so in particular the variance of each individual coefficient is weakly larger under $\widetilde{\beta}$.
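The positive-semidefiniteness of the variance difference can be confirmed by inspecting eigenvalues. Again using the hypothetical WLS alternative as $\widetilde{\beta}$, all eigenvalues of $Var\left(\widetilde{\beta}\right)-Var\left(\widehat{\beta}_{OLS}\right)$ should be nonnegative up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma2 = 50, 3, 1.0
X = rng.normal(size=(N, K))

A = np.linalg.solve(X.T @ X, X.T)               # OLS slope matrix
W = np.diag(rng.uniform(0.5, 2.0, size=N))      # hypothetical weights
C = np.linalg.solve(X.T @ W @ X, X.T @ W)       # WLS slope matrix

var_ols = sigma2 * A @ A.T                      # equals sigma^2 (X'X)^{-1}
var_alt = sigma2 * C @ C.T
diff = var_alt - var_ols

# Eigenvalues of the (symmetrized) difference are nonnegative up to rounding,
# confirming the matrix inequality Var(beta_tilde) >= Var(beta_hat_OLS).
eigs = np.linalg.eigvalsh((diff + diff.T) / 2)
print(eigs.min() >= -1e-10)  # True
```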

Finally, note that we made no distributional assumption about $\varepsilon$ beyond its first two moments. When $\varepsilon\sim N\left(0,\sigma^{2}I\right)$, $\widehat{\beta}_{OLS}$ attains the Cramér–Rao lower bound: in this case, $\widehat{\beta}_{OLS}$ is the best unbiased estimator (BUE), i.e., even non-linear unbiased estimators cannot be more efficient than OLS.
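A small Monte Carlo sketch ties the pieces together: with homoskedastic errors, the sampling variance of each OLS coefficient is weakly below that of the alternative linear unbiased estimator. The true $\beta$, sample size, replication count, and weights below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, sigma, R = 40, 2, 1.0, 5000
X = rng.normal(size=(N, K))
beta = np.array([1.0, -2.0])                    # hypothetical true coefficients

# OLS slope matrix and a hypothetical WLS competitor (arbitrary weights).
A = np.linalg.solve(X.T @ X, X.T)
W = np.diag(rng.uniform(0.5, 2.0, size=N))
C = np.linalg.solve(X.T @ W @ X, X.T @ W)

b_ols = np.empty((R, K))
b_alt = np.empty((R, K))
for r in range(R):
    y = X @ beta + sigma * rng.normal(size=N)   # homoskedastic errors
    b_ols[r] = A @ y
    b_alt[r] = C @ y

# Theoretical variances sigma^2 diag(AA') vs sigma^2 diag(CC'): the diagonal
# of the PSD difference is nonnegative, so OLS wins coefficient by coefficient.
var_ols_theory = sigma**2 * np.diag(A @ A.T)
var_alt_theory = sigma**2 * np.diag(C @ C.T)
print(var_ols_theory, var_alt_theory)
```

The empirical variances `b_ols.var(axis=0)` and `b_alt.var(axis=0)` should match the theoretical ones up to Monte Carlo error.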