Lecture 18. C) Gauss-Markov Theorem

Gauss-Markov Theorem

The Gauss-Markov theorem is an important result for the OLS estimator. It does not depend on asymptotics or normality assumptions. It states that, in the linear regression model (which includes the homoskedasticity assumption), [math]\widehat{\beta}_{OLS}[/math] is the best linear unbiased estimator (BLUE) of [math]\beta[/math], i.e., the linear unbiased estimator with minimum variance.

The proof is not hard. We consider the case where [math]X[/math] is fixed, i.e., all expressions are conditioned on [math]X[/math], and [math]\varepsilon[/math] is the only random element.

Proof

Let

[math]\begin{aligned} \widehat{\beta}_{OLS} & =\left(X^{'}X\right)^{-1}X^{'}y\\ \widetilde{\beta} & =Cy\end{aligned}[/math]

where [math]\widetilde{\beta}[/math] is some alternative linear estimator, defined by some matrix [math]C[/math] with dimensions [math]\left(K\times N\right)[/math].

For [math]\widetilde{\beta}[/math] to be unbiased, we require that

[math]\begin{aligned} & E_{\beta}\left(\widetilde{\beta}\right)=\beta\\ \Leftrightarrow & E_{\beta}\left(Cy\right)=\beta\\ \Leftrightarrow & E_{\beta}\left(C\left(X\beta+\varepsilon\right)\right)=\beta\\ \Leftrightarrow & CX\beta=\beta\end{aligned}[/math]

because [math]E_{\beta}\left(\varepsilon\right)=0[/math]. Notice that for the last equality to hold for every [math]\beta[/math], we require [math]CX=I[/math].
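
To make the condition [math]CX=I[/math] concrete, here is a minimal numerical sketch (not part of the lecture; it assumes NumPy and a simulated design matrix). It builds one specific alternative linear estimator, a weighted least-squares estimator with an arbitrary positive definite weight matrix [math]W[/math], and checks that its matrix [math]C[/math] satisfies [math]CX=I[/math], so that it is unbiased just like OLS.

<syntaxhighlight lang="python">
# Minimal sketch (illustrative assumptions: simulated X, arbitrary weight matrix W).
# beta_tilde = C y with C = (X'WX)^{-1} X'W is a linear estimator; we check CX = I.
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3

# Fixed design matrix with an intercept column (N x K).
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

# Arbitrary positive definite weight matrix W (illustrative choice only).
A = rng.normal(size=(N, N))
W = A @ A.T + N * np.eye(N)

C_ols = np.linalg.solve(X.T @ X, X.T)          # OLS "slope": (X'X)^{-1} X'
C_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)  # alternative "slope": (X'WX)^{-1} X'W

# Both satisfy CX = I, so both Cy estimators are unbiased for beta.
print(np.allclose(C_ols @ X, np.eye(K)))  # True
print(np.allclose(C_alt @ X, np.eye(K)))  # True
</syntaxhighlight>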

Let us now calculate the variance of [math]\widetilde{\beta}[/math]:

[math]Var_{\beta}\left(\widetilde{\beta}\right)=Var_{\beta}\left(Cy\right)=Var_{\beta}\left(C\varepsilon\right)=CC^{'}\sigma^{2},[/math]

where the second equality holds because [math]X\beta[/math] is non-random, and the last one because [math]Var_{\beta}\left(\varepsilon\right)=\sigma^{2}I[/math].
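
As a quick sanity check on this formula, the following sketch (purely illustrative; the true [math]\beta[/math], [math]\sigma[/math], and design are arbitrary choices) simulates many samples [math]y=X\beta+\varepsilon[/math] with fixed [math]X[/math], forms [math]\widetilde{\beta}=Cy[/math] in each sample, and compares the empirical covariance matrix of the draws to [math]CC^{'}\sigma^{2}[/math].

<syntaxhighlight lang="python">
# Minimal Monte Carlo sketch (illustrative: true beta, sigma, and X are arbitrary choices).
# The empirical covariance of beta_tilde = C y across samples should be close to C C' sigma^2.
import numpy as np

rng = np.random.default_rng(1)
N, K, sigma = 100, 3, 2.0
beta = np.array([1.0, -0.5, 0.25])                              # "true" coefficients (assumed)
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed design

C = np.linalg.solve(X.T @ X, X.T)   # here C is the OLS slope; any C with CX = I works the same way

draws = []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=N)   # y = X beta + eps, eps ~ (0, sigma^2 I)
    draws.append(C @ y)                         # beta_tilde = C y

empirical = np.cov(np.array(draws), rowvar=False)   # empirical Var(beta_tilde), a K x K matrix
theoretical = sigma**2 * C @ C.T                    # C C' sigma^2
print(np.max(np.abs(empirical - theoretical)))      # small (Monte Carlo error only)
</syntaxhighlight>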

Now, define [math]D[/math] as the difference between the “slopes” of [math]\widetilde{\beta}[/math] and [math]\widehat{\beta}_{OLS}[/math], s.t. [math]D=C-\left(X^{'}X\right)^{-1}X^{'}[/math]. Using this definition, we can rewrite [math]Var_{\beta}\left(\widetilde{\beta}\right)[/math] as

[math]\begin{aligned} Var_{\beta}\left(\widetilde{\beta}\right) & =CC^{'}\sigma^{2}\\ & =\left(D+\left(X^{'}X\right)^{-1}X^{'}\right)\left(D+\left(X^{'}X\right)^{-1}X^{'}\right)^{'}\sigma^{2}\\ & =DD^{'}\sigma^{2}+\underset{=0}{\underbrace{DX\left(X^{'}X\right)^{-1}\sigma^{2}+\left(X^{'}X\right)^{-1}X^{'}D^{'}\sigma^{2}}}+\underset{Var\left(\widehat{\beta}_{OLS}\right)}{\underbrace{\left(X^{'}X\right)^{-1}\sigma^{2}}}\end{aligned}[/math]

The last term equals the variance of the OLS estimator. The second and third terms each equal zero, because

[math]\left.\begin{array}{c} CX=I\\ CX=DX+I \end{array}\right\} \Rightarrow DX=0[/math]

where the first equation is an implication of unbiasedness, and the second one follows from postmultiplying the definition of [math]D[/math] by [math]X[/math].
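
The two facts used above, [math]DX=0[/math] and the resulting decomposition of [math]CC^{'}\sigma^{2}[/math], can be checked numerically. The sketch below is illustrative only: it uses the same kind of weighted least-squares estimator as before, with an arbitrary positive definite weight matrix [math]W[/math].

<syntaxhighlight lang="python">
# Minimal sketch (illustrative): with C = (X'WX)^{-1} X'W and D = C - (X'X)^{-1} X',
# check that DX = 0 and that C C' sigma^2 = D D' sigma^2 + (X'X)^{-1} sigma^2.
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma = 150, 4, 1.5

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
A = rng.normal(size=(N, N))
W = A @ A.T + N * np.eye(N)                  # arbitrary positive definite weight matrix

C_ols = np.linalg.solve(X.T @ X, X.T)        # (X'X)^{-1} X'
C = np.linalg.solve(X.T @ W @ X, X.T @ W)    # alternative slope, satisfies CX = I
D = C - C_ols                                # difference of the two "slopes"

print(np.allclose(D @ X, np.zeros((K, K))))  # DX = 0, so the cross terms vanish

lhs = sigma**2 * C @ C.T                                       # Var(beta_tilde)
rhs = sigma**2 * D @ D.T + sigma**2 * np.linalg.inv(X.T @ X)   # D D' sigma^2 + Var(beta_ols)
print(np.allclose(lhs, rhs))                                   # True
</syntaxhighlight>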

Hence, we have learned that

[math]Var_{\beta}\left(\widetilde{\beta}\right)=Var\left(\widehat{\beta}_{OLS}\right)+DD^{'}\sigma^{2}.[/math]

Because [math]DD^{'}[/math] is a positive semidefinite matrix, [math]Var_{\beta}\left(\widetilde{\beta}\right)\geq Var\left(\widehat{\beta}_{OLS}\right)[/math], where the inequality is understood in the positive semidefinite (matrix) sense.
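
As a final illustration (same simulated setup as above, not part of the lecture), the sketch below computes [math]Var_{\beta}\left(\widetilde{\beta}\right)-Var\left(\widehat{\beta}_{OLS}\right)[/math] directly and confirms that all of its eigenvalues are non-negative, i.e., that the difference is positive semidefinite.

<syntaxhighlight lang="python">
# Minimal sketch (illustrative): Var(beta_tilde) - Var(beta_ols) = D D' sigma^2
# should have only non-negative eigenvalues, i.e. be positive semidefinite.
import numpy as np

rng = np.random.default_rng(3)
N, K, sigma = 150, 4, 1.5

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
A = rng.normal(size=(N, N))
W = A @ A.T + N * np.eye(N)                       # arbitrary positive definite weight matrix

C = np.linalg.solve(X.T @ W @ X, X.T @ W)         # an alternative linear unbiased estimator
var_alt = sigma**2 * C @ C.T                      # Var(beta_tilde)
var_ols = sigma**2 * np.linalg.inv(X.T @ X)       # Var(beta_ols)

gap = var_alt - var_ols                           # equals D D' sigma^2
print(np.linalg.eigvalsh(gap).min() >= -1e-10)    # True: all eigenvalues >= 0 up to rounding
</syntaxhighlight>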

Finally, note that we did not make any assumptions about the distribution of [math]\varepsilon[/math]. When [math]\varepsilon\sim N\left(0,\sigma^{2}I\right)[/math], [math]\widehat{\beta}_{OLS}[/math] attains the Cramér-Rao lower bound: in this case, [math]\widehat{\beta}_{OLS}[/math] is the best unbiased estimator (BUE), i.e., not even non-linear unbiased estimators can be more efficient than OLS.