Lecture 17. A) Ordinary Least Squares


Ordinary Least Squares

Suppose we have some data [math]\left\{ x_{i},y_{i}\right\} _{i=1}^{n}[/math]. We would like to relate it through a line, i.e.,

[math]y_{i}=\beta_{0}+\beta_{1}x_{i}+\varepsilon_{i}.[/math]

An intuitive estimator minimizes the distance between [math]y_{i}[/math] and [math]\beta_{0}+\beta_{1}x_{i}[/math], for example,

[math]\min_{\beta_{0},\beta_{1}}\,\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)^{2}[/math]
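To make this objective concrete, here is a minimal Python sketch (the data and variable names are illustrative, not from the lecture) that evaluates the sum of squared distances and minimizes it numerically with scipy; the closed-form solution derived below gives the same answer.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative toy data: y is roughly 1 + 2*x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)

def sum_of_squares(beta):
    """Objective: sum_i (y_i - beta0 - beta1 * x_i)^2."""
    beta0, beta1 = beta
    return np.sum((y - beta0 - beta1 * x) ** 2)

# Minimize over (beta0, beta1), starting from (0, 0).
result = minimize(sum_of_squares, x0=[0.0, 0.0])
print("numerical minimizer:", result.x)
```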

The quadratic distance is especially tractable, hence its use. Calculating the first order conditions,

[math]\begin{aligned} & \left\{ \begin{array}{c} foc\left(\beta_{0}\right):\,\sum_{i=1}^{n}-2\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)=0\\ foc\left(\beta_{1}\right):\,\sum_{i=1}^{n}-2x_{i}\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \beta_{0}=\frac{\sum y_{i}}{n}-\beta_{1}\frac{\sum x_{i}}{n}=\overline{y}-\beta_{1}\overline{x}\\ \sum x_{i}y_{i}-\beta_{1}\sum x_{i}^{2}-n\overline{x}\beta_{0}=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \\ \sum x_{i}y_{i}-\beta_{1}\sum x_{i}^{2}-n\overline{x}\left(\overline{y}-\beta_{1}\overline{x}\right)=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \\ \beta_{1}=\frac{\sum x_{i}y_{i}-n\overline{x}\overline{y}}{\sum x_{i}^{2}-n\overline{x}^{2}}=\frac{\sum\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum\left(x_{i}-\overline{x}\right)^{2}} \end{array}.\right.\end{aligned}[/math]

So, we have learned that

[math]\widehat{\beta_{0}}^{OLS}=\overline{y}-\widehat{\beta_{1}}^{OLS}\overline{x}.[/math]

[math]\widehat{\beta_{1}}^{OLS}=\frac{Cov\left(x_{i},y_{i}\right)}{Var\left(x_{i}\right)}.[/math]

The expression for the slope parameter is informative: it is the (sample) covariance between [math]x_{i}[/math] and [math]y_{i}[/math] divided by the (sample) variance of [math]x_{i}[/math], so it measures how much [math]y_{i}[/math] moves with [math]x_{i}[/math] per unit of variation in [math]x_{i}[/math].
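As a minimal sketch (reusing the same illustrative data as above; nothing here is from the lecture itself), the closed-form estimators can be computed directly, and the two first-order conditions can be checked at the solution. The numerical minimization above should return the same values.

```python
import numpy as np

# Illustrative data (not from the lecture): y roughly 1 + 2*x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print("beta0_hat:", beta0_hat, "beta1_hat:", beta1_hat)

# Sanity check: both first-order conditions hold at the solution,
# i.e. the residuals sum to zero and are orthogonal to x.
resid = y - beta0_hat - beta1_hat * x
print(np.isclose(resid.sum(), 0.0), np.isclose((x * resid).sum(), 0.0))
```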

Some Remarks

  • After estimating [math]\beta_{0}[/math] and [math]\beta_{1}[/math], we can predict [math]y_{i}[/math] via

[math]\widehat{y_{i}}=\widehat{\beta_{0}}^{OLS}+\widehat{\beta_{1}}^{OLS}x_{i}[/math]

  • We can also define the prediction errors (residuals) as [math]\widehat{\varepsilon_{i}}=y_{i}-\widehat{y_{i}}[/math], which we compute as

[math]\begin{aligned} \widehat{\varepsilon}_{i} & =y_{i}-\widehat{y_{i}}\\ & =y_{i}-\left(\widehat{\beta_{0}}^{OLS}+\widehat{\beta_{1}}^{OLS}x_{i}\right)\end{aligned}[/math]

These residuals measure the vertical distance between the estimated line and each data point, indexed by [math]i[/math].

  • Notice also that we can compute the sample variance of the residuals, [math]Var\left(\widehat{\varepsilon_{i}}\right)[/math]. This statistic gives a sense of how far, on average, the data points lie from the estimated line (see the sketch after this list).
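Continuing the same illustrative example (again a sketch under the assumptions above, not part of the lecture), the fitted values, residuals, and their sample variance can be computed as follows:

```python
import numpy as np

# Illustrative data and OLS fit (same construction as in the sketches above).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Fitted values and residuals (prediction errors).
y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat

# Sample variance of the residuals: a measure of how far the data
# points lie from the fitted line on average.
resid_var = resid.var(ddof=1)
print("residual sample variance:", resid_var)
```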