Lecture 17. A) Ordinary Least Squares
Ordinary Least Squares
Suppose we have some data [math]\left\{ x_{i},y_{i}\right\} _{i=1}^{n}[/math]. We would like to relate them through a line, i.e.,
[math]y_{i}=\beta_{0}+\beta_{1}x_{i}[/math]
An intuitive estimator minimizes the distance between [math]y_{i}[/math] and [math]\beta_{0}+\beta_{1}x_{i}[/math], for example,
[math]\min_{\beta_{0},\beta_{1}}\,\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)^{2}[/math]
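As a quick illustration, here is a minimal sketch of this minimization on synthetic data. The data, the seed, and the use of scipy.optimize.minimize are illustrative assumptions, not part of the lecture; the point is only that the objective can be minimized numerically before we derive its closed form.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative synthetic data (not from the lecture)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

def ssr(beta):
    """Sum of squared residuals for a candidate (beta0, beta1)."""
    beta0, beta1 = beta
    return np.sum((y - beta0 - beta1 * x) ** 2)

# Minimize the quadratic objective numerically, starting from (0, 0)
res = minimize(ssr, x0=np.array([0.0, 0.0]))
print(res.x)  # should be close to the true values (1.0, 2.0)
```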
The quadratic distance is especially tractable, hence its use. Calculating the first-order conditions,
[math]\begin{aligned} & \left\{ \begin{array}{c} foc\left(\beta_{0}\right):\,\sum_{i=1}^{n}-2\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)=0\\ foc\left(\beta_{1}\right):\,\sum_{i=1}^{n}-2x_{i}\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \beta_{0}=\frac{\sum y_{i}}{n}-\beta_{1}\frac{\sum x_{i}}{n}=\overline{y}-\beta_{1}\overline{x}\\ \sum x_{i}y_{i}-\beta_{1}\sum x_{i}^{2}-n\overline{x}\beta_{0}=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \\ \sum x_{i}y_{i}-\beta_{1}\sum x_{i}^{2}-n\overline{x}\left(\overline{y}-\beta_{1}\overline{x}\right)=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \\ \beta_{1}=\frac{\sum x_{i}y_{i}-n\overline{x}\overline{y}}{\sum x_{i}^{2}-n\overline{x}^{2}}=\frac{\sum\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum\left(x_{i}-\overline{x}\right)^{2}} \end{array}.\right.\end{aligned}[/math]
So, we have learned that
[math]\widehat{\beta_{0}}^{OLS}=\overline{y}-\widehat{\beta_{1}}^{OLS}\overline{x}.[/math]
[math]\widehat{\beta_{1}}^{OLS}=\frac{Cov\left(x_{i},y_{i}\right)}{Var\left(x_{i}\right)},[/math]
where [math]Cov[/math] and [math]Var[/math] denote the sample covariance and the sample variance.
The expression for the slope parameter is interesting: it is the covariance between [math]x_{i}[/math] and [math]y_{i}[/math] scaled by the variance of [math]x_{i}[/math], i.e., how much [math]y_{i}[/math] co-moves with [math]x_{i}[/math] per unit of variation in [math]x_{i}[/math].
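A minimal numerical sketch of these closed-form formulas, assuming synthetic data and cross-checking against numpy.polyfit; both choices are illustrative, not part of the lecture.

```python
import numpy as np

# Illustrative synthetic data (not from the lecture)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

xbar, ybar = x.mean(), y.mean()

# Closed-form OLS estimates from the first-order conditions
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

# Cross-check against numpy's built-in degree-1 polynomial fit
slope, intercept = np.polyfit(x, y, 1)
print(beta0_hat, beta1_hat)   # close to the true (1.0, 2.0)
print(intercept, slope)       # should match the closed-form values
```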
Some Remarks
- After estimating [math]\beta[/math], we can predict [math]y_{i}[/math] via
[math]\widehat{y_{i}}=\widehat{\beta_{0}}^{OLS}+\widehat{\beta_{1}}^{OLS}x_{i}[/math]
- We can also define the prediction errors (residuals) as [math]\widehat{\varepsilon_{i}}=y_{i}-\widehat{y_{i}}[/math], which we compute as
[math]\begin{aligned} \widehat{\varepsilon}_{i} & =y_{i}-\widehat{y_{i}}\\ & =y_{i}-\left(\widehat{\beta_{0}}^{OLS}+\widehat{\beta_{1}}^{OLS}x_{i}\right)\end{aligned}[/math]
These residuals give the (signed) vertical distance between each data point, indexed by [math]i[/math], and the estimated line.
- Notice also that we can compute the sample variance of the residuals, [math]Var\left(\widehat{\varepsilon_{i}}\right)[/math]. This statistic gives a sense of how far, on average, the data points lie from the estimated line; see the sketch after these remarks.
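The sketch below puts these remarks together on synthetic data (again an illustrative assumption, not from the lecture): it computes the fitted values, the residuals, and their sample variance.

```python
import numpy as np

# Illustrative synthetic data (not from the lecture)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

# OLS estimates via the closed-form expressions above
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Fitted values and residuals
y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat

# Sample variance of the residuals: a summary of how far the data
# points lie from the fitted line (using the ddof=1 convention)
print(resid.var(ddof=1))
```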