
# Ordinary Least Squares

Suppose we observe some data $\left\{ x_{i},y_{i}\right\} _{i=1}^{n}$. We would like to relate $y_{i}$ to $x_{i}$ through a line, i.e.,

$y_{i}=\beta_{0}+\beta_{1}x_{i}$

An intuitive estimator minimizes the distance between $y_{i}$ and $\beta_{0}+\beta_{1}x_{i}$, for example,

$\min_{\beta_{0},\beta_{1}}\,\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)^{2}$
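To make the objective concrete, the sketch below evaluates the sum of squared distances for two candidate lines. The function name `sum_squared_residuals` and the four data points are hypothetical, chosen only for illustration.

```python
import numpy as np


def sum_squared_residuals(beta0, beta1, x, y):
    """Return sum_i (y_i - beta0 - beta1 * x_i)^2 for a candidate line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - beta0 - beta1 * x
    return float(np.sum(residuals ** 2))


# Hypothetical data, made up for illustration only.
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = np.array([2.0, 4.0, 5.0, 8.0])

# A flat line at the mean of y versus a sloped line through the origin.
loss_flat = sum_squared_residuals(4.75, 0.0, x_demo, y_demo)
loss_sloped = sum_squared_residuals(0.0, 1.9, x_demo, y_demo)

print(loss_flat, loss_sloped)
```

On this data the sloped line attains a much smaller value of the objective than the flat one, which is the sense in which it fits better.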

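The squared distance is especially tractable, hence its use: the objective is differentiable, and its first-order conditions are linear in $\beta_{0}$ and $\beta_{1}$. Calculating the first-order conditions,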

\begin{aligned} & \left\{ \begin{array}{c} \text{foc}\left(\beta_{0}\right):\,\sum_{i=1}^{n}-2\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)=0\\ \text{foc}\left(\beta_{1}\right):\,\sum_{i=1}^{n}-2x_{i}\left(y_{i}-\beta_{0}-\beta_{1}x_{i}\right)=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \beta_{0}=\frac{\sum y_{i}}{n}-\beta_{1}\frac{\sum x_{i}}{n}=\overline{y}-\beta_{1}\overline{x}\\ \sum x_{i}y_{i}-\beta_{1}\sum x_{i}^{2}-n\overline{x}\beta_{0}=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \beta_{0}=\overline{y}-\beta_{1}\overline{x}\\ \sum x_{i}y_{i}-\beta_{1}\sum x_{i}^{2}-n\overline{x}\left(\overline{y}-\beta_{1}\overline{x}\right)=0 \end{array}\right.\\ \Leftrightarrow & \left\{ \begin{array}{c} \beta_{0}=\overline{y}-\beta_{1}\overline{x}\\ \beta_{1}=\frac{\sum x_{i}y_{i}-n\overline{x}\overline{y}}{\sum x_{i}^{2}-n\overline{x}^{2}}=\frac{\sum\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum\left(x_{i}-\overline{x}\right)^{2}} \end{array}.\right.\end{aligned}
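The final equality for $\beta_{1}$ rests on a standard identity that is used without being spelled out. As a brief sketch, expanding the product and using $\sum x_{i}=n\overline{x}$ and $\sum y_{i}=n\overline{y}$ gives

\begin{aligned} \sum\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right) & =\sum x_{i}y_{i}-\overline{x}\sum y_{i}-\overline{y}\sum x_{i}+n\overline{x}\,\overline{y}\\ & =\sum x_{i}y_{i}-n\overline{x}\,\overline{y}.\end{aligned}

Setting $y_{i}=x_{i}$ in the same identity yields $\sum\left(x_{i}-\overline{x}\right)^{2}=\sum x_{i}^{2}-n\overline{x}^{2}$, which handles the denominator.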

So, we have learned that

$\widehat{\beta_{0}}^{OLS}=\overline{y}-\widehat{\beta_{1}}^{OLS}\overline{x}.$

$\widehat{\beta_{1}}^{OLS}=\frac{Cov\left(x_{i},y_{i}\right)}{Var\left(x_{i}\right)}.$

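The expression for the slope parameter is interesting: it is the sample covariance between $x_{i}$ and $y_{i}$, scaled by the sample variance of $x_{i}$. In other words, it measures how much $y_{i}$ moves with $x_{i}$ per unit of variation in $x_{i}$.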
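The closed-form estimators can be computed directly from their definitions. The sketch below is a minimal illustration, not a reference implementation: the function name `ols_fit` and the data are hypothetical, and NumPy's `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np


def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    beta0_hat = y_bar - beta1_hat * x_bar
    return float(beta0_hat), float(beta1_hat)


# Hypothetical data, made up for illustration only.
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = np.array([2.0, 4.0, 5.0, 8.0])

beta0_hat, beta1_hat = ols_fit(x_demo, y_demo)

# Cross-check: a degree-1 polynomial fit returns [slope, intercept].
slope_check, intercept_check = np.polyfit(x_demo, y_demo, deg=1)

print(beta0_hat, beta1_hat)
```

Note that the ratio is the same whether the covariance and variance are normalized by $n$ or by $n-1$, since the normalization cancels.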

## Some Remarks

• After estimating $\beta$, we can predict $y_{i}$ via

$\widehat{y_{i}}=\widehat{\beta_{0}}^{OLS}+\widehat{\beta_{1}}^{OLS}x_{i}$

• We can also define the prediction errors, or residuals, $\widehat{\varepsilon}_{i}$, which we compute as

\begin{aligned} \widehat{\varepsilon}_{i} & =y_{i}-\widehat{y_{i}}\\ & =y_{i}-\left(\widehat{\beta_{0}}^{OLS}+\widehat{\beta_{1}}^{OLS}x_{i}\right)\end{aligned}

These estimated errors give the vertical distance between each data point, indexed by $i$, and the estimated line.

• Notice also that we can compute the sample variance of the estimated errors, $Var\left(\widehat{\varepsilon_{i}}\right)$. This statistic summarizes how far, on average, the data points lie from the estimated line.
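The three remarks above can be sketched in a few lines. The data and variable names are hypothetical, and the variance here divides by $n$ (NumPy's default); other conventions divide by $n-1$ or $n-2$, so treat the normalization as an assumption of this example.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = np.array([2.0, 4.0, 5.0, 8.0])

# Closed-form OLS estimates of the slope and intercept.
x_bar, y_bar = x_demo.mean(), y_demo.mean()
beta1_hat = np.sum((x_demo - x_bar) * (y_demo - y_bar)) / np.sum((x_demo - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Predicted values on the estimated line.
y_hat = beta0_hat + beta1_hat * x_demo

# Estimated errors: vertical distance from each data point to the line.
residuals = y_demo - y_hat

# Sample variance of the estimated errors (np.var divides by n by default).
residual_variance = float(np.var(residuals))

print(residuals, residual_variance)
```

Because the model includes an intercept, the estimated errors sum to zero, so their variance is simply the average squared error.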