Lecture 18. B) Partitioned Regression

Partitioned Regression

Partitioned regression is a method for understanding how the OLS estimates of one subset of parameters depend on the remaining regressors. Consider the decomposition of the linear regression equation

[math]\begin{aligned} & y=X\beta+\varepsilon\\ \Leftrightarrow & y=\left[\begin{array}{cc} X_{1} & X_{2}\end{array}\right]\left[\begin{array}{c} \beta_{1}\\ \beta_{2} \end{array}\right]+\varepsilon\end{aligned}[/math]

where we partition the regressor matrix [math]X[/math] into blocks [math]X_{1}[/math] and [math]X_{2}[/math] and, conformably, the parameter vector [math]\beta[/math] into two sub-vectors [math]\beta_{1}[/math] and [math]\beta_{2}[/math].

What is [math]\widehat{\beta}_{2}[/math]?

Starting from the OLS normal equations,

[math]\begin{aligned} & X^{'}X\beta=X^{'}y\\ \Leftrightarrow & \left[\begin{array}{c} X_{1}^{'}\\ X_{2}^{'} \end{array}\right]\left[\begin{array}{cc} X_{1} & X_{2}\end{array}\right]\left[\begin{array}{c} \beta_{1}\\ \beta_{2} \end{array}\right]=\left[\begin{array}{c} X_{1}^{'}\\ X_{2}^{'} \end{array}\right]y\\ \Leftrightarrow & \left[\begin{array}{cc} X_{1}^{'}X_{1} & X_{1}^{'}X_{2}\\ X_{2}^{'}X_{1} & X_{2}^{'}X_{2} \end{array}\right]\left[\begin{array}{c} \beta_{1}\\ \beta_{2} \end{array}\right]=\left[\begin{array}{c} X_{1}^{'}\\ X_{2}^{'} \end{array}\right]y\\ \Leftrightarrow & \left\{ \begin{array}{c} X_{1}^{'}X_{1}\beta_{1}+X_{1}^{'}X_{2}\beta_{2}=X_{1}^{'}y\\ X_{2}^{'}X_{1}\beta_{1}+X_{2}^{'}X_{2}\beta_{2}=X_{2}^{'}y \end{array}\right.\end{aligned}[/math]

Label the two equations immediately above as (1) and (2). Now, premultiply equation (1) by [math]X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}[/math] to obtain

[math]\begin{aligned} & X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}X_{1}\beta_{1}+X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}X_{2}\beta_{2}=X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}y\\ \Leftrightarrow & X_{2}^{'}X_{1}\beta_{1}+X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}X_{2}\beta_{2}=X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}y\end{aligned}[/math]

Subtracting this last equation from equation (2) yields:

[math]\left(X_{2}^{'}X_{2}-X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}X_{2}\right)\beta_{2}=\left[X_{2}^{'}-X_{2}^{'}X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}\right]y[/math]

Now, let [math]P_{1}=X_{1}\left(X_{1}^{'}X_{1}\right)^{-1}X_{1}^{'}[/math], to get

[math]\begin{aligned} & X_{2}^{'}\left(I-P_{1}\right)X_{2}\beta_{2}=X_{2}^{'}\left(I-P_{1}\right)y\\ \Leftrightarrow & \widehat{\beta_{2}}=\left[X_{2}^{'}\left(I-P_{1}\right)X_{2}\right]^{-1}X_{2}^{'}\left(I-P_{1}\right)y\end{aligned}[/math]
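Before interpreting this expression, a quick numerical sanity check can be helpful. The following is a minimal sketch (using NumPy on simulated data; the variable names X1, X2, P1, beta2_part are purely illustrative and not part of the lecture) verifying that the partitioned formula reproduces the [math]X_{2}[/math] block of the full OLS estimate:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 200, 2, 3
X1 = rng.normal(size=(n, k1))                     # first block of regressors
X2 = rng.normal(size=(n, k2))                     # second block of regressors
X = np.hstack([X1, X2])                           # X = [X1  X2]
y = X @ rng.normal(size=k1 + k2) + rng.normal(size=n)

P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)        # projection onto Col(X1)
M1 = np.eye(n) - P1                               # I - P1

# Partitioned formula for beta_2_hat
beta2_part = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)

# X2 block of the full OLS estimate from the normal equations
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta2_part, beta_full[k1:]))    # True
</syntaxhighlight>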

In order to interpret this equation, we need to understand the meaning of matrix [math]P_{1}[/math]. In linear algebra, this matrix is called a projection matrix.

Projections

Let [math]P_{X}=X\left(X^{'}X\right)^{-1}X^{'}[/math]. When multiplied by a vector, the matrix [math]P_{X}[/math] yields another vector that is a weighted sum (linear combination) of the columns of [math]X[/math]. Consider the following representation, which applies to the case where [math]N=3[/math] and [math]K=2[/math].

[Figure Proj.png: projection of [math]y[/math] onto the column space of [math]X[/math] for [math]N=3[/math], [math]K=2[/math].]

When multiplied by the vector [math]y[/math], the matrix [math]P_{X}[/math] yields the vector [math]P_{X}y[/math], which lives in the column space of [math]X[/math]. This column space is the space spanned by the columns of [math]X[/math]: any vector in [math]Col\left(X\right)[/math] can be obtained as a weighted sum of those columns. In fact, notice that [math]P_{X}y=X\left(X^{'}X\right)^{-1}X^{'}y=X\widehat{\beta}_{OLS}[/math], i.e., it is the OLS prediction of [math]y[/math].

As for the matrix [math]I-P_{X}[/math], when multiplied by [math]y[/math] it produces a vector that is orthogonal to the column space of [math]X[/math]; in the figure above, this is the vertical dashed vector. Notice that

[math]\left(I-P_{X}\right)y=y-\widehat{y}=\widehat{\varepsilon},[/math]

i.e., this matrix produces the vector of estimated residuals, which is orthogonal (in the geometric sense) to the column space of [math]X[/math].

Projection matrices are symmetric ([math]P_{X}^{'}=P_{X}[/math]) and idempotent ([math]P_{X}P_{X}=P_{X}[/math]), the latter meaning that repeated self-multiplication always yields the projection matrix itself.
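The following sketch (again NumPy on simulated data, with illustrative variable names) checks these properties numerically: [math]P_{X}[/math] is symmetric and idempotent, [math]P_{X}y[/math] equals the OLS fitted values, and [math]\left(I-P_{X}\right)y[/math] is orthogonal to the columns of [math]X[/math].

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

P_X = X @ np.linalg.solve(X.T @ X, X.T)      # projection onto Col(X)
M_X = np.eye(n) - P_X                        # projection onto the orthogonal complement
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(P_X, P_X.T))               # symmetric
print(np.allclose(P_X @ P_X, P_X))           # idempotent
print(np.allclose(P_X @ y, X @ beta_ols))    # P_X y is the OLS prediction of y
print(np.allclose(X.T @ (M_X @ y), 0))       # residuals are orthogonal to Col(X)
</syntaxhighlight>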

Partitioned Regression (cont.)

Using the properties of projection matrices, the equation

[math]\widehat{\beta_{2}}=\left[X_{2}^{'}\left(I-P_{1}\right)X_{2}\right]^{-1}X_{2}^{'}\left(I-P_{1}\right)y[/math]

can be rewritten as

[math]\widehat{\beta_{2}}=\left[X_{2}^{*'}X_{2}^{*}\right]^{-1}X_{2}^{*'}y^{*}[/math]

where

[math]X_{2}^{*}=\left(I-P_{1}\right)X_{2}[/math] and [math]y^{*}=\left(I-P_{1}\right)y[/math] (notice that we are using the symmetry and idempotence of [math]I-P_{1}[/math], e.g., [math]X_{2}^{*'}X_{2}^{*}=X_{2}^{'}\left(I-P_{1}\right)^{'}\left(I-P_{1}\right)X_{2}=X_{2}^{'}\left(I-P_{1}\right)X_{2}[/math]).

Notice that [math]y^{*}=\left(I-P_{1}\right)y[/math] are the residuals from regressing [math]y[/math] on [math]X_{1}[/math], and [math]X_{2}^{*}=\left(I-P_{1}\right)X_{2}[/math] are the residuals from regressing each of the variables in [math]X_{2}[/math] on [math]X_{1}[/math]. Finally, [math]\widehat{\beta_{2}}[/math] is obtained from regressing the residuals [math]y^{*}[/math] on [math]X_{2}^{*}[/math].
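As an illustration, here is a minimal sketch of this two-step (residual-on-residual) computation on simulated data. The helper function residuals and the names y_star, X2_star are assumptions made for the example, not part of the lecture.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 150, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0, -0.3]) + rng.normal(size=n)

def residuals(A, Z):
    """Residuals from regressing each column of Z on A (hypothetical helper)."""
    coef = np.linalg.lstsq(A, Z, rcond=None)[0]
    return Z - A @ coef

y_star = residuals(X1, y)        # residuals of y on X1
X2_star = residuals(X1, X2)      # residuals of each column of X2 on X1

# Regress the residuals y_star on the residuals X2_star
beta2_fwl = np.linalg.lstsq(X2_star, y_star, rcond=None)[0]

# Compare with the X2 block of the full OLS estimate
beta_full = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]
print(np.allclose(beta2_fwl, beta_full[k1:]))    # True
</syntaxhighlight>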

Beyond clarifying how OLS operates, partitioned regression can be used to derive the variance of two-stage estimators, in which first-stage estimates are plugged into a second stage that produces additional estimates. It can also inform variable selection problems.