Lecture 9. B) Evaluating Estimators
Evaluating Estimators
A good estimator of [math]\theta[/math] is close to [math]\theta[/math] in some probabilistic sense. For reasons of convenience, the leading criterion is the mean squared error:
The mean squared error (MSE) of an estimator [math]\widehat{\theta}[/math] of [math]\theta\in\Theta\subseteq\mathbb{R}[/math] is a function of [math]\theta[/math] (for a given estimator [math]\widehat{\theta}[/math]) given by
[math]MSE_{\theta}\left(\widehat{\theta}\right)=E_{\theta}\left[\left(\theta-\widehat{\theta}\right)^{2}\right][/math]
where, from here on, we use the notation [math]E_{\theta}\left[\cdot\right]=E\left[\left.\cdot\right|\theta\right][/math], that is, the subscript indicates the variable to be conditioned on (before, it used to mean the variable of integration). So,
[math]MSE_{\theta}\left(\widehat{\theta}\right)=E_{\theta}\left[\left(\theta-\widehat{\theta}\right)^{2}\right]=E\left[\left.\left(\theta-\widehat{\theta}\right)^{2}\right|\theta\right].[/math]
The interpretation is that the MSE gives us the expected squared difference between our estimator and a specific value of [math]\theta[/math], which we usually take to be the true value.
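To make the definition concrete, here is a minimal Monte Carlo sketch (illustrative, not part of the notes) that approximates [math]MSE_{\theta}\left(\widehat{\theta}\right)[/math] by simulating many samples at a fixed true [math]\theta[/math]; the normal model, sample size, and the choice of sample mean versus sample median as estimators are assumptions made for the example.

```python
import numpy as np

# Minimal Monte Carlo sketch (illustrative, not from the notes): approximate
# MSE_theta(theta_hat) = E_theta[(theta - theta_hat)^2] by simulating many
# samples at a fixed "true" theta and averaging the squared errors.
# Assumed setup: X_1, ..., X_n ~ N(theta, 1); the estimators compared are
# the sample mean and the sample median.

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 50, 100_000

samples = rng.normal(loc=theta, scale=1.0, size=(reps, n))
mean_hat = samples.mean(axis=1)
median_hat = np.median(samples, axis=1)

mse_mean = np.mean((theta - mean_hat) ** 2)      # close to 1/n = 0.02
mse_median = np.mean((theta - median_hat) ** 2)  # roughly pi/(2n) ~ 0.031

print(f"MSE(sample mean)   ~ {mse_mean:.4f}")
print(f"MSE(sample median) ~ {mse_median:.4f}")
```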
MSE is mostly popular due to its tractability. When [math]\theta[/math] is a vector of parameters, we employ the vector version instead:
[math]MSE_{\theta}\left(\widehat{\theta}\right)=E_{\theta}\left[\left(\theta-\widehat{\theta}\right)\left(\theta-\widehat{\theta}\right)'\right].[/math]
The vector version of the MSE produces a matrix. To compare two estimator vectors, we compare these matrices: we say one MSE is lower than another if the difference of the two matrices (higher minus lower) is positive semi-definite (i.e., [math]z'Mz\geq0[/math] for all [math]z[/math]). We will confine ourselves to the scalar case most of the time.
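As a quick illustration of the matrix comparison (a sketch, not from the notes): a symmetric matrix is positive semi-definite exactly when all of its eigenvalues are non-negative, so the comparison can be checked numerically; the two MSE matrices below are made-up numbers for the example.

```python
import numpy as np

# Sketch of the matrix comparison (illustrative, not from the notes): a
# symmetric matrix is positive semi-definite iff all of its eigenvalues are
# non-negative, so "MSE_1 - MSE_2 is PSD" can be checked numerically.
# The two MSE matrices below are made-up numbers for the example.

def is_psd(m, tol=1e-10):
    """Return True if the symmetric matrix m is positive semi-definite."""
    return bool(np.all(np.linalg.eigvalsh(m) >= -tol))

mse_1 = np.array([[2.0, 0.3],
                  [0.3, 1.5]])
mse_2 = np.array([[1.0, 0.2],
                  [0.2, 1.0]])

# True here: the second estimator's MSE is lower in the matrix sense.
print(is_psd(mse_1 - mse_2))
```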
If [math]MSE_{\theta}\left(\widehat{\theta}_{1}\right)\gt MSE_{\theta}\left(\widehat{\theta}_{2}\right)[/math] for all values of [math]\theta[/math], we are tempted to say that [math]\widehat{\theta}_{2}[/math] is better, since it is on average closer to [math]\theta[/math], whatever value it has. However, we may feel differently if [math]\widehat{\theta}_{2}[/math] systematically underestimates (or overestimates) [math]\theta[/math], whereas [math]\widehat{\theta}_{1}[/math] is on average correct.
In order to take this into account, we introduce the concept of bias:
[math]Bias_{\theta}\left(\widehat{\theta}\right)=E_{\theta}\left(\theta-\widehat{\theta}\right)[/math]
Whenever [math]E_{\theta}\left(\theta-\widehat{\theta}\right)=0[/math] - or equivalently, [math]E_{\theta}\left(\widehat{\theta}\right)=\theta[/math] - we say the estimator [math]\widehat{\theta}[/math] is unbiased.
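A standard textbook example (not specific to these notes): for an i.i.d. sample with mean [math]\mu[/math] and variance [math]\sigma^{2}[/math], the sample mean [math]\bar{X}[/math] is unbiased for [math]\mu[/math], while the plug-in variance estimator [math]\widehat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}[/math] is biased, since [math]E_{\sigma^{2}}\left(\widehat{\sigma}^{2}\right)=\frac{n-1}{n}\sigma^{2}[/math], so that [math]Bias_{\sigma^{2}}\left(\widehat{\sigma}^{2}\right)=E_{\sigma^{2}}\left(\sigma^{2}-\widehat{\sigma}^{2}\right)=\frac{\sigma^{2}}{n}[/math].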
What follows is a fundamental result about the decomposition of the MSE:
[math]MSE_{\theta}\left(\widehat{\theta}\right)=Var_{\theta}\left(\widehat{\theta}\right)+Bias_{\theta}\left(\widehat{\theta}\right)^{2}[/math]
This means that, for example, among estimators with a given MSE there is a tradeoff between bias and variance.
The proof of the result is obtained by adding and subtracting [math]E_{\theta}\left(\widehat{\theta}\right)[/math]:
[math]\begin{aligned} MSE_{\theta}\left(\widehat{\theta}\right)=E_{\theta}\left[\left(\theta-\widehat{\theta}\right)^{2}\right] & =E_{\theta}\left[\left(\theta-\widehat{\theta}+E_{\theta}\left(\widehat{\theta}\right)-E_{\theta}\left(\widehat{\theta}\right)\right)^{2}\right]\\ & =E_{\theta}\left[\left(\widehat{\theta}-E_{\theta}\left(\widehat{\theta}\right)\right)^{2}\right]+\underset{=\left(\theta-E_{\theta}\left(\widehat{\theta}\right)\right)^{2}}{\underbrace{E_{\theta}\left[\left(E_{\theta}\left(\widehat{\theta}\right)-\theta\right)^{2}\right]}}+\underset{=0}{\underbrace{2E_{\theta}\left[\left(E_{\theta}\left(\widehat{\theta}\right)-\widehat{\theta}\right)\left(\theta-E_{\theta}\left(\widehat{\theta}\right)\right)\right]}}\\ & =Var_{\theta}\left(\widehat{\theta}\right)+Bias_{\theta}\left(\widehat{\theta}\right)^{2}\end{aligned}[/math]
The cross term vanishes because [math]\theta-E_{\theta}\left(\widehat{\theta}\right)[/math] is a constant given [math]\theta[/math] and [math]E_{\theta}\left[E_{\theta}\left(\widehat{\theta}\right)-\widehat{\theta}\right]=0[/math].
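The decomposition is easy to verify numerically; the following is a minimal sketch (not from the notes), assuming normal data and the biased plug-in variance estimator from the example above; the specific values of [math]\sigma^{2}[/math], [math]n[/math], and the number of replications are arbitrary choices.

```python
import numpy as np

# Numerical check of MSE = Var + Bias^2 (illustrative, not from the notes).
# Assumed setup: X_1, ..., X_n ~ N(0, sigma2), and theta_hat is the biased
# plug-in variance estimator with divisor n.

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 20, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)   # divisor n, so E[theta_hat] = (n-1)/n * sigma2

mse = np.mean((sigma2 - theta_hat) ** 2)  # E_theta[(theta - theta_hat)^2]
var = theta_hat.var()                     # Var_theta(theta_hat)
bias = np.mean(sigma2 - theta_hat)        # Bias_theta(theta_hat)

print(f"MSE          ~ {mse:.4f}")
print(f"Var + Bias^2 ~ {var + bias ** 2:.4f}")  # matches MSE up to floating point
```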
We now define an efficient estimator:
Let [math]W[/math] be a collection of estimators of [math]\theta\in\Theta[/math]. An estimator [math]\widehat{\theta}[/math] is efficient relative to [math]W[/math] if
[math]MSE_{\theta}\left(\widehat{\theta}\right)\leq MSE_{\theta}\left(w\right),\,\forall\theta\in\Theta,\,\forall w\in W[/math].
In order to find a “best” estimator, we have to restrict [math]W[/math] in some way (otherwise, we can often find many estimators with equal MSE, by exploiting the bias/variance tradeoff).
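As a rough illustration of why the restriction matters (a sketch, not from the notes): for a normal mean, the sample mean and an arbitrarily chosen shrinkage estimator have MSE curves that cross, so neither is efficient relative to a class containing both; the model, sample size, and shrinkage factor below are assumptions made for the example.

```python
import numpy as np

# Illustration (not from the notes): for X_1, ..., X_n ~ N(theta, 1), compare
# the MSE of the sample mean with that of an arbitrarily chosen shrinkage
# estimator 0.5 * sample mean, using the exact formula MSE = Var + Bias^2.

n = 20
thetas = np.array([0.0, 0.25, 0.5, 1.0, 2.0])

mse_mean = np.full_like(thetas, 1.0 / n)          # unbiased, Var = 1/n
mse_shrunk = 0.25 / n + (0.5 * thetas) ** 2       # Var = 0.25/n, Bias = 0.5*theta

for t, m1, m2 in zip(thetas, mse_mean, mse_shrunk):
    print(f"theta={t:4.2f}  MSE(mean)={m1:.4f}  MSE(0.5*mean)={m2:.4f}")
# The shrinkage estimator wins near theta = 0 and loses for larger |theta|,
# so neither estimator is efficient relative to a class containing both.
```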