# Finding UMVU estimators

In the previous lecture, we introduced the Rao-Blackwell theorem, which can be used to reduce the variance of an existing estimator while preserving its mean. This immediately implies that UMVU estimators must be functions of sufficient statistics: otherwise, one could Rao-Blackwellize such an estimator with respect to a sufficient statistic and obtain a more efficient one.

While the Rao-Blackwell theorem is useful for finding a more efficient estimator, we have yet to see a method that produces a UMVU estimator. It turns out that Rao-Blackwellization yields the unique UMVU under certain conditions, which are stated in the Lehmann-Scheffé theorem.

## Lehmann-Scheffé Theorem

Let $T$ be a sufficient and complete statistic for $\theta$. Then, if $\widehat{\theta}$ is unbiased,

$\widetilde{\theta}=E\left(\left.\widehat{\theta}\right|T\right)$ is the unique UMVU.

We will define what it means for a statistic to be complete soon.

A question that immediately arises is whether UMVU estimators are always unique. The answer is yes. This can be shown by contradiction: if there existed two different UMVU estimators, their arithmetic mean would also be unbiased, and its variance can be shown to be no larger than each individual variance, with equality only when the two estimators coincide almost surely.
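To spell out the variance calculation behind this argument: suppose $W_{1}$ and $W_{2}$ are both UMVU with common variance $\sigma^{2}$, and let $W=\frac{1}{2}\left(W_{1}+W_{2}\right)$, which is also unbiased. Then

\begin{aligned} Var_{\theta}\left(W\right) & =\frac{1}{4}Var_{\theta}\left(W_{1}\right)+\frac{1}{4}Var_{\theta}\left(W_{2}\right)+\frac{1}{2}Cov_{\theta}\left(W_{1},W_{2}\right)\\ & \leq\frac{\sigma^{2}}{4}+\frac{\sigma^{2}}{4}+\frac{\sigma^{2}}{2}=\sigma^{2},\end{aligned}

using $Cov_{\theta}\left(W_{1},W_{2}\right)\leq\sigma^{2}$ by the Cauchy-Schwarz inequality. A strictly smaller variance would contradict the optimality of $W_{1}$ and $W_{2}$, so equality must hold; the equality case of Cauchy-Schwarz (combined with unbiasedness and equal variances) then forces $W_{1}=W_{2}$ almost surely.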

The Lehmann-Scheffé Theorem shows that Rao-Blackwellization based on a sufficient and complete statistic of an unbiased estimator provides the UMVU.

The intuition is as follows: if Rao-Blackwellization based on a given statistic always yields the same estimator regardless of the starting point, then that estimator must be the UMVU. For any other UMVU candidate, we could Rao-Blackwellize it and obtain that same unique estimator, whose variance is no higher than the candidate's.

# Complete Statistic

A statistic $T$ is complete for $\theta\in\Theta$ if, for every (measurable) function $g\left(\cdot\right)$,

$E_{\theta}\left(g\left(T\right)\right)=0,\,\forall\theta\in\Theta$

then

$P_{\theta}\left(g\left(T\right)=0\right)=1,\,\forall\theta\in\Theta$

where

$P_{\theta}\left(\cdot\right)$ is the probability measure parameterized by $\theta$. In other words, if $T$ is complete, then the expectation of $g\left(T\right)$ can equal zero for all $\theta$ only if $g\left(T\right)=0$ almost surely.

The intuition of how the Lehmann Scheffé theorem produces the UMVU through Rao-Blackwellization is the following: Suppose we have two unbiased estimators obtained through Rao-Blackwellization via a sufficient and complete statistic $T$, $w_{1}\left(T\right)$ and $w_{2}\left(T\right)$. Because both estimators are unbiased, and because they are both based on a complete statistic $T$, they are the same.

Formally, $E_{\theta}\left(\underset{g\left(T\right)}{\underbrace{w_{1}\left(T\right)-w_{2}\left(T\right)}}\right)=0$ implies uniqueness, via $P_{\theta}\left(g\left(T\right)=0\right)=1,\,\forall\theta\in\Theta$. So, completeness of $T$ implies that two unbiased estimators $w_{1}\left(T\right)$ and $w_{2}\left(T\right)$ are actually the same estimator.

In other words, there exists only one unbiased estimator that is based on a complete statistic.

While complete statistics may fail to exist, we have learned that when they do exist, they can be used through Rao-Blackwellization to produce the UMVU.

# Example: Uniform

Let us find the UMVU for the upper limit of the uniform distribution.

Suppose $X_{i}\overset{iid}{\sim}U\left(0,\theta\right)$, where $\theta$ is unknown. We have shown that $X_{\left(n\right)}$ is a sufficient statistic. Now, we show that it is complete:

$E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=\int_{0}^{\theta}g\left(t\right)f_{\left.X_{\left(n\right)}\right|\theta}\left(t\right)dt$ where

$f_{\left.X_{\left(n\right)}\right|\theta}\left(t\right)=\left(\left[F_{X}\left(t\right)\right]^{n}\right)^{'}=\left(\left[\frac{t}{\theta}\right]^{n}\right)^{'}=\frac{n}{\theta}\left(\frac{t}{\theta}\right)^{n-1}$ for $t\in\left[0,\theta\right]$,

so that

\begin{aligned} \int_{0}^{\theta}g\left(t\right)f_{\left.X_{\left(n\right)}\right|\theta}\left(t\right)dt & =\int_{0}^{\theta}g\left(t\right)\frac{n}{\theta}\left(\frac{t}{\theta}\right)^{n-1}dt\\ & =\frac{n}{\theta^{n}}\int_{0}^{\theta}g\left(t\right)t^{n-1}dt\end{aligned}.

If this expression equals zero for every $\theta\in\Theta$, then $\int_{0}^{\theta}g\left(t\right)t^{n-1}dt=0$ for all $\theta$, and differentiating with respect to $\theta$ yields $g\left(\theta\right)\theta^{n-1}=0$ for (almost) every $\theta\gt 0$. Hence $g\left(t\right)$ equals zero with probability one, and the statistic $X_{\left(n\right)}$ is complete.

Now, we require an unbiased estimator of $\theta$. Let us calculate the expectation of $X_{\left(n\right)}$:

\begin{aligned} E_{\theta}\left(X_{\left(n\right)}\right) & =\frac{n}{\theta^{n}}\int_{0}^{\theta}t\left(t\right)^{n-1}dt\\ & =\frac{n}{\theta^{n}}\int_{0}^{\theta}t^{n}dt\\ & =\frac{n}{\theta^{n}}\left.\frac{t^{n+1}}{n+1}\right|_{0}^{\theta}\\ & =\frac{n}{\theta^{n}}\left(\frac{\theta^{n+1}}{1+n}-0\right)\\ & =\frac{n}{1+n}\theta\end{aligned}

As it turns out, the estimator $X_{\left(n\right)}$ is biased. The form of the bias is convenient, though: we can multiply by $\frac{1+n}{n}$ to obtain an unbiased estimator $\widetilde{\theta}=\frac{1+n}{n}X_{\left(n\right)}$. Specifically, $E_{\theta}\left(\widetilde{\theta}\right)=E_{\theta}\left(\frac{1+n}{n}X_{\left(n\right)}\right)=\theta$.

Notice that $\widetilde{\theta}$ is already a function of the complete and sufficient statistic $X_{\left(n\right)}$, such that Rao-Blackwellizing it will simply deliver $\widetilde{\theta}$ again. Hence, we know that $\widetilde{\theta}$ is the UMVU of $\theta$.

Interestingly, $\widetilde{\theta}$ is neither the maximum likelihood nor the method of moments estimator of $\theta$. It is, however, a bias-corrected version of the maximum likelihood estimator $X_{\left(n\right)}$, which never exceeds $\theta$; the correction means that $\widetilde{\theta}$, in contrast, can overshoot the true value of $\theta$.
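A quick Monte Carlo check illustrates these claims (a sketch; the true $\theta$, the sample size, and the number of replications below are arbitrary choices). It compares $X_{\left(n\right)}$, the corrected estimator $\frac{1+n}{n}X_{\left(n\right)}$, and the method of moments estimator $2\bar{X}$:

```python
import random
import statistics

random.seed(0)
theta, n, reps = 2.0, 10, 100_000  # arbitrary true value, sample size, replications

mle, umvu, mom = [], [], []
for _ in range(reps):
    x = [random.uniform(0.0, theta) for _ in range(n)]
    m = max(x)                       # X_(n): the MLE, biased, never exceeds theta
    mle.append(m)
    umvu.append((n + 1) / n * m)     # bias-corrected UMVU
    mom.append(2 * sum(x) / n)       # method of moments: 2 * sample mean

print(statistics.mean(mle))          # ~ n/(n+1)*theta, i.e. biased downward
print(statistics.mean(umvu))         # ~ theta: unbiased
print(statistics.variance(umvu))     # ~ theta^2/(n(n+2)), the smallest of the three
print(statistics.variance(mom))      # ~ theta^2/(3n), noticeably larger
```

With these settings, the simulated variance of the UMVU is roughly a quarter of the method of moments variance, consistent with the exact values $\frac{\theta^{2}}{n\left(n+2\right)}$ and $\frac{\theta^{2}}{3n}$.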

# Exponential Family

It turns out that the exponential family provides a direct way to derive sufficient and complete statistics.

Recall: A family $\left\{ f\left(\left.\cdot\right|\theta\right):\theta\in\Theta\right\}$ of pmfs/pdfs is an exponential family if $f\left(\left.x\right|\theta\right)=h\left(x\right)c\left(\theta\right)\exp\left\{ \sum_{i=1}^{K}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right\} ,\,x\in\mathbb{R},\,\theta\in\Theta$

The following holds:

• $T\left(X_{1},\ldots,X_{n}\right)=\sum_{i=1}^{n}\left(t_{1}\left(X_{i}\right),...,t_{K}\left(X_{i}\right)\right)$ is sufficient for $\theta$.
• $T$ is complete if $\left\{ \left(\omega_{1}\left(\theta\right),...,\omega_{K}\left(\theta\right)\right)^{'},\theta\in\Theta\right\}$ contains an open set in $\mathbb{R}^{K}$.

The first condition is simple: The statistic comprised by the collection of $t_{i}$ summed over $X_{i}$ is sufficient for the parameter vector $\theta$.

The second condition is also simple, but requires that the image of $\theta\mapsto\left(\omega_{1}\left(\theta\right),\ldots,\omega_{K}\left(\theta\right)\right)^{'}$ contains an open set in $\mathbb{R}^{K}$.

For example, if these turned out to be $\left(\mu,\mu^{2}\right)^{'}$ for the normal distribution, then although $\mu\in\left(-\infty,\infty\right)$, the image is a parabola in $\mathbb{R}^{2}$, which contains no open set in $\mathbb{R}^{2}$, and so $T$ is not complete.

# Example: Bernoulli

Suppose $X_{i}\overset{iid}{\sim}Ber\left(p\right)$, where

$p\in\left(0,1\right)$ is unknown.

Its marginal pmf is

$f\left(\left.x\right|p\right)=p^{x}\left(1-p\right)^{1-x}1\left(x\in\left\{ 0,1\right\} \right)$

such that

$h\left(x\right)=1\left(x\in\left\{ 0,1\right\} \right);c\left(p\right)=1-p;\omega\left(p\right)=\log\left(\frac{p}{1-p}\right);t\left(x\right)=x$

From the first property, we have

$\Sigma_{i=1}^{n}x_{i}=\Sigma_{i=1}^{n}t\left(x_{i}\right)$ is a sufficient statistic (i.e., the total number of 1s).

For completeness, notice that we have

$\left\{ \omega\left(p\right):p\in\left(0,1\right)\right\} =\left\{ \log\left(\frac{p}{1-p}\right):p\in\left(0,1\right)\right\} =\left\{ \log\left(r\right):r\in\left(0,+\infty\right)\right\} =\mathbb{R}$,

which certainly contains an open interval in $\mathbb{R}$.

Hence, $\Sigma_{i=1}^{n}x_{i}$ is complete.

Finally, $\widehat{p}_{ML}=\widehat{p}_{MM}=\frac{\sum_{i=1}^{n}X_{i}}{n}$ is UMVU.
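A small simulation (a sketch; the true $p$, the sample size, and the number of replications are arbitrary choices) confirms that the sample mean is unbiased, with variance matching the exact value $\frac{p\left(1-p\right)}{n}$:

```python
import random
import statistics

random.seed(1)
p, n, reps = 0.3, 20, 100_000  # arbitrary true value, sample size, replications

estimates = []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]  # Bernoulli(p) draws
    estimates.append(sum(x) / n)  # sample mean: the UMVU (= ML = MM estimator)

print(statistics.mean(estimates))      # ~ p
print(statistics.variance(estimates))  # ~ p*(1-p)/n
```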

# Cramér-Rao Lower Bound (CRLB)

It is possible to provide a meaningful lower bound to the variance of an estimator. (An example of a meaningless bound is zero.)

Let $X_{1},\ldots,X_{n}$ be a random sample from a distribution with marginal pmf/pdf - i.e., of a single observation - $f\left(\left.\cdot\right|\theta\right)$.

Under some regularity conditions (finite variance of the estimator and differentiation under the integral sign is allowed),

$Var_{\theta}\left(\widehat{\theta}\right)\geq\frac{\left(\frac{d}{d\theta}E_{\theta}\left(\widehat{\theta}\right)\right)^{2}}{nE_{\theta}\left[\left(\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right)^{2}\right]}=\frac{\left(\frac{d}{d\theta}E_{\theta}\left(\widehat{\theta}\right)\right)^{2}}{nVar_{\theta}\left[\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right]},\,\forall\theta\in\Theta.$

This is the version for the scalar case, but the analogue multivariate case exists as well.

Notice that when $\widehat{\theta}$ is unbiased, $\frac{d}{d\theta}E_{\theta}\left(\widehat{\theta}\right)=\frac{d}{d\theta}\theta=1$, and we obtain the simplified inequality

$Var_{\theta}\left(\widehat{\theta}\right)\geq\frac{1}{nE_{\theta}\left[\left(\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right)^{2}\right]}=\frac{1}{nVar_{\theta}\left[\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right]},\,\forall\theta\in\Theta.$

This result presents a few striking features.

First, the log-likelihood function shows up in the denominator.

Second, it is not evaluated at some realization $x_{i}$. Rather, we take the expectation over $X_{i}$ of its squared derivative w.r.t. $\theta$.

Third, the last equality follows from a key identity (which we do not prove here):

$Var_{\theta}\left[\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right]=E_{\theta}\left[\left(\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right)^{2}\right]$.

This is a special property of the log-likelihood function: its derivative (the score) has mean zero, $E_{\theta}\left[\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right]=0$, so its variance equals its second moment.

## Fisher Information

We denote the denominator, $I\left(\theta\right)=nE_{\theta}\left[\left(\frac{\partial}{\partial\theta}\log\,f\left(\left.X_{i}\right|\theta\right)\right)^{2}\right]$, as the Fisher information of the sample; its reciprocal, $\frac{1}{I\left(\theta\right)}$, is the lower bound on the variance of unbiased estimators.
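The mean-zero property of the score and the identity $Var_{\theta}\left[\frac{\partial}{\partial\theta}\log\,f\right]=E_{\theta}\left[\left(\frac{\partial}{\partial\theta}\log\,f\right)^{2}\right]$ can be checked numerically in the Bernoulli model, where the score of one observation is $\frac{x}{p}-\frac{1-x}{1-p}$ and the per-observation information is $\frac{1}{p\left(1-p\right)}$ (a sketch; the value of $p$ and the number of replications are arbitrary):

```python
import random
import statistics

random.seed(2)
p, reps = 0.4, 200_000  # arbitrary parameter value and number of replications

# score of a single Bernoulli observation: d/dp log f(x|p) = x/p - (1-x)/(1-p)
scores = []
for _ in range(reps):
    x = 1 if random.random() < p else 0
    scores.append(x / p - (1 - x) / (1 - p))

mean_score = statistics.mean(scores)
second_moment = statistics.mean(s * s for s in scores)
info = 1 / (p * (1 - p))  # per-observation Fisher information

print(mean_score)                   # ~ 0: the score has mean zero
print(second_moment)                # ~ info
print(statistics.variance(scores))  # ~ info, since the mean score is ~ 0
```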

## CRLB: Possible Cases

The CRLB is a weak bound, in the sense that even the UMVU may fail to attain it.

Three possible cases can occur:

• The CRLB is applicable and attainable:
• Estimating $p$ when $X_{i}\sim Ber\left(p\right)$
• Estimating $\mu$ when $X_{i}\sim N\left(\mu,\sigma^{2}\right)$ with $\sigma^{2}$ known.
• The CRLB is applicable, but not attainable:
• Estimating $\sigma^{2}$ when $X_{i}\sim N\left(\mu,\sigma^{2}\right)$: $\widehat{\sigma^{2}}=s^{2},$ while $Var_{\sigma^{2}}\left(s^{2}\right)=\frac{2\sigma^{4}}{n-1}\gt \frac{2\sigma^{4}}{n}$, the latter of which is the CRLB.
• The CRLB is not applicable:
• Estimating $\theta$ when $X_{i}\sim U\left(0,\theta\right)$: the support depends on $\theta$, so differentiation under the integral sign fails. Indeed, $Var_{\theta}\left(\widehat{\theta}_{UMVU}\right)=\frac{1}{n\left(n+2\right)}\theta^{2}$, which is below the formally computed "bound" of $\frac{\theta^{2}}{n}$ (the formal calculation can also yield $\infty$, depending on how it is carried out).
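The last case can be made concrete by simulation (a sketch; the true $\theta$, the sample size, and the number of replications are arbitrary choices): the variance of the UMVU $\frac{n+1}{n}X_{\left(n\right)}$ falls well below the formally computed $\frac{\theta^{2}}{n}$, confirming that the bound does not apply here.

```python
import random
import statistics

random.seed(3)
theta, n, reps = 1.0, 5, 200_000  # arbitrary true value, sample size, replications

umvu = []
for _ in range(reps):
    x = [random.uniform(0.0, theta) for _ in range(n)]
    umvu.append((n + 1) / n * max(x))  # UMVU from the Lehmann-Scheffe argument

formal_bound = theta**2 / n            # naive "CRLB": regularity fails here
exact_var = theta**2 / (n * (n + 2))   # true variance of the UMVU

print(statistics.variance(umvu))       # ~ exact_var, far below formal_bound
```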