Lecture 10. B) Complete Statistic
Complete Statistic
A statistic [math]T[/math] is complete for [math]\theta\in\Theta[/math] if the following holds for every (measurable) function [math]g\left(\cdot\right)[/math]:
if
[math]E_{\theta}\left(g\left(T\right)\right)=0,\,\forall\theta\in\Theta[/math]
then
[math]P_{\theta}\left(g\left(T\right)=0\right)=1,\,\forall\theta\in\Theta[/math]
where
[math]P_{\theta}\left(\cdot\right)[/math] is the probability distribution parameterized by [math]\theta[/math]. In other words, if [math]T[/math] is complete, then the expectation of [math]g\left(T\right)[/math] can equal zero for every [math]\theta\in\Theta[/math] only if [math]g\left(T\right)=0[/math] almost everywhere.
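To see what completeness rules out, consider a standard non-example. Suppose [math]X_{1},X_{2}\overset{iid}{\sim}N\left(\theta,1\right)[/math] and let [math]T=X_{1}-X_{2}[/math]. Taking [math]g\left(T\right)=T[/math] gives
[math]E_{\theta}\left(g\left(T\right)\right)=E_{\theta}\left(X_{1}\right)-E_{\theta}\left(X_{2}\right)=\theta-\theta=0,\,\forall\theta\in\Theta,[/math]
yet [math]P_{\theta}\left(g\left(T\right)=0\right)=P_{\theta}\left(X_{1}=X_{2}\right)=0[/math], so [math]T[/math] is not complete.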
The intuition for how the Lehmann-Scheffé theorem produces the UMVU through Rao-Blackwellization is the following: Suppose we have two unbiased estimators, [math]w_{1}\left(T\right)[/math] and [math]w_{2}\left(T\right)[/math], obtained through Rao-Blackwellization via a sufficient and complete statistic [math]T[/math]. Because both estimators are unbiased, and because both are functions of the complete statistic [math]T[/math], they are the same.
Formally, unbiasedness gives [math]E_{\theta}\left(\underset{g\left(T\right)}{\underbrace{w_{1}\left(T\right)-w_{2}\left(T\right)}}\right)=0,\,\forall\theta\in\Theta[/math], and completeness of [math]T[/math] then yields [math]P_{\theta}\left(g\left(T\right)=0\right)=1,\,\forall\theta\in\Theta[/math]. So, completeness of [math]T[/math] implies that the two unbiased estimators [math]w_{1}\left(T\right)[/math] and [math]w_{2}\left(T\right)[/math] are actually the same estimator (almost surely).
In other words, up to almost-sure equality, there exists only one unbiased estimator that is a function of the complete statistic [math]T[/math].
While complete statistics may fail to exist, we have learned that when they do exist, they can be used through Rao-Blackwellization to produce the UMVU.
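In compact form, the statement being used is the following: if [math]T[/math] is sufficient and complete for [math]\theta[/math] and [math]w\left(T\right)[/math] satisfies [math]E_{\theta}\left(w\left(T\right)\right)=\tau\left(\theta\right),\,\forall\theta\in\Theta[/math], then [math]w\left(T\right)[/math] is the UMVU estimator of [math]\tau\left(\theta\right)[/math], unique up to almost-sure equality.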
Example: Uniform
Suppose [math]X_{i}\overset{iid}{\sim}U\left(0,\theta\right)[/math] where [math]\theta>0[/math] is unknown. We have shown that [math]X_{\left(n\right)}[/math] is a sufficient statistic. Now we show that it is complete, by calculating
[math]E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)[/math]
and showing that if it equals zero for all values of [math]\theta\in\Theta[/math], then
[math]P_{\theta}\left(g\left(X_{\left(n\right)}\right)=0\right)=1.[/math]
We proceed by deriving the pdf of [math]X_{\left(n\right)}[/math]: since [math]P_{\theta}\left(X_{\left(n\right)}\leq x\right)=\left(\frac{x}{\theta}\right)^{n}[/math] for [math]0\leq x\leq\theta[/math], differentiating gives [math]f_{X_{\left(n\right)}}\left(x\right)=\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1}[/math] on [math]\left(0,\theta\right)[/math]. We then calculate [math]E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right):[/math]
[math]\begin{aligned} E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right) & =0\\ \Leftrightarrow\int_{0}^{\theta}g\left(x\right)f_{X_{\left(n\right)}}\left(x\right)dx & =0\\ \Leftrightarrow\int_{0}^{\theta}g\left(x\right)\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1}dx & =0\end{aligned}[/math]
It is not obvious that the integral above implies [math]g\left(x\right)=0[/math] almost everywhere. If [math]g\left(x\right)[/math] were allowed to be positive in some regions and negative in others, the areas under the curve could offset and yield an integral equal to zero. If such a nonzero [math]g[/math] existed, the statistic would not be complete.
A typical approach to tackle this problem is to differentiate both sides of the equation above:
[math]\begin{aligned} & E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=0,\,\forall\theta\in\Theta\\ \Rightarrow & \frac{d}{d\theta}E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=\frac{d}{d\theta}0,\,\forall\theta\in\Theta\\ \Leftrightarrow & \frac{d}{d\theta}E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=0,\,\forall\theta\in\Theta\end{aligned}[/math]
Why can we differentiate both sides? This is clearly not always legitimate. For example, consider the equation [math]x^{2}=5[/math]. Taking derivatives on both sides would yield [math]2x=0[/math], which is not consistent with the initial equation.
The reason we can differentiate both sides of [math]E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=0,\,\forall\theta\in\Theta[/math] is that the identity holds for all values of [math]\theta[/math]. This type of equation is a functional equation: it holds at every point of the domain of the function [math]\theta\mapsto E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)[/math], so both sides define the same function of [math]\theta[/math] and their derivatives must agree.
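For instance, the identity [math]\left(\theta+1\right)^{2}=\theta^{2}+2\theta+1,\,\forall\theta\in\mathbb{R}[/math] says that both sides are the same function of [math]\theta[/math], so differentiating both sides yields another valid identity, [math]2\left(\theta+1\right)=2\theta+2[/math]. The equation [math]x^{2}=5[/math], by contrast, only pins down particular values of [math]x[/math].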
Continuing the derivation, we obtain:
[math]\begin{aligned} & \frac{d}{d\theta}E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=0\\ \Leftrightarrow & \frac{d}{d\theta}\left(\int_{0}^{\theta}g\left(x\right)\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1}dx\right)=0\\ \Leftrightarrow & \frac{d}{d\theta}\left[\left(\frac{n}{\theta^{n}}\right)\int_{0}^{\theta}g\left(x\right)x^{n-1}dx\right]=0\end{aligned}[/math]
Now, applying the product rule to the two [math]\theta[/math]-dependent factors (and the fundamental theorem of calculus to the integral, whose upper limit is [math]\theta[/math]), the derivative becomes
[math]\begin{aligned} & -\frac{n^{2}}{\theta^{n+1}}\int_{0}^{\theta}g\left(x\right)x^{n-1}dx+\frac{n}{\theta^{n}}g\left(\theta\right)\theta^{n-1}=0\\ \Leftrightarrow & -\frac{n}{\theta}\underset{=E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=0}{\underbrace{\int_{0}^{\theta}g\left(x\right)\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1}dx}}+\frac{n}{\theta}g\left(\theta\right)=0\\ \Leftrightarrow & \frac{n}{\theta}g\left(\theta\right)=0,\,\forall\theta\in\Theta\\ \Leftrightarrow & g\left(\theta\right)=0,\,\forall\theta\in\Theta\end{aligned}[/math]
The result above holds for all values [math]\theta>0[/math]. Hence, we have shown that [math]E_{\theta}\left(g\left(X_{\left(n\right)}\right)\right)=0,\,\forall\theta\in\Theta[/math] forces [math]g\left(\theta\right)=0[/math] for every [math]\theta>0[/math], so that [math]P_{\theta}\left(g\left(X_{\left(n\right)}\right)=0\right)=1,\,\forall\theta\in\Theta[/math], which means [math]X_{\left(n\right)}[/math] is complete.
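Having established completeness, we can read off the UMVU estimator of [math]\theta[/math]. Since
[math]E_{\theta}\left(X_{\left(n\right)}\right)=\int_{0}^{\theta}x\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1}dx=\frac{n}{\theta^{n}}\int_{0}^{\theta}x^{n}dx=\frac{n}{n+1}\theta,[/math]
the estimator [math]\frac{n+1}{n}X_{\left(n\right)}[/math] is unbiased, and because it is a function of the sufficient and complete statistic [math]X_{\left(n\right)}[/math], the Lehmann-Scheffé theorem implies that it is UMVU.
As a quick numerical sanity check, here is a minimal simulation sketch comparing it with another unbiased estimator, [math]2\bar{X}[/math]; the values of [math]\theta[/math], [math]n[/math], and the number of replications below are arbitrary illustration choices.
```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 100_000   # arbitrary illustration values

# reps samples of size n from U(0, theta)
x = rng.uniform(0.0, theta, size=(reps, n))

est_umvu = (n + 1) / n * x.max(axis=1)   # (n+1)/n * X_(n): based on the complete sufficient statistic
est_mom = 2.0 * x.mean(axis=1)           # 2 * X-bar: also unbiased for theta

print("means:    ", est_umvu.mean(), est_mom.mean())   # both are close to theta = 2.0 (unbiasedness)
print("variances:", est_umvu.var(), est_mom.var())     # the UMVU estimator has the smaller variance
```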
Exponential Family
It turns out that the exponential family provides a direct way to derive sufficient and complete statistics.
Recall: A family [math]\left\{ f\left(\left.\cdot\right|\theta\right):\theta\in\Theta\right\}[/math] of pmfs/pdfs is an exponential family if [math]f\left(\left.x\right|\theta\right)=h\left(x\right)c\left(\theta\right)\exp\left\{ \sum_{i=1}^{K}\omega_{i}\left(\theta\right)t_{i}\left(x\right)\right\} ,\,x\in\mathbb{R},\,\theta\in\Theta[/math]
The following holds:
- [math]T\left(X_{1},...,X_{n}\right)=\sum_{i=1}^{n}\left(t_{1}\left(X_{i}\right),...,t_{K}\left(X_{i}\right)\right)[/math] is sufficient for [math]\theta[/math].
- [math]T[/math] is complete if [math]\left\{ \left(\omega_{1}\left(\theta\right),...,\omega_{K}\left(\theta\right)\right)^{'}:\theta\in\Theta\right\}[/math] contains an open set in [math]\mathbb{R}^{K}[/math].
The first property is straightforward: the statistic formed by summing the vector [math]\left(t_{1}\left(X_{i}\right),...,t_{K}\left(X_{i}\right)\right)[/math] over the observations is sufficient for the parameter vector [math]\theta[/math].
The second property requires that the image [math]\left\{ \left(\omega_{1}\left(\theta\right),...,\omega_{K}\left(\theta\right)\right)^{'}:\theta\in\Theta\right\}[/math], i.e., the set of values these functions take as [math]\theta[/math] ranges over [math]\Theta[/math], contains an open set in [math]\mathbb{R}^{K}[/math].
For example, if these turned out to be [math]\left(\mu,\mu^{2}\right)^{'}[/math] for the normal distribution, then although [math]\mu\in\left(-\infty,\infty\right)[/math], the set [math]\left\{ \left(\mu,\mu^{2}\right)^{'}:\mu\in\mathbb{R}\right\}[/math] is a one-dimensional curve that contains no open set in [math]\mathbb{R}^{2}[/math], so the theorem does not guarantee that [math]T[/math] is complete (and in this curved case [math]T[/math] is indeed not complete).
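By contrast, if both parameters of the normal distribution are unknown, say [math]X_{i}\overset{iid}{\sim}N\left(\mu,\sigma^{2}\right)[/math] with [math]\mu\in\mathbb{R}[/math] and [math]\sigma^{2}>0[/math], the exponential-family form has [math]\omega\left(\theta\right)=\left(\frac{\mu}{\sigma^{2}},-\frac{1}{2\sigma^{2}}\right)^{'}[/math] and [math]t\left(x\right)=\left(x,x^{2}\right)^{'}[/math]. The image [math]\left\{ \left(\frac{\mu}{\sigma^{2}},-\frac{1}{2\sigma^{2}}\right)^{'}:\mu\in\mathbb{R},\sigma^{2}>0\right\} =\mathbb{R}\times\left(-\infty,0\right)[/math] contains an open set in [math]\mathbb{R}^{2}[/math], so [math]\left(\sum_{i=1}^{n}X_{i},\sum_{i=1}^{n}X_{i}^{2}\right)[/math] is sufficient and complete.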
Example: Bernoulli
Suppose [math]X_{i}\overset{iid}{\sim}Ber\left(p\right)[/math], where
[math]p\in\left(0,1\right)[/math] is unknown.
Its marginal pmf is
[math]f\left(\left.x\right|p\right)=p^{x}\left(1-p\right)^{1-x}1\left(x\in\left\{ 0,1\right\} \right)[/math]
such that
[math]h\left(x\right)=1\left(x\in\left\{ 0,1\right\} \right);c\left(p\right)=1-p;\omega\left(p\right)=\log\left(\frac{p}{1-p}\right);t\left(x\right)=x[/math]
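To see this decomposition, note that for [math]x\in\left\{ 0,1\right\}[/math],
[math]p^{x}\left(1-p\right)^{1-x}=\left(1-p\right)\left(\frac{p}{1-p}\right)^{x}=\left(1-p\right)\exp\left\{ x\log\left(\frac{p}{1-p}\right)\right\}.[/math]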
From the first property, [math]\sum_{i=1}^{n}t\left(X_{i}\right)=\sum_{i=1}^{n}X_{i}[/math] is a sufficient statistic (i.e., the total number of 1s).
To verify completeness, notice that
[math]\left\{ \omega\left(p\right):p\in\left(0,1\right)\right\} =\left\{ \log\left(\frac{p}{1-p}\right):p\in\left(0,1\right)\right\} =\left\{ \log\left(r\right):r\in\left(0,+\infty\right)\right\} =\left(-\infty,+\infty\right)[/math],
which is an open set in [math]\mathbb{R}[/math].
Hence, [math]\sum_{i=1}^{n}X_{i}[/math] is complete.
Finally, [math]\widehat{p}_{ML}=\widehat{p}_{MM}=\frac{\sum_{i=1}^{n}X_{i}}{n}[/math] is unbiased for [math]p[/math] and is a function of the sufficient and complete statistic [math]\sum_{i=1}^{n}X_{i}[/math]; by the Lehmann-Scheffé theorem, it is UMVU.
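To connect this back to the Rao-Blackwell intuition above, here is a minimal simulation sketch: starting from the crude unbiased estimator [math]X_{1}[/math] and conditioning on [math]T=\sum_{i=1}^{n}X_{i}[/math] reproduces [math]T/n=\bar{X}[/math]. The values of [math]p[/math], [math]n[/math], and the number of replications below are arbitrary illustration choices.
```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 5, 200_000             # arbitrary illustration values

x = rng.binomial(1, p, size=(reps, n))   # reps Bernoulli samples of size n
t = x.sum(axis=1)                        # complete sufficient statistic T = sum of X_i
x1 = x[:, 0]                             # crude unbiased estimator: the first observation

# Monte Carlo estimate of E[X_1 | T = t] for each value of t,
# compared with the Rao-Blackwellized answer t / n
for t_val in range(n + 1):
    mask = t == t_val
    if mask.any():
        print(t_val, round(float(x1[mask].mean()), 3), round(t_val / n, 3))
```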