Lecture 8. B) Method of Moments

Method of Moments

Example: Bernoulli

We start with an example. Suppose [math]X_{i}\sim Ber\left(p\right)[/math], where [math]p\in\left[0,1\right][/math] is unknown.

The method of moments operates by equating sample moments to population moments. To estimate [math]p[/math], we equate the sample moment [math]\overline{x}[/math] with the population moment [math]E\left(X\right)=p[/math]. In this case, we obtain the method of moments estimator, [math]\widehat{\theta}_{MM}=\widehat{p}_{MM}[/math] (we use [math]\widehat{\cdot}[/math] to denote estimators), by taking the moment equation [math]E\left(X\right)=p[/math], replacing the moment [math]E\left(X\right)[/math] with its sample analogue [math]\overline{x}[/math], and replacing the parameter [math]p[/math] with the estimator [math]\widehat{p}_{MM}[/math], which yields [math]\overline{x}=\widehat{p}_{MM}[/math].

In other words, the method of moments estimate for [math]p[/math] is the proportion of 1s observed in the sample, [math]\overline{x}[/math]. Sometimes the parameter of interest does not equal [math]E\left(X\right)[/math], in which case we would have to solve the resulting equation for [math]\widehat{p}_{MM}[/math].
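
As a quick numerical check (a minimal sketch using numpy; the true value of [math]p[/math], the seed, and the sample size are arbitrary choices), the estimator is literally just the sample mean of the 0/1 observations:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.3                             # "unknown" parameter used only to simulate data
x = rng.binomial(1, p_true, size=1000)   # Bernoulli(p) sample of 0s and 1s

# Method of moments: set the population moment E(X) = p equal to the sample mean x-bar
p_mm = x.mean()
print(p_mm)                              # close to 0.3 for a large sample
</syntaxhighlight>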

Method of Moments

Let [math]X_{1}..X_{n}[/math] be a random sample from a distribution with pmf/pdf [math]f\left(\left.\cdot\right|\theta\right)[/math], where [math]\theta\in\Theta\subseteq\mathbb{R}[/math] is unknown.

And let [math]\mu\left(\cdot\right):\Theta\rightarrow\mathbb{R}[/math] (aka the moment equation) be given by [math]\mu\left(\theta\right)=\begin{cases} \sum_{x\in\mathbb{R}}xf\left(\left.x\right|\theta\right), & x\,discrete\\ \int_{-\infty}^{\infty}xf\left(\left.x\right|\theta\right)dx, & x\,continuous \end{cases}[/math]

The method of moments estimator [math]\widehat{\theta}_{MM}[/math] is the solution to equation

[math]\mu\left(\widehat{\theta}_{MM}\right)=\overline{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}.[/math]

The method of moments estimator can be summarized as equating the population mean (which depends on the parameter [math]\theta[/math]) to the sample mean [math]\overline{x}[/math] and solving for the unknown parameter [math]\theta[/math]; the solution is [math]\widehat{\theta}_{MM}[/math].

Up to now, the method could just as well be called the “method of moment” estimator, since it only uses the first moment of [math]X[/math]. Additional moments are used depending on the number of parameters we would like to estimate.

Example: Uniform

Let [math]X_{i}\overset{iid}{\sim}U\left(0,\theta\right)[/math], where [math]\theta\gt 0[/math] is unknown. In this case, our moment equation is [math]\mu\left(\theta\right)=E\left(X\right)=\frac{\theta}{2}[/math].

For estimation, we replace [math]\theta[/math] by [math]\widehat{\theta}_{MM}[/math] and equate [math]E\left(X\right)[/math] to [math]\overline{x}[/math], to obtain

[math]\begin{aligned} \mu\left(\widehat{\theta}_{MM}\right) & =\frac{\widehat{\theta}_{MM}}{2}=\overline{x}\\ & \Leftrightarrow\widehat{\theta}_{MM}=2\overline{x}.\end{aligned}[/math]

So, the method of moments estimator for the upper bound of the uniform distribution is simply twice the observed mean.

It turns out this is not a very good estimator of [math]\theta[/math], because the estimate [math]\widehat{\theta}_{MM}[/math] can fall below the maximum value of [math]x_{i}[/math] observed in the sample. For example, suppose [math]\overline{x}[/math] is 0.4, making our estimate [math]\widehat{\theta}_{MM}=0.8[/math]. This is not reasonable if the highest draw was, say, 1.2: from that observation alone we immediately know that [math]\theta\geq1.2[/math]. The method of moments estimator ignores the bounds of the uniform distribution, and so fails to account for the fact that [math]\widehat{\theta}_{MM}=0.8[/math] falls below the highest value observed in the sample.
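
To see this failure concretely, the following simulation (a sketch assuming numpy; the value of [math]\theta[/math], the sample size, and the number of replications are arbitrary) counts how often [math]2\overline{x}[/math] falls below the sample maximum:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

theta_true = 1.0
n = 20
reps = 10_000

below_max = 0
for _ in range(reps):
    x = rng.uniform(0.0, theta_true, size=n)
    theta_mm = 2 * x.mean()      # method of moments estimate: twice the sample mean
    if theta_mm < x.max():       # the estimated upper bound is below an observed draw
        below_max += 1

# Fraction of samples in which the estimate is logically impossible as an upper bound
print(below_max / reps)
</syntaxhighlight>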

Multiple Parameters

When we want to estimate multiple parameters, we simply solve more equations, by including additional moments (typically in ascending order):

Let [math]X_{1}..X_{n}[/math] be a random sample from a distribution with pmf/pdf [math]f\left(\left.\cdot\right|\theta\right)[/math], where [math]\theta\in\Theta\subseteq\mathbb{R}^{k}[/math] is unknown.

For [math]j=1..k[/math], let [math]\mu_{j}:\Theta\rightarrow\mathbb{R}[/math] be given by

[math]\mu_{j}\left(\theta\right)=\begin{cases} \sum_{x\in\mathbb{R}}x^{j}f\left(\left.x\right|\theta\right), & x\,discrete\\ \int_{-\infty}^{\infty}x^{j}f\left(\left.x\right|\theta\right)dx, & x\,continuous \end{cases}[/math]

The method of moments estimator [math]\widehat{\theta}_{MM}[/math] solves the system of equations

[math]\mu_{j}\left(\widehat{\theta}_{MM}\right)=\frac{1}{n}\sum_{i=1}^{n}x_{i}^{j},\,j=1..k[/math]

Consider the following example. Suppose [math]X_{i}\overset{iid}{\sim}N\left(\mu,\sigma^{2}\right)[/math] where [math]\mu\in\mathbb{R}[/math] and [math]\sigma^{2}\gt 0[/math] are unknown.

Then, the moment equations are:

[math]\begin{aligned} \mu_{1}\left(\mu,\sigma^{2}\right) & =E\left(X\right)=\mu\\ \mu_{2}\left(\mu,\sigma^{2}\right) & =E\left(X^{2}\right)=\sigma^{2}+\mu^{2}\end{aligned}[/math]

Plugging in [math]\widehat{\mu}_{MM}[/math] and [math]\widehat{\sigma^{2}}_{MM}[/math] and equating the population moments to their sample analogues yields the system

[math]\left\{ \begin{array}{l} \mu_{1}\left(\widehat{\mu}_{MM},\widehat{\sigma^{2}}_{MM}\right)=\widehat{\mu}_{MM}=\overline{x}\\ \mu_{2}\left(\widehat{\mu}_{MM},\widehat{\sigma^{2}}_{MM}\right)=\widehat{\sigma^{2}}_{MM}+\widehat{\mu}_{MM}^{2}=\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} \end{array}\right.\Leftrightarrow\left\{ \begin{array}{l} \widehat{\mu}_{MM}=\overline{x}\\ \widehat{\sigma^{2}}_{MM}=\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}-\overline{x}^{2} \end{array}\right.[/math]

The last expression can be further simplified:

[math]\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}-\overline{x}^{2} & =\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}^{2}-\overline{x}^{2}\right)\\ & =\frac{1}{n}\sum_{i=1}^{n}\left(\left(x_{i}-\overline{x}\right)^{2}+2x_{i}\overline{x}-2\overline{x}^{2}\right)\\ & =\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}+\frac{2}{n}\underset{=0}{\underbrace{\left[\sum_{i=1}^{n}x_{i}\overline{x}-n\overline{x}^{2}\right]}}\\ & =\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2},\end{aligned}[/math]

so that [math]\widehat{\sigma^{2}}_{MM}=\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}[/math].

In this case, we had to solve a system of equations, and used the first and second moments of [math]X[/math].

While the estimator [math]\widehat{\mu}_{MM}[/math] is intuitive, the estimator [math]\widehat{\sigma^{2}}_{MM}=\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}[/math] is a bit surprising, since we know that [math]E\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}\right)\neq\sigma^{2}[/math]. Rather, we know that [math]E\left(s^{2}\right)=E\left(\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}\right)=\sigma^{2}[/math], so it is possible that the method of moments estimator for [math]\sigma^{2}[/math] can be improved: as it stands, it produces, on average, a biased estimate of [math]\sigma^{2}[/math]. We will return to this point with a formal analysis.
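
The following simulation (a sketch assuming numpy; the true parameters, sample size, and number of replications are arbitrary) computes both method of moments estimates on repeated samples and compares the average of [math]\widehat{\sigma^{2}}_{MM}[/math] to [math]\sigma^{2}[/math] and to the unbiased [math]s^{2}[/math]:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

mu_true, sigma2_true = 2.0, 4.0
n, reps = 10, 100_000

mm_var = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu_true, np.sqrt(sigma2_true), size=n)
    mu_mm = x.mean()                         # first moment equation: mu-hat = x-bar
    mm_var[r] = (x**2).mean() - mu_mm**2     # second moment equation: (1/n) * sum (x_i - x-bar)^2
    s2[r] = x.var(ddof=1)                    # usual unbiased estimator s^2

print(mm_var.mean())   # close to (n-1)/n * sigma^2 = 3.6, i.e. biased downward
print(s2.mean())       # close to sigma^2 = 4.0
</syntaxhighlight>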

For cases with complicated estimators, involving non-linear moment equations for example, one may solve the system of equations numerically with the help of a computer.
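
For instance, the moment conditions can be written as a function and handed to a generic root finder (a sketch assuming numpy and scipy; the normal example above is reused purely so that the numerical answer can be checked against the closed-form solution):

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(3)
x = rng.normal(2.0, 2.0, size=500)   # simulated data; in practice x is the observed sample

# Moment conditions: population moments minus sample moments, as a function of theta = (mu, sigma2)
def moment_conditions(theta):
    mu, sigma2 = theta
    return [mu - x.mean(),
            (sigma2 + mu**2) - (x**2).mean()]

theta_hat = fsolve(moment_conditions, x0=[0.0, 1.0])   # numerical solution of the system
print(theta_hat)   # agrees with the closed-form (x-bar, (1/n) * sum (x_i - x-bar)^2)
</syntaxhighlight>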