Lecture 15. A) Asymptotic Properties of ML Estimators


Asymptotic Properties of ML Estimators

Up to now, we have derived the distributions of several test statistics. This has been a relatively tedious process, however.

For example, the hypothesis test for the [math]\mu[/math] parameter in the normal distribution depends on whether [math]\sigma^{2}[/math] is known or not. In general, the distribution of more complicated tests may be extremely challenging to find.

It turns out that, as long as [math]n[/math] is large, we can use a nice property of the maximum likelihood estimator.

Let [math]X_{1},\ldots,X_{n}[/math] be a random sample with pdf [math]f\left(\left.x\right|\theta_{0}\right)[/math]. Under a few regularity conditions,

[math]\sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)^{-1}\right)[/math]

where [math]I\left(\theta_{0}\right)[/math] is the Fisher information at [math]\theta=\theta_{0}[/math].
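
For example, for a random sample from the exponential distribution with pdf [math]f\left(\left.x\right|\lambda\right)=\lambda e^{-\lambda x}[/math], the ML estimator is [math]\widehat{\lambda}_{ML}=1/\overline{X}[/math] and the Fisher information is [math]I\left(\lambda\right)=1/\lambda^{2}[/math], so the result gives [math]\sqrt{n}\left(\widehat{\lambda}_{ML}-\lambda_{0}\right)\overset{d}{\rightarrow}N\left(0,\lambda_{0}^{2}\right)[/math]; this yields an approximate sampling distribution for [math]\widehat{\lambda}_{ML}[/math] without having to derive its exact finite-sample distribution.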

Proof

First, we fix some notation:

  • Denote [math]l\left(\theta\right)=l\left(\left.\theta\right|x_{1},\ldots,x_{n}\right)=\sum_{i=1}^{n}\log\left(f\left(\left.x_{i}\right|\theta\right)\right)[/math].
  • Let [math]I\left(\theta\right)=E_{\theta}\left[\left(l_{1}^{'}\left(\theta\right)\right)^{2}\right]=-E_{\theta}\left[l_{1}^{''}\left(\theta\right)\right]=Var_{\theta}\left(l_{1}^{'}\left(\theta\right)\right)[/math] where [math]l_{1}\left(\theta\right)=l_{1}\left(\left.\theta\right|x_{1}\right)[/math] is the log-likelihood for one observation.

We expand the first derivative of the log-likelihood function around [math]\theta_{0}[/math]:

[math]l^{'}\left(\theta\right)=l^{'}\left(\theta_{0}\right)+\left(\theta-\theta_{0}\right)l^{''}\left(\theta_{0}\right)+\frac{\left(\theta-\theta_{0}\right)^{2}}{2}l^{'''}\left(\theta^{*}\right),\,\theta^{*}\in\left(\theta,\theta_{0}\right)[/math]

Now, we evaluate the expansion at [math]\theta=\widehat{\theta}_{ML}[/math]:

[math]\begin{aligned} & \underset{=0}{\underbrace{l^{'}\left(\widehat{\theta}_{ML}\right)}}=l^{'}\left(\theta_{0}\right)+\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{''}\left(\theta_{0}\right)+\frac{\left(\widehat{\theta}_{ML}-\theta_{0}\right)^{2}}{2}l^{'''}\left(\theta^{*}\right),\,\theta^{*}\in\left(\widehat{\theta}_{ML},\theta_{0}\right)\\ \Leftrightarrow & \widehat{\theta}_{ML}-\theta_{0}=\frac{-l^{'}\left(\theta_{0}\right)}{l^{''}\left(\theta_{0}\right)+\frac{1}{2}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)}\\ \Leftrightarrow & \sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)=\frac{\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)}{-\frac{1}{n}l^{''}\left(\theta_{0}\right)-\frac{1}{2n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)}\end{aligned}[/math]

Under the assumption that [math]l^{'''}\left(\theta^{*}\right)[/math] is “well-behaved” around [math]\theta_{0}[/math], so that the term [math]\frac{1}{2n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)[/math] in the denominator can be ignored (it vanishes in probability when [math]\frac{1}{n}l^{'''}[/math] is bounded and [math]\widehat{\theta}_{ML}[/math] is consistent), notice that

[math]\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)\right)=\sqrt{n}\overline{W}\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)\right)[/math] where [math]W_{i}=\frac{\partial}{\partial\theta}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)[/math].

To prove this first result, it suffices to show that [math]E\left(W_{i}\right)=0[/math] and [math]Var\left(W_{i}\right)=I\left(\theta_{0}\right)[/math]; the convergence in distribution then follows from the central limit theorem applied to [math]\overline{W}[/math].

Notice that

[math]\begin{aligned} E\left(W_{i}\right) & =E\left(\left.\frac{\partial}{\partial\theta}\log\,f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}\right)\\ & =E\left(\frac{\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}}{f\left(\left.x\right|\theta_{0}\right)}\right)\\ & =\int_{-\infty}^{\infty}\frac{\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}}{f\left(\left.x\right|\theta_{0}\right)}f\left(\left.x\right|\theta_{0}\right)dx\\ & =\int_{-\infty}^{\infty}\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}dx\\ & =0\end{aligned}[/math]

The last identity follows from the fact that [math]\int_{-\infty}^{\infty}f\left(\left.x\right|\theta\right)dx=1[/math] for every [math]\theta[/math], i.e., the value of the integral w.r.t. [math]x[/math] does not change with [math]\theta[/math]; differentiating both sides w.r.t. [math]\theta[/math] and exchanging differentiation and integration (one of the regularity conditions) gives

[math]\begin{aligned} & \frac{d}{d\theta}\int_{-\infty}^{\infty}f\left(\left.x\right|\theta\right)dx=\frac{d}{d\theta}1\\ \Leftrightarrow & \int_{-\infty}^{\infty}\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)dx=0\end{aligned}[/math]

The variance expression is the same one used to define the Fisher information (Lecture 10. c) Cramer-Rao Lower Bound), so [math]Var\left(W_{i}\right)=Var_{\theta_{0}}\left(l_{1}^{'}\left(\theta_{0}\right)\right)=I\left(\theta_{0}\right)[/math].
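
As a quick check, in an exponential model with pdf [math]f\left(\left.x\right|\lambda\right)=\lambda e^{-\lambda x}[/math], the score for one observation is [math]W_{i}=\frac{\partial}{\partial\lambda}\log\,f\left(\left.x_{i}\right|\lambda_{0}\right)=\frac{1}{\lambda_{0}}-x_{i}[/math], so [math]E\left(W_{i}\right)=\frac{1}{\lambda_{0}}-E\left(X_{i}\right)=0[/math] and [math]Var\left(W_{i}\right)=Var\left(X_{i}\right)=\frac{1}{\lambda_{0}^{2}}=I\left(\lambda_{0}\right)[/math], matching both results.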

As for the denominator, notice that, by the law of large numbers, [math]-\frac{1}{n}l^{''}\left(\theta_{0}\right)=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}}{\partial\theta^{2}}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)\overset{p}{\rightarrow}-E_{\theta_{0}}\left[l_{1}^{''}\left(\theta_{0}\right)\right]=I\left(\theta_{0}\right)[/math]

By Slutsky’s theorem, the ratio

[math]\frac{\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)}{-\frac{1}{n}l^{''}\left(\theta_{0}\right)},\quad\text{where }\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)\right)\text{ and }-\frac{1}{n}l^{''}\left(\theta_{0}\right)\overset{p}{\rightarrow}I\left(\theta_{0}\right),[/math]

converges in distribution; dividing the [math]N\left(0,I\left(\theta_{0}\right)\right)[/math] limit by the constant [math]I\left(\theta_{0}\right)[/math] scales the variance by [math]I\left(\theta_{0}\right)^{-2}[/math], such that

[math]\sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)^{-1}\right).[/math]
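
To see the result numerically, here is a minimal simulation sketch, assuming NumPy and using the exponential model from the example above as the working case: it compares the empirical variance of [math]\sqrt{n}\left(\widehat{\lambda}_{ML}-\lambda_{0}\right)[/math] across many simulated samples with the theoretical limit [math]I\left(\lambda_{0}\right)^{-1}=\lambda_{0}^{2}[/math].

```python
import numpy as np

# Monte Carlo sketch: asymptotic normality of the MLE in an Exponential(lambda) model,
# where lambda_hat = 1/xbar and I(lambda) = 1/lambda^2, so the limiting variance of
# sqrt(n)*(lambda_hat - lambda_0) is lambda_0^2.
rng = np.random.default_rng(0)
lambda_0 = 2.0      # true parameter (arbitrary choice for the illustration)
n = 500             # sample size
reps = 10_000       # number of simulated samples

# Each row is one sample of size n; exponential with rate lambda_0 (scale = 1/rate).
samples = rng.exponential(scale=1 / lambda_0, size=(reps, n))
mle = 1 / samples.mean(axis=1)        # lambda_hat = 1/xbar for every sample
z = np.sqrt(n) * (mle - lambda_0)     # scaled estimation error

print("empirical variance of sqrt(n)*(mle - lambda_0):", round(z.var(), 3))
print("theoretical variance I(lambda_0)^{-1}:         ", lambda_0 ** 2)
```

For a sample size this large the two variances should be close, and a histogram of [math]z[/math] is approximately normal. In this model the denominator term [math]-\frac{1}{n}l^{''}\left(\lambda_{0}\right)=\frac{1}{\lambda_{0}^{2}}[/math] is non-random, so the law-of-large-numbers step holds trivially.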