Lecture 15. A) Asymptotic Properties of ML Estimators
Asymptotic Properties of ML Estimators

Up to now, we have derived the distributions of several test statistics. This has been a relatively tedious process, however.

For example, the hypothesis test for the [math]\mu[/math] parameter in the normal distribution depends on whether [math]\sigma^{2}[/math] is known or not. In general, the distribution of more complicated tests may be extremely challenging to find.

It turns out that, as long as [math]n[/math] is large, we can use a convenient property of the maximum likelihood estimator.

Let [math]X_{1},\ldots,X_{n}[/math] be a random sample with pdf [math]f\left(\left.x\right|\theta_{0}\right)[/math]. Under a few regularity conditions,

[math]\sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)^{-1}\right)[/math]

where [math]I\left(\theta_{0}\right)[/math] is the Fisher information at [math]\theta=\theta_{0}[/math].
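
As a concrete illustration, here is a minimal simulation sketch (not part of the original lecture; an exponential distribution with rate [math]\lambda_{0}[/math] is assumed). For this model the MLE is [math]\widehat{\lambda}_{ML}=1/\overline{X}[/math] and [math]I\left(\lambda\right)=1/\lambda^{2}[/math], so the theorem predicts that [math]\sqrt{n}\left(\widehat{\lambda}_{ML}-\lambda_{0}\right)[/math] is approximately [math]N\left(0,\lambda_{0}^{2}\right)[/math]:

```python
# Minimal simulation sketch (exponential example, not from the lecture):
# f(x|lam) = lam * exp(-lam * x), MLE lam_hat = 1 / x_bar, I(lam) = 1 / lam**2,
# so the theorem predicts sqrt(n) * (lam_hat - lam0) ~ N(0, lam0**2) for large n.
import numpy as np

rng = np.random.default_rng(0)
lam0, n, reps = 2.0, 2000, 5000

# Draw `reps` samples of size n and compute one MLE per sample.
samples = rng.exponential(scale=1 / lam0, size=(reps, n))
lam_hat = 1 / samples.mean(axis=1)

z = np.sqrt(n) * (lam_hat - lam0)
print("simulated variance:", z.var())    # close to lam0**2 = 4
print("predicted variance:", lam0 ** 2)  # I(lam0)^{-1}
```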

Proof

First,

  • Denote [math]l\left(\theta\right)=l\left(\left.\theta\right|x_{1},\ldots,x_{n}\right)=\sum_{i=1}^{n}\log\left(f\left(\left.x_{i}\right|\theta\right)\right)[/math].
  • Let [math]I\left(\theta\right)=E_{\theta}\left[\left(l_{1}^{'}\left(\theta\right)\right)^{2}\right]=-E_{\theta}\left[l_{1}^{''}\left(\theta\right)\right]=Var_{\theta}\left(l_{1}^{'}\left(\theta\right)\right)[/math] where [math]l_{1}\left(\theta\right)=l_{1}\left(\left.\theta\right|x_{1}\right)[/math] is the log-likelihood for one observation (a numerical check of this identity appears after the list).
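
The identity [math]E_{\theta}\left[\left(l_{1}^{'}\left(\theta\right)\right)^{2}\right]=-E_{\theta}\left[l_{1}^{''}\left(\theta\right)\right][/math] can be checked numerically. The sketch below again assumes the exponential example (not from the lecture), where [math]l_{1}\left(\lambda\right)=\log\lambda-\lambda x[/math]:

```python
# Hedged numerical check of the information identity (exponential example,
# not from the lecture): l1(lam) = log(lam) - lam * x, so l1'(lam) = 1/lam - x
# and l1''(lam) = -1/lam**2.
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

score_sq = np.mean((1 / lam - x) ** 2)  # Monte Carlo estimate of E[(l1')^2]
neg_hess = 1 / lam ** 2                 # -E[l1''] in closed form
print(score_sq, neg_hess)               # both approximately 0.25 = I(lam)
```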

We expand the first derivative of the log-likelihood function around [math]\theta_{0}[/math]:

[math]l^{'}\left(\theta\right)=l^{'}\left(\theta_{0}\right)+\left(\theta-\theta_{0}\right)l^{''}\left(\theta_{0}\right)+\frac{\left(\theta-\theta_{0}\right)^{2}}{2}l^{'''}\left(\theta^{*}\right),\,\theta^{*}\in\left(\theta,\theta_{0}\right)[/math]

Now, we evaluate the expansion at [math]\theta=\widehat{\theta}_{ML}[/math]:

[math]\begin{aligned} & \underset{=0}{\underbrace{l^{'}\left(\widehat{\theta}_{ML}\right)}}=l^{'}\left(\theta_{0}\right)+\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{''}\left(\theta_{0}\right)+\frac{\left(\widehat{\theta}_{ML}-\theta_{0}\right)^{2}}{2}l^{'''}\left(\theta^{*}\right),\,\theta^{*}\in\left(\theta,\theta_{0}\right)\\ \Leftrightarrow & \widehat{\theta}_{ML}-\theta_{0}=\frac{-l^{'}\left(\theta_{0}\right)}{l^{''}\left(\theta_{0}\right)+\frac{1}{2}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)}\\ \Leftrightarrow & \sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)=\frac{\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)}{-\frac{1}{n}l^{''}\left(\theta_{0}\right)-\frac{1}{2n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)}\end{aligned}[/math]

Under the assumption that [math]l^{'''}\left(\theta^{*}\right)[/math] is “well-behaved” around [math]\theta_{0}[/math], the term [math]\frac{1}{2n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)[/math] in the denominator vanishes asymptotically, so we can ignore it. For the numerator, notice that

[math]\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)\right)=\sqrt{n}\overline{W}\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)\right)[/math] where [math]W_{i}=\frac{\partial}{\partial\theta}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)[/math].
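
Before proving this, a quick Monte Carlo sanity check (the same assumed exponential example, not from the lecture) that [math]\sqrt{n}\overline{W}[/math] has mean close to [math]0[/math] and variance close to [math]I\left(\lambda_{0}\right)=1/\lambda_{0}^{2}[/math]:

```python
# Monte Carlo sanity check (assumed exponential example): the scores are
# W_i = 1/lam0 - x_i, and sqrt(n) * W_bar should be close to N(0, 1/lam0**2).
import numpy as np

rng = np.random.default_rng(2)
lam0, n, reps = 2.0, 2000, 5000

x = rng.exponential(scale=1 / lam0, size=(reps, n))
z = np.sqrt(n) * (1 / lam0 - x).mean(axis=1)  # sqrt(n) * W_bar per replication

print("mean:", z.mean())     # approximately 0
print("variance:", z.var())  # approximately I(lam0) = 1/lam0**2 = 0.25
```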

To prove this first result, by the central limit theorem it suffices to show that [math]E\left(W_{i}\right)=0[/math] and [math]Var\left(W_{i}\right)=I\left(\theta_{0}\right)[/math].

Notice that

[math]\begin{aligned} E\left(W_{i}\right) & =E\left(\left.\frac{\partial}{\partial\theta}\log\,f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}\right)\\ & =E\left(\frac{\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}}{f\left(\left.x\right|\theta_{0}\right)}\right)\\ & =\int_{-\infty}^{\infty}\frac{\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}}{f\left(\left.x\right|\theta_{0}\right)}f\left(\left.x\right|\theta_{0}\right)dx\\ & =\int_{-\infty}^{\infty}\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}dx\\ & =0\end{aligned}[/math]

The last identity follows from the fact that [math]\int_{-\infty}^{\infty}f\left(\left.x\right|\theta\right)dx=1[/math], i.e., the value of the integral w.r.t. [math]x[/math] does not change with [math]\theta[/math]. Differentiating both sides w.r.t. [math]\theta[/math] and interchanging differentiation and integration (one of the regularity conditions) yields

[math]\begin{aligned} & \frac{d}{d\theta}\int_{-\infty}^{\infty}f\left(\left.x\right|\theta\right)dx=\frac{d}{d\theta}1\\ \Leftrightarrow & \int_{-\infty}^{\infty}\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)dx=0\end{aligned}[/math]
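
This interchange can be checked numerically. The sketch below (assumed exponential density, a hypothetical check, not from the lecture) approximates [math]\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)[/math] by a central finite difference and integrates over [math]x[/math]:

```python
# Quadrature sketch (assumed exponential density, not from the lecture): the
# theta-derivative of f is approximated by a central finite difference and
# integrated over x; the result should be approximately 0.
import numpy as np
from scipy.integrate import quad

def f(x, lam):
    return lam * np.exp(-lam * x)

lam0, h = 2.0, 1e-5
integral, _ = quad(lambda x: (f(x, lam0 + h) - f(x, lam0 - h)) / (2 * h), 0, np.inf)
print(integral)  # approximately 0
```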

The variance expression, [math]Var\left(W_{i}\right)=Var_{\theta_{0}}\left(l_{1}^{'}\left(\theta_{0}\right)\right)[/math], is precisely the one used to define the Fisher information (Lecture 10. c) Cramer-Rao Lower Bound), so [math]Var\left(W_{i}\right)=I\left(\theta_{0}\right)[/math].

As for the denominator, notice that by the law of large numbers, [math]-\frac{1}{n}l^{''}\left(\theta_{0}\right)=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}}{\partial\theta^{2}}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)\overset{p}{\rightarrow}-E_{\theta_{0}}\left[l_{1}^{''}\left(\theta_{0}\right)\right]=I\left(\theta_{0}\right)[/math]
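
A short law-of-large-numbers check (a Poisson example is assumed here, not from the lecture): for Poisson data, [math]l_{1}^{''}\left(\lambda\right)=-x/\lambda^{2}[/math], so [math]-\frac{1}{n}l^{''}\left(\lambda_{0}\right)=\overline{X}/\lambda_{0}^{2}\overset{p}{\rightarrow}1/\lambda_{0}=I\left(\lambda_{0}\right)[/math]:

```python
# Law-of-large-numbers sketch (Poisson example assumed, not from the lecture):
# l1''(lam) = -x / lam**2, so -(1/n) * l''(lam0) = x_bar / lam0**2, which
# converges in probability to lam0 / lam0**2 = 1/lam0 = I(lam0).
import numpy as np

rng = np.random.default_rng(3)
lam0 = 4.0

for n in (100, 10_000, 1_000_000):
    x = rng.poisson(lam0, size=n)
    print(n, x.mean() / lam0 ** 2)  # approaches I(lam0) = 1/lam0 = 0.25
```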

By Slutsky’s theorem, since the numerator converges in distribution,

[math]\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)\right),[/math]

while the denominator converges in probability,

[math]-\frac{1}{n}l^{''}\left(\theta_{0}\right)\overset{p}{\rightarrow}I\left(\theta_{0}\right),[/math]

the ratio converges in distribution as well. A [math]N\left(0,I\left(\theta_{0}\right)\right)[/math] random variable divided by the constant [math]I\left(\theta_{0}\right)[/math] has variance [math]I\left(\theta_{0}\right)/I\left(\theta_{0}\right)^{2}=I\left(\theta_{0}\right)^{-1}[/math], so we conclude that

[math]\sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)^{-1}\right).[/math]
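
In practice, this result is what justifies large-sample Wald inference. Below is a minimal sketch (the exponential example is assumed again, with the unknown [math]\lambda_{0}[/math] replaced by the plug-in estimate [math]\widehat{\lambda}_{ML}[/math]) of a 95% confidence interval based on [math]I\left(\widehat{\theta}_{ML}\right)^{-1}/n[/math]:

```python
# Wald-type confidence interval sketch (exponential example assumed, with the
# plug-in estimate lam_hat in place of the unknown lam0): lam_hat is roughly
# N(lam0, I(lam0)^{-1} / n) = N(lam0, lam0**2 / n) for large n.
import numpy as np

rng = np.random.default_rng(4)
lam0, n = 2.0, 500

x = rng.exponential(scale=1 / lam0, size=n)
lam_hat = 1 / x.mean()
se = lam_hat / np.sqrt(n)  # sqrt of the plug-in variance I(lam_hat)^{-1} / n
print("lam_hat:", lam_hat)
print("95% CI:", (lam_hat - 1.96 * se, lam_hat + 1.96 * se))
```

The same plug-in recipe (estimate, invert the Fisher information, divide by [math]n[/math]) applies to any model where [math]I\left(\theta\right)[/math] is available in closed form.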