
# Asymptotic Properties of ML Estimators

Up to now, we have derived the distributions of several test statistics. This has been a relatively tedious process, however.

For example, the hypothesis test for the parameter $\mu$ of the normal distribution depends on whether $\sigma^{2}$ is known or not. In general, the exact distribution of more complicated test statistics may be extremely difficult to derive.

It turns out that, as long as $n$ is large, we can sidestep this work by using a convenient property of the maximum likelihood (ML) estimator: its asymptotic normality.

Let $X_{1},\dots,X_{n}$ be a random sample with pdf $f\left(\left.x\right|\theta_{0}\right)$. Under a few regularity conditions,

$\sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)^{-1}\right)$

where $I\left(\theta_{0}\right)$ is the Fisher information.
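To see the statement at work, here is a minimal Monte Carlo sketch (Python, standard library only). It assumes an exponential model with rate $\lambda_{0}$, chosen purely for illustration, where $\widehat{\lambda}_{ML}=1/\overline{X}$ and $I(\lambda)=1/\lambda^{2}$, so the result predicts $Var\left(\sqrt{n}(\widehat{\lambda}_{ML}-\lambda_{0})\right)\approx\lambda_{0}^{2}$. The helper name `simulate_scaled_errors`, the sample size, the number of replications, and the seed are arbitrary, hypothetical choices.

```python
import math
import random
import statistics


def simulate_scaled_errors(lam0: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Return `reps` draws of sqrt(n) * (lambda_hat - lam0) for Exponential(lam0) samples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = [rng.expovariate(lam0) for _ in range(n)]
        lam_hat = 1.0 / statistics.fmean(sample)  # closed-form MLE of the rate
        draws.append(math.sqrt(n) * (lam_hat - lam0))
    return draws


lam0, n, reps = 2.0, 200, 2000  # illustrative settings
errors = simulate_scaled_errors(lam0, n, reps)
fisher_info = 1.0 / lam0**2        # I(lambda) for one exponential observation
predicted_var = 1.0 / fisher_info  # asymptotic variance I(lambda_0)^{-1}
empirical_var = statistics.variance(errors)
# Share of draws within +/- 1.96 asymptotic standard deviations (should be near 0.95).
coverage = sum(abs(e) <= 1.96 * math.sqrt(predicted_var) for e in errors) / reps
print(f"predicted variance {predicted_var:.3f}, empirical {empirical_var:.3f}, coverage {coverage:.3f}")
```

With $n=200$ the empirical variance should land near the predicted value of $4$, and roughly 95% of the draws should fall within $\pm1.96$ asymptotic standard deviations; small departures reflect finite-sample bias and Monte Carlo noise.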

## Proof

First, some notation:

• Denote $l\left(\theta\right)=l\left(\left.\theta\right|x_{1},\dots,x_{n}\right)=\sum_{i=1}^{n}\log\left(f\left(\left.x_{i}\right|\theta\right)\right)$.
• Let $I\left(\theta\right)=E_{\theta}\left[\left(l_{1}^{'}\left(\theta\right)\right)^{2}\right]=-E_{\theta}\left[l_{1}^{''}\left(\theta\right)\right]=Var_{\theta}\left(l_{1}^{'}\left(\theta\right)\right)$ be the Fisher information of a single observation, where $l_{1}\left(\theta\right)=l_{1}\left(\left.\theta\right|x_{1}\right)$ is the log-likelihood for one observation. Under the regularity conditions, these three expressions coincide.
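The three expressions for $I(\theta)$ can be checked exactly for a simple model. The sketch below assumes a single Bernoulli($p$) observation (our choice of example), where $l_{1}(p)=x\log p+(1-x)\log(1-p)$, and evaluates each expectation over the two support points with exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction


def bernoulli_score(x: int, p: Fraction) -> Fraction:
    # l_1'(p) = d/dp [x log p + (1 - x) log(1 - p)]
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)


def bernoulli_second_derivative(x: int, p: Fraction) -> Fraction:
    # l_1''(p) = -x / p^2 - (1 - x) / (1 - p)^2
    return -Fraction(x) / p**2 - Fraction(1 - x) / (1 - p) ** 2


def expectation(g, p: Fraction) -> Fraction:
    # Exact expectation over the support {0, 1}: P(X = 1) = p, P(X = 0) = 1 - p.
    return p * g(1) + (1 - p) * g(0)


p = Fraction(3, 10)
mean_score = expectation(lambda x: bernoulli_score(x, p), p)
info_outer = expectation(lambda x: bernoulli_score(x, p) ** 2, p)            # E[(l_1')^2]
info_hessian = -expectation(lambda x: bernoulli_second_derivative(x, p), p)  # -E[l_1'']
info_variance = info_outer - mean_score**2                                   # Var(l_1')
print(mean_score, info_outer, info_hessian, info_variance, 1 / (p * (1 - p)))
# prints: 0 100/21 100/21 100/21 100/21
```

All three versions agree with the closed form $1/(p(1-p))$, and the score has mean zero, a fact the proof relies on below.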

We take a second-order Taylor expansion of the first derivative of the log-likelihood around $\theta_{0}$, writing the remainder in Lagrange form:

$l^{'}\left(\theta\right)=l^{'}\left(\theta_{0}\right)+\left(\theta-\theta_{0}\right)l^{''}\left(\theta_{0}\right)+\frac{\left(\theta-\theta_{0}\right)^{2}}{2}l^{'''}\left(\theta^{*}\right)$

for some $\theta^{*}$ between $\theta$ and $\theta_{0}$.

Now, we evaluate the expansion at $\theta=\widehat{\theta}_{ML}$. Because the ML estimator solves the likelihood equation, $l^{'}\left(\widehat{\theta}_{ML}\right)=0$. Solving for $\widehat{\theta}_{ML}-\theta_{0}$, multiplying by $\sqrt{n}$, and dividing numerator and denominator by $-n$ gives:

\begin{aligned} & \underset{=0}{\underbrace{l^{'}\left(\widehat{\theta}_{ML}\right)}}=l^{'}\left(\theta_{0}\right)+\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{''}\left(\theta_{0}\right)+\frac{\left(\widehat{\theta}_{ML}-\theta_{0}\right)^{2}}{2}l^{'''}\left(\theta^{*}\right),\,\theta^{*}\text{ between }\widehat{\theta}_{ML}\text{ and }\theta_{0}\\ \Leftrightarrow & \widehat{\theta}_{ML}-\theta_{0}=\frac{-l^{'}\left(\theta_{0}\right)}{l^{''}\left(\theta_{0}\right)+\frac{1}{2}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)}\\ \Leftrightarrow & \sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)=\frac{\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)}{-\frac{1}{n}l^{''}\left(\theta_{0}\right)-\frac{1}{2n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)}\end{aligned}
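Dropping the $l^{'''}$ term from the denominator turns the expansion into a single Newton-Raphson step from $\theta_{0}$, namely $\theta_{0}-l^{'}(\theta_{0})/l^{''}(\theta_{0})$. The sketch below, which assumes an exponential model with rate $\lambda_{0}$ purely for illustration ($l(\lambda)=n\log\lambda-\lambda\sum x_{i}$), compares that one-step value with the exact MLE $1/\overline{x}$. The helper `log_lik_derivatives` and all numeric settings are hypothetical.

```python
import random
import statistics


def log_lik_derivatives(lam: float, sample: list[float]) -> tuple[float, float]:
    """First and second derivatives of the exponential log-likelihood n*log(lam) - lam*sum(x)."""
    n = len(sample)
    first = n / lam - sum(sample)
    second = -n / lam**2
    return first, second


rng = random.Random(1)
lam0 = 2.0  # illustrative true rate
for n in (50, 500, 5000):
    sample = [rng.expovariate(lam0) for _ in range(n)]
    mle = 1.0 / statistics.fmean(sample)      # exact MLE
    d1, d2 = log_lik_derivatives(lam0, sample)
    one_step = lam0 - d1 / d2                 # expansion with the l''' term dropped
    print(f"n={n:5d}  |mle - lam0|={abs(mle - lam0):.2e}  |mle - one_step|={abs(mle - one_step):.2e}")
```

For this model a short calculation shows that the gap equals $(\widehat{\lambda}_{ML}-\lambda_{0})^{2}/\widehat{\lambda}_{ML}$, i.e. it is quadratic in the estimation error, which is why the remainder term becomes negligible once $\widehat{\lambda}_{ML}$ is close to $\lambda_{0}$.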

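Assume that $\frac{1}{n}l^{'''}\left(\theta^{*}\right)$ is “well-behaved” near $\theta_{0}$, namely bounded in probability (a standard regularity condition), and that $\widehat{\theta}_{ML}$ is consistent (which also holds under the regularity conditions). Then the last term in the denominator, $\frac{1}{2n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)l^{'''}\left(\theta^{*}\right)$, converges in probability to $0$, so we can ignore it. For the numerator, notice that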

$\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log\,f\left(\left.x_{i}\right|\theta_{0}\right)\right)=\sqrt{n}\overline{W}\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)\right)$ where $W_{i}=\left.\frac{\partial}{\partial\theta}\log\,f\left(\left.x_{i}\right|\theta\right)\right|_{\theta=\theta_{0}}$.

Since the $W_{i}$ are iid, the central limit theorem yields this result once we show that $E\left(W_{i}\right)=0$ and $Var\left(W_{i}\right)=I\left(\theta_{0}\right)$.
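The CLT step can be illustrated by simulation. This sketch assumes Bernoulli($p_{0}$) data (an illustrative choice), for which $W_{i}=x_{i}/p_{0}-(1-x_{i})/(1-p_{0})$ and $I(p_{0})=1/(p_{0}(1-p_{0}))$; it draws many replications of $\sqrt{n}\,\overline{W}$ and reports their mean and variance next to $I(p_{0})$. Function names and settings are hypothetical.

```python
import math
import random
import statistics


def score(x: int, p: float) -> float:
    # W_i = d/dp log f(x_i | p) for a Bernoulli(p) observation
    return x / p - (1 - x) / (1 - p)


def scaled_score_means(p0: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` values of sqrt(n) * W_bar, each from a fresh Bernoulli(p0) sample of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [1 if rng.random() < p0 else 0 for _ in range(n)]
        w_bar = statistics.fmean(score(x, p0) for x in xs)
        out.append(math.sqrt(n) * w_bar)
    return out


p0 = 0.3
info = 1 / (p0 * (1 - p0))  # I(p0) for one Bernoulli observation
draws = scaled_score_means(p0, n=400, reps=2000)
mean_draw = statistics.fmean(draws)
var_draw = statistics.variance(draws)
print(f"mean {mean_draw:.3f} (target 0), variance {var_draw:.3f} (target I(p0) = {info:.3f})")
```

The mean should hover around $0$ and the variance around $I(p_{0})\approx4.76$, up to Monte Carlo error.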

Notice that

\begin{aligned} E\left(W_{i}\right) & =E\left(\left.\frac{\partial}{\partial\theta}\log\,f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}\right)\\ & =E\left(\frac{\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}}{f\left(\left.x\right|\theta_{0}\right)}\right)\\ & =\int_{-\infty}^{\infty}\frac{\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}}{f\left(\left.x\right|\theta_{0}\right)}f\left(\left.x\right|\theta_{0}\right)dx\\ & =\int_{-\infty}^{\infty}\left.\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)\right|_{\theta=\theta_{0}}dx\\ & =0\end{aligned}

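The last identity follows from the fact that $\int_{-\infty}^{\infty}f\left(\left.x\right|\theta\right)dx=1$ for every $\theta$, i.e., the value of the integral w.r.t. $x$ does not change with $\theta$. Provided that differentiation and integration can be interchanged (one of the regularity conditions), this gives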

\begin{aligned} & \frac{d}{d\theta}\int_{-\infty}^{\infty}f\left(\left.x\right|\theta\right)dx=\frac{d}{d\theta}1\\ \Leftrightarrow & \int_{-\infty}^{\infty}\frac{\partial}{\partial\theta}f\left(\left.x\right|\theta\right)dx=0\end{aligned}
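The interchange of differentiation and integration can be checked numerically for a concrete density. The sketch below assumes an exponential density $f(x|\lambda)=\lambda e^{-\lambda x}$ on $[0,\infty)$ (its support does not depend on $\lambda$), whose derivative is $\frac{\partial}{\partial\lambda}f(x|\lambda)=(1-\lambda x)e^{-\lambda x}$, and integrates both $f$ and this derivative with a hand-written composite Simpson rule on a truncated range. The value of `LAM0`, the truncation point, and the grid size are arbitrary choices.

```python
import math

LAM0 = 2.0  # illustrative value of lambda_0


def density(x: float) -> float:
    # f(x | lambda_0) = lambda_0 * exp(-lambda_0 * x) for x >= 0
    return LAM0 * math.exp(-LAM0 * x)


def d_density(x: float) -> float:
    # d/d lambda [lambda * exp(-lambda * x)] evaluated at lambda_0
    return (1 - LAM0 * x) * math.exp(-LAM0 * x)


def simpson(g, a: float, b: float, m: int = 20_000) -> float:
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


upper = 40 / LAM0  # truncation point; the neglected tail is of order exp(-40)
total_mass = simpson(density, 0.0, upper)
integral_of_derivative = simpson(d_density, 0.0, upper)
print(total_mass, integral_of_derivative)
```

The first number should be essentially $1$ and the second essentially $0$, up to quadrature and truncation error.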

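As for the variance, $Var\left(W_{i}\right)=Var_{\theta_{0}}\left(l_{1}^{'}\left(\theta_{0}\right)\right)=I\left(\theta_{0}\right)$ is exactly one of the expressions used to define the Fisher information (Lecture 10. c) Cramer-Rao Lower Bound).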

As for the denominator, by the law of large numbers,

$-\frac{1}{n}l^{''}\left(\theta_{0}\right)=-\frac{1}{n}\sum_{i=1}^{n}\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\,f\left(\left.x_{i}\right|\theta\right)\right|_{\theta=\theta_{0}}\overset{p}{\rightarrow}-E_{\theta_{0}}\left[l_{1}^{''}\left(\theta_{0}\right)\right]=I\left(\theta_{0}\right)$
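The law-of-large-numbers step can be seen in a simulation as well. Assuming Bernoulli($p_{0}$) data (illustration only), $-l_{1}^{''}(p_{0})=x/p_{0}^{2}+(1-x)/(1-p_{0})^{2}$, and its average over larger and larger samples should settle near $I(p_{0})=1/(p_{0}(1-p_{0}))$. The function name, seed, and sample sizes are hypothetical.

```python
import random


def neg_second_derivative(x: int, p: float) -> float:
    # -d^2/dp^2 log f(x | p) for a Bernoulli(p) observation
    return x / p**2 + (1 - x) / (1 - p) ** 2


rng = random.Random(2)
p0 = 0.3
target = 1 / (p0 * (1 - p0))  # I(p0)
xs = [1 if rng.random() < p0 else 0 for _ in range(100_000)]
for n in (100, 1_000, 10_000, 100_000):
    # Average of -l_1''(p0) over the first n observations, i.e. -l''(p0) / n
    observed_info = sum(neg_second_derivative(x, p0) for x in xs[:n]) / n
    print(f"n={n:6d}  -l''(p0)/n = {observed_info:.4f}   I(p0) = {target:.4f}")
```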

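Thus the numerator converges in distribution and the denominator converges in probability to a positive constant:

$\frac{1}{\sqrt{n}}l^{'}\left(\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)\right),\qquad-\frac{1}{n}l^{''}\left(\theta_{0}\right)\overset{p}{\rightarrow}I\left(\theta_{0}\right)$

By Slutsky’s theorem, the ratio converges in distribution to $Z/I\left(\theta_{0}\right)$ with $Z\sim N\left(0,I\left(\theta_{0}\right)\right)$. Since $Var\left(Z/I\left(\theta_{0}\right)\right)=I\left(\theta_{0}\right)/I\left(\theta_{0}\right)^{2}=I\left(\theta_{0}\right)^{-1}$, we conclude that

$\sqrt{n}\left(\widehat{\theta}_{ML}-\theta_{0}\right)\overset{d}{\rightarrow}N\left(0,I\left(\theta_{0}\right)^{-1}\right).$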