Lecture 8. C) Maximum Likelihood

Maximum Likelihood

Let [math]X_{1}..X_{n}[/math] be a random sample from a distribution with pmf/pdf [math]f\left(\left.\cdot\right|\theta\right)[/math], where [math]\theta\in\Theta[/math] is unknown.

The maximum likelihood estimator (MLE) [math]\widehat{\theta}_{ML}[/math] maximizes [math]L\left(\theta\left|x_{1}..x_{n}\right.\right)[/math] over [math]\theta\in\Theta[/math], where [math]L\left(\cdot\left|x_{1}..x_{n}\right.\right):\Theta\rightarrow\mathbb{R}_{+}[/math] (with codomain [math]\left[0,1\right][/math] in the pmf case) is given by

[math]L\left(\theta\left|x_{1}..x_{n}\right.\right)=\Pi_{i=1}^{n}f\left(\left.x_{i}\right|\theta\right),\,\theta\in\Theta.[/math]

The function [math]L\left(\theta\left|x_{1}..x_{n}\right.\right)[/math] is called the likelihood function.

In the discrete case, the joint pmf equals the probability of the sample having occurred, given parameter [math]\theta[/math]. So, the intuition for the maximum likelihood estimator is that we look for the value of [math]\theta[/math] that maximizes the probability of the observed sample having occurred.
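As a small numerical illustration of this intuition (a sketch only, anticipating the Bernoulli example below; the helper names are hypothetical), the likelihood computed here is just the probability of observing exactly the sample [math]\left(1,0,1,1\right)[/math], evaluated at a few candidate parameter values; among the values tried it is largest at the sample mean:

  from math import prod

  def likelihood(theta, sample, pmf):
      # L(theta | x_1..x_n) = product of f(x_i | theta) over the sample
      return prod(pmf(x, theta) for x in sample)

  # Bernoulli pmf, used purely as an illustrative f(x | theta)
  def bernoulli_pmf(x, p):
      return p ** x * (1 - p) ** (1 - x)

  sample = [1, 0, 1, 1]                 # observed 0/1 data
  for p in (0.25, 0.5, 0.75):
      # in the discrete case this is the probability of this exact sample under p
      print(p, likelihood(p, sample, bernoulli_pmf))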

The maximum likelihood estimator has some incredibly useful properties, which we will discuss later.

log-Likelihood

Sometimes, a convenient object to work with is the log-likelihood function, given by

[math]l\left(\theta\left|x_{1}..x_{n}\right.\right)=\log\,L\left(\theta\left|x_{1}..x_{n}\right.\right)=\sum_{i=1}^{n}\log f\left(\left.x_{i}\right|\theta\right)[/math]

The last identity follows from the fact that the log of a product equals the sum of the logs. Notice that because [math]\log\left(\cdot\right)[/math] is a strictly increasing function, the log-likelihood is also maximized by [math]\widehat{\theta}_{ML}[/math].

In order to compute the maximum likelihood estimator, we simply need to obtain the likelihood (or log-likelihood) and maximize it w.r.t. the parameters of interest.
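In simple models this maximization can be done in closed form, as in the example below; otherwise it can be done numerically. A minimal sketch of the numerical recipe for a scalar parameter, assuming SciPy is available (mle_1d is a hypothetical helper name):

  import numpy as np
  from scipy.optimize import minimize_scalar

  def mle_1d(log_pdf, sample, bounds):
      # maximize l(theta | x_1..x_n) = sum_i log f(x_i | theta)
      # by minimizing its negative over the given parameter bounds
      neg_log_lik = lambda theta: -sum(log_pdf(x, theta) for x in sample)
      return minimize_scalar(neg_log_lik, bounds=bounds, method="bounded").x

  # e.g. the Bernoulli model of the next section: log f(x | p) = x log(p) + (1 - x) log(1 - p)
  log_pmf = lambda x, p: x * np.log(p) + (1 - x) * np.log(1 - p)
  print(mle_1d(log_pmf, [1, 0, 1, 1, 0, 1], bounds=(1e-6, 1 - 1e-6)))  # close to 4/6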

Example: Bernoulli

Suppose [math]X_{i}\overset{iid}{\sim}Ber\left(p\right)[/math], where [math]p\in\left[0,1\right][/math] is unknown.

The marginal pmf equals [math]f\left(\left.x\right|p\right)=p^{x}\left(1-p\right)^{1-x}1\left(x\in\left\{ 0,1\right\} \right)[/math]. We can ignore [math]1\left(x\in\left\{ 0,1\right\} \right)[/math], since by assumption it is satisfied in the sample.

The likelihood function equals [math]L\left(\left.p\right|x_{1}..x_{n}\right)=\Pi_{i=1}^{n}f\left(\left.x_{i}\right|p\right)=\Pi_{i=1}^{n}\left(p^{x_{i}}\left(1-p\right)^{1-x_{i}}\right)=p^{\sum_{i=1}^{n}x_{i}}\left(1-p\right)^{n-\sum_{i=1}^{n}x_{i}},\,p\in\left[0,1\right][/math]

The log-likelihood function equals [math]l\left(\left.p\right|x_{1}..x_{n}\right)=\sum_{i=1}^{n}x_{i}\log\left(p\right)+\left(n-\sum_{i=1}^{n}x_{i}\right)\log\left(1-p\right),\,p\in\left(0,1\right)[/math]

(Because [math]\log\left(0\right)=-\infty[/math], we will inspect [math]p=0[/math] and [math]p=1[/math] separately.)

We look for an interior solution:

[math]\begin{aligned} foc\left(p\right): & =\frac{\partial}{\partial p}l\left(\left.p\right|x_{1}..x_{n}\right)=0\\ \Leftrightarrow & \frac{\sum_{i=1}^{n}x_{i}}{p}-\frac{n-\sum_{i=1}^{n}x_{i}}{1-p}=0\\ \Leftrightarrow & \left(1-p\right)\sum_{i=1}^{n}x_{i}-p\left(n-\sum_{i=1}^{n}x_{i}\right)=0\\ \Leftrightarrow & \widehat{p}_{ML}=\frac{\sum_{i=1}^{n}x_{i}}{n}\end{aligned}[/math]

Verifying the second-order condition reveals that our estimator is indeed a maximum.
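Concretely, differentiating the log-likelihood a second time gives

[math]\frac{\partial^{2}}{\partial p^{2}}l\left(\left.p\right|x_{1}..x_{n}\right)=-\frac{\sum_{i=1}^{n}x_{i}}{p^{2}}-\frac{n-\sum_{i=1}^{n}x_{i}}{\left(1-p\right)^{2}},[/math]

which is strictly negative for every [math]p\in\left(0,1\right)[/math] whenever [math]0<\sum_{i=1}^{n}x_{i}<n[/math] (the only case with an interior critical point), so the critical point is indeed a maximum.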

Finally, consider the possible exceptions. When [math]\frac{\sum_{i=1}^{n}x_{i}}{n}=0[/math], the formula gives [math]\widehat{p}_{ML}=0[/math], but this value lies outside the interval [math]\left(0,1\right)[/math] on which the log-likelihood was defined.

Similarly, when [math]\frac{\sum_{i=1}^{n}x_{i}}{n}=1[/math], the formula gives a value outside the domain of our log-likelihood. We can use the likelihood function itself to determine what the estimator is in these two cases:

  • [math]\sum_{i=1}^{n}x_{i}=0[/math], [math]\max_{p}\,L\left(\left.p\right|x_{1}..x_{n}\right)=\max_{p}\,\left(1-p\right)^{n}\Rightarrow\widehat{p}_{ML}=0[/math]
  • [math]\sum_{i=1}^{n}x_{i}=n[/math], [math]\max_{p}\,L\left(\left.p\right|x_{1}..x_{n}\right)=\max_{p}\,p^{n}\Rightarrow\widehat{p}_{ML}=1[/math]

Interestingly, our original formula [math]\widehat{p}_{ML}=\frac{\sum_{i=1}^{n}x_{i}}{n}[/math] also agrees with these two cases, and so we have found that for any admissible sample, [math]\widehat{p}_{ML}=\frac{\sum_{i=1}^{n}x_{i}}{n}[/math].

Notice also that, in this example, the method of moments estimator coincides with the maximum likelihood estimator (this is not guaranteed in general).
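As a quick numerical sanity check (a sketch only; p_hat_numeric is a hypothetical helper and SciPy is assumed to be available), maximizing the Bernoulli likelihood directly over [math]p\in\left[0,1\right][/math] reproduces the closed form [math]\widehat{p}_{ML}=\frac{\sum_{i=1}^{n}x_{i}}{n}[/math], including the two boundary cases up to optimizer tolerance:

  from scipy.optimize import minimize_scalar

  def p_hat_numeric(sample):
      # maximize L(p | x_1..x_n) = p^sum(x) * (1 - p)^(n - sum(x)) over p in [0, 1]
      s, n = sum(sample), len(sample)
      neg_lik = lambda p: -(p ** s * (1 - p) ** (n - s))
      return minimize_scalar(neg_lik, bounds=(0.0, 1.0), method="bounded").x

  for sample in ([0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1, 0, 1]):
      print(round(p_hat_numeric(sample), 4), sum(sample) / len(sample))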