# Bootstrapping

The term “bootstrapping” likely derives from the image of someone pulling themselves up by their own bootstraps. In a sense, it means making do with little or nothing. Here is the idea: suppose you would like to conduct a hypothesis test but do not know the distribution of the test statistic. Even if that distribution converges to a normal, its asymptotic variance (that is, the variance of the test statistic as $n$ tends to infinity) may be unknown.

Consider the following approach: if one has enough data, the distribution in the sample is approximately representative of the distribution in the population. One may therefore pretend that the sample itself is the population, and draw from the sample as if one were drawing from the population.
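As a minimal sketch of this idea, the snippet below treats an observed sample as if it were the population and draws a new sample of the same size from it, with replacement. The data are simulated purely for illustration, and the helper name `resample` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed sample (simulated here only for illustration).
sample = rng.exponential(scale=2.0, size=500)

def resample(data, rng):
    """Draw a bootstrap sample: same size as `data`, with replacement."""
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx]

boot = resample(sample, rng)

# Every bootstrap observation is one of the original observations;
# some appear more than once and some not at all.
print(len(boot), len(np.unique(boot)))
```

Because the draw is with replacement, the bootstrap sample typically contains fewer distinct values than the original sample.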

The bootstrap technique can be applied to MLE in the following way. Given a sample of size $N$, $\left\{ y_{i},x_{i}\right\} _{i=1}^{N}$:

• Estimate $\widehat{\theta}_{ML}=\text{argmax}_{\theta}\,f\left(\left.y\right|X,\theta\right)$ as usual.
• Calculate the test statistic of interest, $T$. (We could use the LRT, for example; notice that we know neither its distribution nor the appropriate critical value.)
• Then, resample (with replacement) from $\left\{ y_{i},x_{i}\right\} _{i=1}^{N}$ to get a new (bootstrap) sample $\left\{ y_{j}^{b},x_{j}^{b}\right\} _{j=1}^{N}$. Do this $B$ times, so that each bootstrap sample is indexed by $b\in\left\{ 1,\dots,B\right\}$.
• For each bootstrap sample, estimate $\widehat{\theta}_{b}=\text{argmax}_{\theta}\,f\left(\left.y^{b}\right|X^{b},\theta\right)$.
• Calculate the test statistic of interest for each estimation, $T^{b}$.
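The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the model is taken to be a Gaussian linear regression $y_{i}=a+b\,x_{i}+\varepsilon_{i}$, the hypothesis tested is $b=0$, and the data are simulated. None of these choices comes from the text, and the function names `lrt_statistic` and `bootstrap_statistics` are hypothetical. Under Gaussian errors the ML estimates of $(a,b)$ coincide with OLS, and the LRT statistic reduces to $N\log\left(RSS_{r}/RSS_{u}\right)$, where $RSS_{r}$ and $RSS_{u}$ are the restricted and unrestricted residual sums of squares.

```python
import numpy as np

def lrt_statistic(y, x):
    """Return (T, theta_hat) for H0: b = 0 in y = a + b*x + e, e ~ N(0, s2).

    Assumed model, for illustration only. With Gaussian errors the ML
    estimates of (a, b) are the OLS estimates, and the LRT statistic is
    N * log(RSS_restricted / RSS_unrestricted).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_u = np.sum((y - X @ theta_hat) ** 2)   # unrestricted fit
    rss_r = np.sum((y - y.mean()) ** 2)        # restricted fit (b = 0)
    return n * np.log(rss_r / rss_u), theta_hat

def bootstrap_statistics(y, x, B, rng):
    """Resample pairs (y_i, x_i) with replacement B times; return all T^b."""
    n = len(y)
    t_boot = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)       # bootstrap sample b
        t_boot[b], _ = lrt_statistic(y[idx], x[idx])
    return t_boot

# Simulated data standing in for the observed sample.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

T, theta_hat = lrt_statistic(y, x)                    # steps 1 and 2
T_boot = bootstrap_statistics(y, x, B=500, rng=rng)   # steps 3 to 5
```

Resampling the index `idx` once and applying it to both `y` and `x` keeps each pair $\left(y_{i},x_{i}\right)$ together, which matches the resampling scheme described in the list.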

While we do not know the distribution of the test statistic, we can approximate it with the empirical distribution of the $T^{b}$, since we computed the statistic many times on resamples of our own sample. Moreover, we can now build confidence intervals for the test statistic: we just need to pick $\underline{t},\overline{t}$ such that 95% of the test statistics $T^{b}$ fall in the interval $\left[\underline{t},\overline{t}\right]$. In the case of the LRT, we can reject the null hypothesis if $T$ is higher than at least 95% of the $T^{b}$ we drew. Note that such a test has a type I error rate of approximately 5%, the conventional significance level.
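A minimal sketch of these two calculations is given below. It assumes the bootstrap statistics are already stored in an array; here a placeholder array of chi-square draws stands in for the $T^{b}$, and the observed value `t_obs` is made up for the example. The function names are hypothetical, and the equal-tailed choice of $\underline{t},\overline{t}$ is one option among several.

```python
import numpy as np

def percentile_interval(t_boot, level=0.95):
    """Equal-tailed interval containing `level` of the bootstrap statistics."""
    alpha = 1.0 - level
    t_low, t_high = np.quantile(t_boot, [alpha / 2, 1 - alpha / 2])
    return t_low, t_high

def reject_null(t_obs, t_boot, level=0.95):
    """Reject when t_obs exceeds at least a `level` share of the T^b."""
    return np.mean(t_boot < t_obs) >= level

# Placeholder values standing in for real bootstrap output (assumption).
rng = np.random.default_rng(1)
t_boot = rng.chisquare(df=1, size=2000)
t_obs = 10.0

t_low, t_high = percentile_interval(t_boot)
print(t_low, t_high, reject_null(t_obs, t_boot))
```

The strict inequality in `reject_null` counts only bootstrap statistics strictly below the observed value, so ties do not count toward rejection.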