Random Sample

Let $X=\left(X_{1}..X_{n}\right)$ be an n-dimensional random vector. The random variables $X_{1}..X_{n}$ constitute a random sample if they are (mutually) independent and have identical (marginal) distributions.

We usually refer to such variables as being i.i.d.: Independent and Identically Distributed. To reiterate, these variables share the same distribution and are mutually independent (a stronger condition than merely being uncorrelated).

It follows that if $X$ is a random sample from distribution $F\left(\cdot\right)$, then

$F_{X_{1}..X_{n}}\left(x_{1}..x_{n}\right)\underset{(independence)}{\underbrace{=}}\prod_{i=1}^{n}F_{X_{i}}\left(x_{i}\right)\underset{(F_{X_{i}}=F,\,\forall i)}{\underbrace{=}}\prod_{i=1}^{n}F\left(x_{i}\right)$.

Also, note that this multiplicative result applies to the pmf and pdf as well.
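A minimal simulation sketch of the factorization (the distribution, evaluation point, and sample size are illustrative choices): for two i.i.d. $U(0,1)$ draws, where $F(x)=x$, the empirical joint cdf should match the product of the marginal cdfs.

```python
import numpy as np

# Check empirically that P(X1 <= x1, X2 <= x2) = F(x1) * F(x2)
# for two i.i.d. U(0,1) draws, where F(x) = x.
rng = np.random.default_rng(0)
n_sims = 200_000
x1, x2 = 0.4, 0.7

draws = rng.uniform(size=(n_sims, 2))
joint = np.mean((draws[:, 0] <= x1) & (draws[:, 1] <= x2))  # empirical joint cdf
product = x1 * x2                                           # F(x1) * F(x2) = 0.28

print(joint, product)
```

With 200,000 simulations the two numbers agree to roughly two decimal places.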

Statistics

Let $X_{1}..X_{n}$ be a random sample and let $T:\mathbb{R}^{n}\rightarrow\mathbb{R}^{k}$ be a function (for some $k\geq 1$).

The random variable $Y=T\left(X_{1}..X_{n}\right)$ is called a statistic, and its distribution is called the sampling distribution of $Y$.

Some Examples

• The sample mean is $\overline{X}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$.
• The sample variance is $s^{2}=\frac{1}{n-1}\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}$.
• The sample standard deviation is $s=\sqrt{s^{2}}$.
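The three statistics above can be computed directly from their formulas; a minimal NumPy sketch (the distribution parameters and seed are illustrative):

```python
import numpy as np

# Compute the sample mean, sample variance, and sample standard deviation
# for a sample of n = 10 draws from N(5, 4).
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10)
n = len(x)

x_bar = x.mean()                          # sample mean
s2 = ((x - x_bar) ** 2).sum() / (n - 1)   # sample variance, 1/(n-1) denominator
s = np.sqrt(s2)                           # sample standard deviation

print(x_bar, s2, s)
```

Note that NumPy's `var` and `std` use the $\frac{1}{n}$ denominator by default; passing `ddof=1` gives the $\frac{1}{n-1}$ version defined above.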

Notice that each of the statistics above is a random variable. Each random sample of $X$s will yield a slightly different sample mean, sample variance, etc.

At this point you may be wondering about the $\frac{1}{n-1}$ factor in the formula for the sample variance. We will explain that shortly.

The statistics above are random variables in their own right. They too have moments. Here are a few:

Expected Sample Mean

• $E\left(\overline{X}\right)=E\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)=\frac{1}{n}\sum_{i=1}^{n}\underset{=\mu}{\underbrace{E\left(X_{i}\right)}}=\frac{n\mu}{n}=\mu.$

Variance of the Sample Mean

• $Var\left(\overline{X}\right)=Var\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)=\frac{1}{n^{2}}Var\left(\sum_{i=1}^{n}X_{i}\right)=\frac{1}{n^{2}}\sum_{i=1}^{n}Var\left(X_{i}\right)=\frac{n\sigma^{2}}{n^{2}}=\frac{\sigma^{2}}{n}$.

The variance result is interesting and intuitive: as the sample size increases, the variance of the sample mean decreases. For example, suppose you took 100 draws of $X_{i}$ many times and computed the mean of each set. (In Excel, say, each column would contain 100 draws of $X_{i}$, and a final row would calculate the mean of each column.) The variance across those column means decreases with the number of draws per column (here, 100). If we increased the draws per column to 1,000,000, the column means would all be very similar, so the variance of those means would be smaller still.

The result on $Var\left(\overline{X}\right)$ tells us the specific rate at which the variance of the mean decreases with $n$.
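The thought experiment above can be sketched in a few lines (the choice of $N(0,1)$, so that $\sigma^{2}=1$, and the simulation counts are illustrative):

```python
import numpy as np

# Draw many samples of size n = 100 from N(0, 1), compute each sample's
# mean, and compare the variance of those means to sigma^2 / n = 1/100.
rng = np.random.default_rng(2)
n = 100            # draws per sample (one "column" in the Excel analogy)
n_samples = 50_000

means = rng.normal(size=(n_samples, n)).mean(axis=1)
print(means.var())  # should be close to 1 / 100 = 0.01
```

Increasing `n` shrinks the variance of the means at exactly the $\frac{\sigma^{2}}{n}$ rate.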

Consider also the following well-known result, which we provide a lot of detail for:

Expectation of $s^{2}$

$E\left(s^{2}\right)=E\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}\right]=\frac{1}{n-1}E\left[\sum_{i=1}^{n}\left(X_{i}^{2}-2X_{i}\overline{X}+\overline{X}^{2}\right)\right]$

$=\frac{1}{n-1}E\left[\sum_{i=1}^{n}X_{i}^{2}-2\overline{X}\sum_{i=1}^{n}X_{i}+n\overline{X}^{2}\right]=\frac{1}{n-1}E\left[\sum_{i=1}^{n}X_{i}^{2}-2n\overline{X}^{2}+n\overline{X}^{2}\right]$ (using $\sum_{i=1}^{n}X_{i}=n\overline{X}$)

$=\frac{1}{n-1}E\left[\sum_{i=1}^{n}X_{i}^{2}-n\overline{X}^{2}\right]=\frac{1}{n-1}\left(nE\left(X_{i}^{2}\right)-nE\left(\overline{X}^{2}\right)\right)$

$=\frac{1}{n-1}\left(n\left(\mu^{2}+\sigma^{2}\right)-n\left(\mu^{2}+\frac{\sigma^{2}}{n}\right)\right)=\frac{n\sigma^{2}-\sigma^{2}}{n-1}=\sigma^{2}$

We have used the facts that $E\left(X_{i}^{2}\right)=Var\left(X_{i}\right)+E\left(X_{i}\right)^{2}$ and $E\left(\overline{X}^{2}\right)=Var\left(\overline{X}\right)+E\left(\overline{X}\right)^{2}$.

It may be surprising that $E\left(s^{2}\right)=\sigma^{2}$ despite the $\frac{1}{n-1}$ denominator. The reason is that the draws of $X_{i}$ are, on average, closer to the sample average $\overline{X}$ than to the population mean $E\left(X\right)=\mu$, so the sum of squared deviations from $\overline{X}$ understates the true variability. Dividing by $n-1$ rather than $n$ corrects for this.
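A quick simulation sketch of the unbiasedness result (the normal distribution, $\sigma^{2}=4$, and sample size are illustrative): averaging the $\frac{1}{n-1}$ estimator over many samples recovers $\sigma^{2}$, while the $\frac{1}{n}$ version falls short by the factor $\frac{n-1}{n}$.

```python
import numpy as np

# Compare the 1/(n-1) and 1/n variance estimators over many samples
# of size n = 10 from N(0, 4), i.e. sigma^2 = 4.
rng = np.random.default_rng(3)
n, sigma2 = 10, 4.0
samples = rng.normal(scale=np.sqrt(sigma2), size=(100_000, n))

s2_unbiased = samples.var(axis=1, ddof=1)  # 1/(n-1) denominator
s2_biased = samples.var(axis=1, ddof=0)    # 1/n denominator

print(s2_unbiased.mean())  # close to 4.0
print(s2_biased.mean())    # close to 4.0 * (n-1)/n = 3.6
```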

Order Statistics

Let $X_{1}..X_{n}$ be a random sample. The order statistics are the sample values placed in ascending order, i.e.,

$X_{\left(1\right)}=\min_{i\leq n}X_{i}\leq X_{\left(2\right)}\leq...\leq X_{\left(n\right)}=\max_{i\leq n}X_{i}$

This is a perhaps unexpected, but often useful, statistic. We can ask what the distribution of the maximum of a random sample is.

For example, if we drew many sets of 30 draws each of $X\sim N\left(0,1\right)$, what would be the distribution of the maximum (across the samples)?

Distribution of the Maximum

The distribution of the maximum of a random sample with cdf $F\left(\cdot\right)$ equals

$F_{X_{\left(n\right)}}\left(x\right)=P\left(X_{\left(n\right)}\leq x\right)=P\left(X_{1}\leq x,X_{2}\leq x,...,X_{n}\leq x\right)=P\left(X_{1}\leq x\right)P\left(X_{2}\leq x\right)...P\left(X_{n}\leq x\right)=F\left(x\right)^{n}$.
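The $N(0,1)$, $n=30$ example above can be checked by simulation (the evaluation point $x=2$ and the simulation count are illustrative):

```python
import numpy as np
from scipy.stats import norm

# For samples of n = 30 draws from N(0, 1), the cdf of the maximum is
# Phi(x)^30. Compare the empirical cdf of simulated maxima at x = 2.
rng = np.random.default_rng(4)
n, n_sims, x = 30, 100_000, 2.0

maxima = rng.normal(size=(n_sims, n)).max(axis=1)
empirical = np.mean(maxima <= x)
theoretical = norm.cdf(x) ** n

print(empirical, theoretical)
```

Both numbers come out close to 0.5: with 30 draws, the maximum exceeds 2 about half the time.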

The distribution of the minimum, $X_{\left(1\right)}$, can be calculated via a similar method, starting from $P\left(X_{\left(1\right)}\gt x\right)=\left(1-F\left(x\right)\right)^{n}$.

Distribution of Order Statistics

In general, the distribution of the $r$-th order statistic is given by

$F_{X_{\left(r\right)}}\left(x\right)=P\left(X_{\left(r\right)}\leq x\right)=\sum_{j=r}^{n}\left(\begin{array}{c} n\\ j \end{array}\right)F\left(x\right)^{j}\left(1-F\left(x\right)\right)^{n-j}$

where the binomial structure is apparent.

For each value of $j$, starting at $r$, we sum the probability of observing $j$ values below $x$ and $n-j$ values above.

For example, consider the case where $n=30$. Then, $P\left(X_{\left(r\right)}\leq x\right)=P\left(X_{\left(r\right)}\leq x\wedge X_{\left(r+1\right)}\gt x\right)+P\left(X_{\left(r+1\right)}\leq x\wedge X_{\left(r+2\right)}\gt x\right)+...$: the sum over the binomial terms is simply the sum of the probabilities of the mutually exclusive cases (exactly $r$ values below $x$, exactly $r+1$ values below $x$, and so on) that imply $X_{\left(r\right)}\leq x$.
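The binomial-sum formula can be verified against simulation; a sketch with illustrative choices ($N(0,1)$ draws, $n=30$, $r=5$, evaluated at $x=-1$):

```python
import numpy as np
from scipy.stats import binom, norm

# Check the cdf formula for the r-th order statistic:
#   P(X_(r) <= x) = sum_{j=r}^{n} C(n, j) F(x)^j (1 - F(x))^(n-j)
# against the empirical cdf of simulated order statistics.
rng = np.random.default_rng(5)
n, r, x, n_sims = 30, 5, -1.0, 100_000

p = norm.cdf(x)  # F(x) for N(0, 1)
formula = sum(binom.pmf(j, n, p) for j in range(r, n + 1))

# Simulation: sort each sample and take the r-th smallest value.
samples = np.sort(rng.normal(size=(n_sims, n)), axis=1)
empirical = np.mean(samples[:, r - 1] <= x)

print(empirical, formula)
```

Setting `r = n` recovers the $F\left(x\right)^{n}$ result for the maximum as a special case.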

Statistical Inference

This point marks the end of the introduction of the probability tools needed. Our goal now shifts, from situations where distributions are known and outcomes are unknown, to situations where we observe the outcomes but not the distributions (up to some parameters). We will keep denoting random variables by capital letters, and will denote outcomes by lowercase letters. Some examples:

• We may observe $x_{1}...x_{n}$ where $X_{i}\sim Ber\left(p\right)$ where $p\in\left(0,1\right)$ is unknown.
• We may observe $x_{1}...x_{n}$ where $X_{i}\sim U\left(0,\theta\right)$ where $\theta\gt 0$ is unknown.
• We may observe $x_{1}...x_{n}$ where $X_{i}\sim N\left(\mu,\sigma^{2}\right)$ where $\mu\in\mathbb{R}$ and/or $\sigma^{2}\gt 0$ are unknown.

We will consider three types of statistical inference:

• Point Estimation
  • In this case, we want to single out one distribution (specifically, the parameters of the distribution).
• Hypothesis Testing
  • In this case, we want to evaluate a specific theory (for example, that $\mu=0$).
• Interval Estimation
  • In this case, we want to isolate which values of $\theta$ are plausible.