Let $X_{1},X_{2},\ldots$ be i.i.d. random variables with expectation $\mu$ and variance ${\sigma}^{2}$. We saw that $\bar{X}_{n}$ has mean $\mu$ and standard deviation ${\sigma}/\sqrt{n}$.

This roughly means that $\bar{X}_{n}$ is close to $\mu$, within a few multiples of ${\sigma}/\sqrt{n}$ (as shown by Chebyshev's inequality). Now we look at $\bar{X}_{n}$ under a finer microscope. In other words, we ask for the probability that $\bar{X}_{n}$ falls in the tiny interval $[\mu+\frac{a}{\sqrt{n} },\mu+\frac{b}{\sqrt{n} }]$ for any $a < b$. The answer turns out to be surprising and remarkable!

Central limit theorem : Let $X_{1},X_{2},\ldots$ be i.i.d. random variables with expectation $\mu$ and variance ${\sigma}^{2}$. We assume that $0 < {\sigma}^{2} < \infty$. Then, for any $a < b$, we have $$ \mathbf{P}\left\{ \mu+a\frac{{\sigma}}{\sqrt{n} }\le \bar{X}_{n}\le \mu+b\frac{{\sigma}}{\sqrt{n} }\right\} \rightarrow \Phi(b)-\Phi(a) = \frac{1}{\sqrt{2\pi} }\int\limits_{a}^{b}e^{-t^{2}/2}dt. $$
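Here is a small numerical illustration of the statement (a minimal simulation sketch in Python, not part of these notes; it assumes numpy and scipy are available, and the exponential distribution, $n$ and the interval $[a,b]$ below are arbitrary choices):

```python
# Illustrative sketch (not part of the notes): Monte Carlo check of the CLT
# statement above, assuming numpy and scipy are available.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 100, 100_000
a, b = -1.0, 2.0

# Any i.i.d. distribution with finite positive variance will do; here
# Exp(rate 2), so mu = 1/2 and sigma = 1/2 (arbitrary choices).
mu, sigma = 0.5, 0.5
samples = rng.exponential(scale=0.5, size=(trials, n))
xbar = samples.mean(axis=1)

lo = mu + a * sigma / np.sqrt(n)
hi = mu + b * sigma / np.sqrt(n)
empirical = np.mean((xbar >= lo) & (xbar <= hi))
print(empirical, norm.cdf(b) - norm.cdf(a))  # both close to Phi(b) - Phi(a)
```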

What is remarkable about this? The end result does not depend on the distribution of $X_{i}$s at all! Only the mean and variance of the distribution were used! As this is one of the most important theorems in all of probability theory, we restate it in several forms, all equivalent to the above.

Restatements of central limit theorem : Let $X_{k}$ be as above. Let $S_{n}=X_{1}+\ldots +X_{n}$. Let $Z$ be a $N(0,1)$ random variable. Then of course $\mathbf{P}\{a < Z < b\}=\Phi(b)-\Phi(a)$.

  1. $\mathbf{P}\{a < \frac{\sqrt{n} }{{\sigma}}(\bar{X}_{n}-\mu)\le b\} \rightarrow \Phi(b)-\Phi(a) = \mathbf{P}\{a < Z < b\}$. Put another way, this says that for large $n$, the random variable $\frac{\sqrt{n}(\bar{X}_{n}-\mu)}{{\sigma}}$ has $N(0,1)$ distribution, approximately. Equivalently, $\sqrt{n}(\bar{X}_{n}-\mu)$ has $N(0,{\sigma}^{2})$ distribution, approximately.

  2. Yet another way to say the same is that $S_{n}$ has approximately normal distribution with mean $n\mu$ and variance $n{\sigma}^{2}$. That is, $$ \mathbf{P}\left\{a\le \frac{S_{n}-n\mu}{{\sigma}\sqrt{n} } \le b\right\} \rightarrow \mathbf{P}\{a < Z < b\}. $$
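As a quick sanity check of form 2 (again only an illustrative sketch, not from the notes; scipy is assumed, and the coin-flip setting and the numbers are arbitrary choices), take fair coin flips $X_{k}\sim \mbox{Ber}(1/2)$, so that $\mu=1/2$, ${\sigma}^{2}=1/4$ and $S_{n}$ is exactly Binomial$(n,1/2)$:

```python
# Illustrative sketch (not part of the notes): restatement 2 for fair coin
# flips, where S_n is exactly Binomial(n, 1/2), mu = 1/2, sigma = 1/2.
from math import sqrt
from scipy.stats import binom, norm

n, mu, sigma = 1000, 0.5, 0.5
lo, hi = 480, 520  # ask for P{480 <= S_n <= 520}

exact = binom.cdf(hi, n, 0.5) - binom.cdf(lo - 1, n, 0.5)
approx = norm.cdf((hi - n * mu) / (sigma * sqrt(n))) \
       - norm.cdf((lo - n * mu) / (sigma * sqrt(n)))
print(exact, approx)  # both roughly 0.80
```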

The central limit theorem is deep, surprising, and useful. The following example gives a hint as to why.

 

Example 141
Let $U_{1},\ldots ,U_{n}$ be i.i.d. Uniform($[-1,1]$) random variables. Let $S_{n}=U_{1}+\ldots +U_{n}$, let $\bar{U}_{n}=S_{n}/n$ (sample mean) and let $Y_{n}=S_{n}/\sqrt{n}$. Consider the problem of finding the distribution of any of these. Since they are obtained from each other by scaling, finding the distribution of one is the same as finding that of any other. For the uniform distribution on $[-1,1]$, we know that $\mu=0$ and ${\sigma}^{2}=1/3$. Hence, CLT tells us that $$ \mathbf{P}\left\{\frac{a}{\sqrt{3} } < Y_{n} < \frac{b}{\sqrt{3} }\right\}\rightarrow \Phi(b)-\Phi(a), $$ or equivalently, $\mathbf{P}\{a < Y_{n} < b\}\rightarrow \Phi(b\sqrt{3})-\Phi(a\sqrt{3})$. For large $n$ (practically, $n=50$ is large enough) we may use this limit as a good approximation to the probability we want.
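A simulation sketch of this example (not part of the notes; numpy and scipy are assumed, and the choices of $n$, $a$, $b$ below are arbitrary):

```python
# Sketch (not part of the notes): simulate Y_n = S_n / sqrt(n) for
# Uniform[-1, 1] summands and compare with the CLT limit above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, trials = 50, 200_000
a, b = -0.5, 0.5

u = rng.uniform(-1.0, 1.0, size=(trials, n))   # i.i.d. Uniform[-1, 1]
y = u.sum(axis=1) / np.sqrt(n)                 # Y_n = S_n / sqrt(n)

empirical = np.mean((y > a) & (y < b))
limit = norm.cdf(b * np.sqrt(3)) - norm.cdf(a * np.sqrt(3))
print(empirical, limit)  # both close to Phi(b*sqrt(3)) - Phi(a*sqrt(3))
```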

Why is this surprising? The direct way to find the distribution of $Y_{n}$ would be this. Applying the convolution formula $n-1$ times successively, one can find the density of $S_{n}=U_{1}+\ldots +U_{n}$ (in principle! the actual integration may be intractable!). Then we can find the density of $Y_{n}$ by another change of variable (in one dimension). Having got the density of $Y_{n}$, we integrate it from $a$ to $b$ to get $\mathbf{P}\{a < Y_{n} < b\}$. This is clearly a daunting task (if you don't feel so, just try it for $n=5$).
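For the record, the brute-force route can at least be carried out numerically: discretise the uniform density on a grid and convolve it with itself $n-1$ times. The sketch below (not part of the notes; numpy is assumed, and the number of grid points and $n$ are arbitrary choices) does exactly that.

```python
# Sketch (not part of the notes): the brute-force route carried out numerically,
# assuming numpy; the number of grid points and n are arbitrary choices.
import numpy as np

n, m = 5, 2001                     # n summands, m grid points on [-1, 1]
x = np.linspace(-1.0, 1.0, m)
h = x[1] - x[0]                    # grid spacing
f = np.full(m, 0.5)                # density of Uniform[-1, 1] on the grid

g = f.copy()
for _ in range(n - 1):             # n - 1 convolutions give the density of S_n
    g = np.convolve(g, f) * h      # Riemann-sum approximation of the convolution integral

grid = np.linspace(-n, n, len(g))  # S_n takes values in [-n, n]
print(g.sum() * h)                 # total mass, close to 1
print(grid[len(g) // 2], g[len(g) // 2])  # approximate density of S_n near 0
# The density of Y_n = S_n/sqrt(n) at t is sqrt(n) times the density of S_n at t*sqrt(n).
```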

The CLT cuts through all of this and directly gives an approximate answer! What is even more surprising is that the original distribution does not matter: we only need to know its mean and variance!

We shall not prove the central limit theorem in general. But we indicate how it is done when the $X_{k}$ come from the $\mbox{Exp}(\lambda)$ distribution. This is optional and may be skipped.

Let $X_{k}$ be i.i.d. $\mbox{Exp}(1)$ random variables. They have mean $\mu=1$ and variance ${\sigma}^{2}=1$. We know (this was an exercise) that $S_{n}=X_{1}+\ldots +X_{n}$ has the $\mbox{Gamma}(n,1)$ distribution. Its density is given by $f_{n}(t)=e^{-t}t^{n-1}/(n-1)!$ for $t > 0$.
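This claim is easy to check numerically (a sketch, not part of the notes; numpy and scipy are assumed, and $n$ and the test points are arbitrary choices):

```python
# Quick numerical check (not part of the notes) of the Gamma(n, 1) claim,
# assuming numpy and scipy; n and the test points are arbitrary choices.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
n, trials = 10, 200_000
s = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)  # S_n for Exp(1) summands

for t in (5.0, 10.0, 15.0):
    # empirical P{S_n <= t} versus the Gamma(n, 1) CDF
    print(t, np.mean(s <= t), gamma.cdf(t, n))
```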

Now let $Y_{n}=\frac{S_{n}-n\mu}{{\sigma}\sqrt{n} }=\frac{S_{n}-n}{\sqrt{n} }$. By a change of variable (in one dimension) we see that the density of $Y_{n}$ is given by $g_{n}(t)=\sqrt{n}f_{n}(n+t\sqrt{n})$. Let us analyse this. $$\begin{align*} g_{n}(t) &= \sqrt{n} \frac{1}{(n-1)!}e^{-(n+t\sqrt{n})}(n+t\sqrt{n})^{n-1} \\ &= \sqrt{n} \frac{n^{n-1} }{(n-1)!}e^{-n-t\sqrt{n} }\left(1+\frac{t}{\sqrt{n} }\right)^{n-1} \\ &\approx \sqrt{n} \frac{n^{n-1} }{\sqrt{2\pi}(n-1)^{n-\frac{1}{2} }e^{-n+1} }e^{-n-t\sqrt{n} }\left(1+\frac{t}{\sqrt{n} }\right)^{n-1} \hspace{2mm}(\mbox{by Stirling's formula})\\ &= \frac{1}{\sqrt{2\pi}(1-\frac{1}{n})^{n-\frac{1}{2} }e^{1} }e^{-t\sqrt{n} }\left(1+\frac{t}{\sqrt{n} }\right)^{n-1}. \end{align*}$$ To find the limit of this, first observe that $(1-\frac{1}{n})^{n-\frac{1}{2} }\rightarrow e^{-1}$. It remains to find the limit of $w_{n}:=e^{-t\sqrt{n} }\left(1+\frac{t}{\sqrt{n} }\right)^{n-1}$. It is easiest to do this by taking logarithms. Recall that $\log(1+x)=x-\frac{x^{2} }{2}+\frac{x^{3} }{3}-\ldots$. Hence $$\begin{align*} \log w_{n} &= -t\sqrt{n} + (n-1)\log\left(1+\frac{t}{\sqrt{n} }\right) \\ &= -t\sqrt{n} + (n-1)\left[ \frac{t}{\sqrt{n} }-\frac{t^{2} }{2n}+\frac{t^{3} }{3n^{3/2} }-\ldots\right] \\ &= -\frac{t^{2} }{2}+[\ldots] \end{align*}$$ where in $[\ldots]$ we have put all the terms that go to zero as $n\rightarrow \infty$ (for instance, the term $-\frac{t}{\sqrt{n} }$ left over after $(n-1)\frac{t}{\sqrt{n} }$ cancels $-t\sqrt{n}$, and the term $\frac{t^{2} }{2n}$ left over from $-(n-1)\frac{t^{2} }{2n}$). Since there are infinitely many such terms, we should argue that even after adding all of them, the total goes to zero as $n\rightarrow \infty$. Let us skip this step and simply conclude that $\log w_{n}\rightarrow -t^{2}/2$. Therefore, $g_{n}(t) \rightarrow \varphi(t):=\frac{1}{\sqrt{2\pi} }e^{-t^{2}/2}$, which is the standard normal density.
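The convergence $g_{n}(t)\rightarrow \varphi(t)$ can also be checked numerically, without any of the estimates above, since scipy evaluates the Gamma density in a stable way (a sketch, not part of the notes; the value of $n$ and the points $t$ are arbitrary choices):

```python
# Sketch (not part of the notes): check g_n(t) = sqrt(n) * f_n(n + t*sqrt(n))
# against the standard normal density, assuming numpy and scipy; n is arbitrary.
import numpy as np
from scipy.stats import gamma, norm

n = 10_000
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

g_n = np.sqrt(n) * gamma.pdf(n + t * np.sqrt(n), n)  # f_n is the Gamma(n, 1) density
print(np.round(g_n, 4))
print(np.round(norm.pdf(t), 4))  # the two rows should nearly agree
```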

What we wanted was $\mathbf{P}\{a < Y_{n} < b\}=\int\limits_{a}^{b}g_{n}(t) dt$. Since $g_{n}(t)\rightarrow \varphi(t)$ for each $t$, it is believable that $\int\limits_{a}^{b}g_{n}(t) dt\rightarrow \int\limits_{a}^{b}\varphi(t) dt$. This too needs justification but we skip it. Thus, $$ \mathbf{P}\{a < Y_{n} < b\} \rightarrow \int\limits_{a}^{b}\varphi(t) dt = \Phi(b)-\Phi(a). $$ This proves CLT for the case of exponential random variables.
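In fact, since $\{a < Y_{n} < b\}=\{n+a\sqrt{n} < S_{n} < n+b\sqrt{n}\}$ and $S_{n}$ has the $\mbox{Gamma}(n,1)$ distribution, the left-hand side can be computed exactly and compared with $\Phi(b)-\Phi(a)$ (a sketch, not part of the notes; scipy is assumed and the choices of $n$, $a$, $b$ are arbitrary):

```python
# Sketch (not part of the notes): P{a < Y_n < b} computed exactly through the
# Gamma(n, 1) CDF of S_n and compared with Phi(b) - Phi(a); scipy assumed.
import numpy as np
from scipy.stats import gamma, norm

a, b = -1.0, 1.0
for n in (10, 100, 1000):
    # {a < Y_n < b} = {n + a*sqrt(n) < S_n < n + b*sqrt(n)}
    exact = gamma.cdf(n + b * np.sqrt(n), n) - gamma.cdf(n + a * np.sqrt(n), n)
    print(n, round(exact, 4), round(norm.cdf(b) - norm.cdf(a), 4))
```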

Chapter 26. Poisson limit for rare events