Let $X_{k}\sim \mbox{Ber}(p)$, $k=1,2,\ldots$, be independent random variables. The central limit theorem says that if $p$ is fixed and $n$ is large, the distribution of $(X_{1}+\ldots +X_{n}-np)/\sqrt{np(1-p)}$ is close to the $N(0,1)$ distribution.
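
For a quick numerical illustration of this approximation, here is a minimal Python sketch (not part of the original notes; the values $n=1000$, $p=0.3$ are arbitrary choices). It compares the exact binomial probability that the standardized sum lands in $[-1,1]$ with the corresponding normal probability.

```python
from math import comb, erf, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def Phi(x):
    # standard normal CDF, expressed via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p = 1000, 0.3          # arbitrary illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))
exact = sum(binom_pmf(k, n, p) for k in range(n + 1)
            if -1 <= (k - mu) / sigma <= 1)
print(round(exact, 4), round(Phi(1) - Phi(-1), 4))  # both close to 0.68
```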

Now we consider a slightly different situation. Let $X_{1},\ldots ,X_{n}$ be independent with $\mbox{Ber}(p_{n})$ distribution, where $p_{n}=\frac{\lambda}{n}$ and $\lambda > 0$ is fixed. We shall show that the distribution of $X_{1}+\ldots +X_{n}$ is close to that of $\mbox{Pois}(\lambda)$. Note that the distribution of $X_{1}$ changes with $n$, hence it would be more correct to write $X_{n,1}, \ldots ,X_{n,n}$.

 

Theorem 142
Let $\lambda > 0$ be fixed and let $X_{n,1},\ldots ,X_{n,n}$ be i.i.d. $\mbox{Ber}(\lambda/n)$. Let $S_{n}=X_{n,1}+\ldots +X_{n,n}$. Then, for every $k\ge 0$, $$ \mathbf{P}\{S_{n}=k\}\rightarrow e^{-\lambda}\frac{\lambda^{k} }{k!} \quad \mbox{ as } n\rightarrow\infty. $$
Fix $k$ and observe that $$\begin{align*} \mathbf{P}\{S_{n}=k\} &= \binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k} \\ &= \frac{n(n-1)\ldots (n-k+1)}{k!}\frac{\lambda^{k} }{n^{k} }\left(1-\frac{\lambda}{n}\right)^{n-k}. \end{align*}$$ Note that $\frac{n(n-1)\ldots(n-k+1)}{n^{k} }\rightarrow 1$ as $n\rightarrow \infty$ (since $k$ is fixed). Also, $(1-\frac{\lambda}{n})^{n-k}\rightarrow e^{-\lambda}$ (if not clear, note that $(1-\frac{\lambda}{n})^{n}\rightarrow e^{-\lambda}$ and $(1-\frac{\lambda}{n})^{-k}\rightarrow 1$). Hence, the right hand side above converges to $e^{-\lambda}\frac{\lambda^{k} }{k!}$ which is what we wanted to show.
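To see this convergence numerically, here is a minimal Python sketch (an illustration, not from the notes; the choice $\lambda=3$ and the grid of $n$ values are arbitrary). It computes the largest gap between the $\mbox{Bin}(n,\lambda/n)$ pmf and the $\mbox{Pois}(\lambda)$ pmf, which shrinks as $n$ grows.

```python
from math import comb, exp, factorial

lam = 3.0  # arbitrary illustrative choice of lambda

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k):
    return exp(-lam) * lam**k / factorial(k)

for n in (10, 100, 1000):
    gap = max(abs(binom_pmf(k, n, lam / n) - pois_pmf(k))
              for k in range(16))   # both pmfs are negligible beyond k = 15
    print(n, gap)                   # the gap decreases as n increases
```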
What is the meaning of this? Bernoulli random variables may be thought of as indicators of events, i.e., think of $X_{n,1}$ as ${\mathbf 1}_{A_{1} }$ etc. The theorem considers $n$ events which are independent and each of which is ''rare'' (since the probability of it occurring is $\lambda/n$, which becomes small as $n$ increases). The number of events increases but the chance of each event decreases in such a way that the expected number of events that occur stays constant (equal to $n\cdot \frac{\lambda}{n}=\lambda$). Then, the total number of events that actually occur has an approximately Poisson distribution.

 

Example 143
(A physical example). A large amount of custard is made in the hostel mess to serve $100$ students. The cook adds $300$ raisins and mixes the custard well, so that on average each student gets $3$ raisins. But the number of raisins that a given student gets is random, and the above theorem says that it has approximately $\mbox{Pois}(3)$ distribution. How so? Let $X_{k}$ be the indicator of the event that the $k$th raisin ends up in your cup. Since there are $100$ cups, the chance of this happening is $1/100$. The number of raisins in your cup is precisely $X_{1}+X_{2}+\ldots +X_{300}$. Apply the theorem (take $n=300$ and $\lambda=3$, so that $\lambda/n=1/100$).
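
A direct simulation of this example is easy to write; the following Python sketch (illustrative, not from the notes) drops each of the $300$ raisins into a uniformly random cup and tabulates how many land in yours, comparing the empirical frequencies with the $\mbox{Pois}(3)$ pmf.

```python
import random
from math import exp, factorial
from collections import Counter

random.seed(1)                          # for reproducibility
trials, raisins, cups = 20_000, 300, 100
counts = Counter(
    sum(random.randrange(cups) == 0 for _ in range(raisins))  # raisins in your cup
    for _ in range(trials)
)
for k in range(8):
    print(k, round(counts[k] / trials, 3),
          round(exp(-3) * 3**k / factorial(k), 3))  # empirical vs Pois(3)
```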

 

Example 144
Place $r$ balls in $m$ bins at random. If $m=1000$ and $r=500$, then the number of balls in the first bin has approximately $\mbox{Pois}(1/2)$ distribution. Work out how this comes from the theorem.
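
After working it out, one way to check the answer numerically is the following sketch (not from the notes): each ball lands in the first bin with probability $1/m$, so the count in that bin is $\mbox{Bin}(r,1/m)$, which the theorem approximates by $\mbox{Pois}(r/m)$.

```python
from math import comb, exp, factorial

r, m = 500, 1000            # balls and bins, as in the example
lam = r / m                 # = 1/2, the expected count in the first bin
for k in range(5):
    binom = comb(r, k) * (1 / m)**k * (1 - 1 / m)**(r - k)
    pois = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 5), round(pois, 5))   # the two pmfs nearly agree
```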

The Poisson limit is a much more general phenomenon than what the theorem above captures. For example, consider the problem of a psychic guessing a deck of cards. If $X$ is the number of correct guesses, we saw (by direct calculation and approximation) that $\mathbf{P}\{X=k\}$ is close to $e^{-1}/k!$. In other words, $X$ has approximately $\mbox{Pois}(1)$ distribution. Does it follow from the theorem above? Let us try.

Set $X_{k}$ to be the indicator of the event that the $k$th guess is correct. Then $X_{k}\sim \mbox{Ber}(1/52)$ and $X=X_{1}+\ldots +X_{52}$. It looks as though the theorem tells us that $X$ should have approximately $\mbox{Pois}(1)$ distribution (by taking $n=52$ and $\lambda=1$). But the $X_{i}$ are not independent random variables, and hence the theorem does not strictly apply. The theorem should be thought of as one of many theorems that capture the theme ''in a large collection of rare events that are nearly independent, the actual number of events that occur is approximately Poisson''.
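
Even though the theorem does not apply, a simulation supports the Poisson approximation. Here is a sketch (not from the notes) that models the psychic's guesses as a fixed ordering of the $52$ cards matched against a uniformly shuffled deck, so that $X$ is the number of fixed points of a uniform random permutation.

```python
import random
from math import exp, factorial
from collections import Counter

random.seed(1)                          # for reproducibility
trials, deck = 100_000, 52
counts = Counter(
    sum(guess == card for guess, card in
        enumerate(random.sample(range(deck), deck)))  # correct guesses
    for _ in range(trials)
)
for k in range(6):
    print(k, round(counts[k] / trials, 4),
          round(exp(-1) / factorial(k), 4))  # empirical vs Pois(1)
```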

Chapter 27. Entropy, Gibbs distribution