Let $X_{1},X_{2},\ldots$ be i.i.d. random variables (independent random variables, each having the same marginal distribution). Assume that the second moment of $X_{1}$ is finite. Then $\mu=\mathbf{E}[X_{1}]$ and ${\sigma}^{2}=\mbox{Var}(X_{1})$ are well-defined.

Let $S_{n}=X_{1}+\ldots +X_{n}$ (partial sums) and $\bar{X}_{n}=\frac{S_{n}}{n}=\frac{X_{1}+\ldots +X_{n}}{n}$ (the sample mean). Then, by the properties of expectation and variance, we have $$ \mathbf{E}[S_{n}]=n\mu, \quad \mbox{Var}(S_{n})=n{\sigma}^{2}, \qquad \mathbf{E}[\bar{X}_{n}]=\mu, \quad \mbox{Var}(\bar{X}_{n})=\frac{{\sigma}^{2}}{n}. $$ In particular, $\mbox{s.d.}(\bar{X}_{n})={\sigma}/\sqrt{n}$ decreases with $n$. Applying Chebyshev's inequality to $\bar{X}_{n}$, we get, for any $\delta > 0$, $$ \mathbf{P}\{|\bar{X}_{n}-\mu|\ge \delta\}\le \frac{\mbox{Var}(\bar{X}_{n})}{\delta^{2}}=\frac{{\sigma}^{2}}{\delta^{2}n}. $$ This goes to zero as $n\rightarrow \infty$ (with $\delta > 0$ fixed). In other words, for large $n$ the sample mean is unlikely to be far from $\mu$ (sometimes called the ``population mean''). This is consistent with our intuition that the more times we toss a $p$-coin, the better we can estimate the value of $p$.
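For concreteness, here is a small simulation sketch (an illustration, not part of the text) that estimates $\mathbf{P}\{|\bar{X}_{n}-p|\ge \delta\}$ for i.i.d. $\mbox{Ber}(p)$ tosses and compares it with the Chebyshev bound ${\sigma}^{2}/(\delta^{2}n)$. The choices of $p$, $\delta$, the sample sizes, and the number of repeated trials are arbitrary.

```python
# Illustrative sketch: estimate P{|Xbar_n - p| >= delta} for i.i.d. Ber(p)
# tosses and compare it with the Chebyshev bound sigma^2 / (delta^2 * n).
# All parameter choices below (p, delta, n values, trials, seed) are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
p, delta, trials = 0.5, 0.1, 10_000
sigma2 = p * (1 - p)                      # Var(X_1) for a Ber(p) variable

for n in [10, 100, 1000]:
    tosses = rng.binomial(1, p, size=(trials, n))
    xbar = tosses.mean(axis=1)            # sample mean of each repeated trial
    empirical = np.mean(np.abs(xbar - p) >= delta)
    chebyshev = sigma2 / (delta**2 * n)
    print(f"n={n:5d}  empirical={empirical:.4f}  Chebyshev bound={chebyshev:.4f}")
```

The empirical frequencies sit well below the Chebyshev bound, which is expected: Chebyshev only uses the variance and is typically far from tight.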

Weak law of large numbers (Jacob Bernoulli): With the above notation, for any $\delta > 0$, we have $$ \mathbf{P}\{|\bar{X}_{n}-\mu|\ge \delta\}\le \frac{{\sigma}^{2}}{\delta^{2}n}\rightarrow 0 \mbox{ as } n\rightarrow \infty. $$
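The law is not restricted to coin tosses. As a minimal sketch (again an illustration, not from the text), one can track the running sample mean of i.i.d. Exponential$(1)$ variables, for which $\mu=1$ and ${\sigma}^{2}=1$, and watch it settle near $\mu$ as $n$ grows; the sample sizes and seed below are arbitrary.

```python
# Illustrative sketch: the running sample mean of i.i.d. Exponential(1)
# variables (mu = 1, sigma^2 = 1) approaches mu as n grows.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)   # Xbar_n for n = 1..10^6

for n in [10, 100, 10_000, 1_000_000]:
    print(f"n={n:8d}  sample mean={running_mean[n - 1]:.4f}  (mu = 1)")
```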

This is very general, in that we only assume that the variance exists. If the $X_{k}$ are assumed to have more moments, one can get better bounds. For example, when the $X_{k}$ are i.i.d. $\mbox{Ber}(p)$, we have the following theorem.

Hoeffding's inequality: Let $X_{1},\ldots ,X_{n}$ be i.i.d. $\mbox{Ber}(p)$. Then, for any $\delta > 0$, $$ \mathbf{P}\{|\bar{X}_{n}-p|\ge \delta\} \le 2e^{-n\delta^{2}/2}. $$
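To see how much sharper the exponential bound is, the following small computation (an illustration, not from the text) tabulates the Chebyshev bound ${\sigma}^{2}/(\delta^{2}n)$ next to the Hoeffding bound $2e^{-n\delta^{2}/2}$ for a fair coin ($p=1/2$) and $\delta=0.1$; these parameter values are arbitrary.

```python
# Illustrative comparison: Chebyshev bound sigma^2/(delta^2 n) versus the
# Hoeffding bound 2*exp(-n*delta^2/2) for i.i.d. Ber(1/2) and delta = 0.1.
import math

p, delta = 0.5, 0.1
sigma2 = p * (1 - p)

for n in [100, 1000, 10_000]:
    chebyshev = sigma2 / (delta**2 * n)
    hoeffding = 2 * math.exp(-n * delta**2 / 2)
    print(f"n={n:6d}  Chebyshev={chebyshev:.3e}  Hoeffding={hoeffding:.3e}")
```

The Chebyshev bound decays only like $1/n$, while the Hoeffding bound decays exponentially in $n$, so for large $n$ it is dramatically smaller.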

Chapter 24. Monte-Carlo integration