So far we have defined the notion of a probability space and the probability of an event. But most often we do not calculate probabilities directly from the definition. This is like integration, where one defines the integral of a function as a limit of Riemann sums, but that definition is used only to find the integrals of $x^{n}$, $\sin(x)$ and a few such functions. Integrals of complicated expressions such as $x\sin(x)+2\cos^{2}(x)\tan(x)$ are instead calculated by various rules, such as the substitution rule, integration by parts, etc. In probability we need similar rules relating the probabilities of various combinations of events to the probabilities of the individual events.

 

Proposition 50
Let $(\Omega,p_{\cdot})$ be a discrete probability space.
  1. For any event $A$, we have $0\le \mathbf{P}(A)\le 1$. Also, $\mathbf{P}(\emptyset)=0$ and $\mathbf{P}(\Omega)=1$.
  2. Finite additivity of probability: If $A_{1},\ldots ,A_{n}$ are pairwise disjoint events, then $\mathbf{P}(A_{1}\cup \ldots \cup A_{n})=\mathbf{P}(A_{1})+\ldots +\mathbf{P}(A_{n})$. In particular, $\mathbf{P}(A^{c})=1-\mathbf{P}(A)$ for any event $A$.
  3. Countable additivity of probability: If $A_{1},A_{2},\ldots$ is a countable collection of pairwise disjoint events, then $\mathbf{P}(\cup_{i}A_{i})=\sum_{i}\mathbf{P}(A_{i})$.
All of these may seem obvious, and indeed they would be totally obvious if we stuck to finite sample spaces. But the sample space could be countably infinite, in which case probabilities of events may involve infinite sums, which need special care in manipulation. Therefore we must give a proof. In writing the proof, and in many future contexts, it is useful to introduce the following notation.

Notation : Let $A\subseteq \Omega$ be an event. Then, we define a function ${\mathbf 1}_{A}:\Omega\rightarrow \mathbb{R}$, called the indicator function of $A$, as follows. $$ {\mathbf 1}_{A}(\omega) = \begin{cases} 1 & \mbox{ if }\omega\in A,\\ 0 & \mbox{ if }\omega\not\in A. \end{cases} $$ Since a function from $\Omega$ to $\mathbb{R}$ is called a random variable, the indicator of any event is a random variable. All information about the event $A$ is in its indicator function (meaning, if we know the value of ${\mathbf 1}_{A}(\omega)$, we know whether or not $\omega$ belongs to $A$). For example, we can write $\mathbf{P}(A)=\sum_{\omega\in \Omega}{\mathbf 1}_{A}(\omega)p_{\omega}$.
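For instance, take $\Omega=\{1,2,3,4,5,6\}$ with $p_{\omega}=\frac{1}{6}$ for every $\omega$ (a fair die), and let $A=\{2,4,6\}$ be the event that the outcome is even. Then ${\mathbf 1}_{A}$ equals $1$ at $2,4,6$ and $0$ at the odd outcomes, so $$ \mathbf{P}(A)=\sum_{\omega\in \Omega}{\mathbf 1}_{A}(\omega)p_{\omega}=\frac{1}{6}+\frac{1}{6}+\frac{1}{6}=\frac{1}{2}, $$ which agrees with summing $p_{\omega}$ over $\omega\in A$ directly.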

Now we prove the proposition.

  1. By the definition of a probability space, $\mathbf{P}(\Omega)=\sum_{\omega\in \Omega}p_{\omega}=1$, while $\mathbf{P}(\emptyset)=0$ since it is an empty sum. If $A$ is any event, then ${\mathbf 1}_{\emptyset}(\omega)p_{\omega}\le {\mathbf 1}_{A}(\omega)p_{\omega}\le {\mathbf 1}_{\Omega}(\omega)p_{\omega}$ for every $\omega\in \Omega$. By Exercise 41, we get $$ \sum_{\omega\in \Omega}{\mathbf 1}_{\emptyset}(\omega)p_{\omega} \le \sum_{\omega\in \Omega}{\mathbf 1}_{A}(\omega)p_{\omega} \le \sum_{\omega\in \Omega}{\mathbf 1}_{\Omega}(\omega)p_{\omega}. $$ As observed earlier, these sums are just $\mathbf{P}(\emptyset)$, $\mathbf{P}(A)$ and $\mathbf{P}(\Omega)$, respectively. Thus, $0\le \mathbf{P}(A)\le 1$.
  2. It suffices to prove it for two sets (why?). Let $A,B$ be two events such that $A\cap B=\emptyset$. Let $f(\omega)=p_{\omega}{\mathbf 1}_{A}(\omega)$ and $g(\omega)=p_{\omega}{\mathbf 1}_{B}(\omega)$ and $h(\omega)=p_{\omega}{\mathbf 1}_{A\cup B}(\omega)$. Then, the disjointness of $A$ and $B$ implies that $f(\omega)+g(\omega)=h(\omega)$ for all $\omega\in \Omega$. Thus, by Exercise 41, we get $$ \sum_{\omega\in \Omega}f(\omega)+\sum_{\omega\in \Omega}g(\omega) = \sum_{\omega\in \Omega}h(\omega). $$ But the three sums here are precisely $\mathbf{P}(A)$, $\mathbf{P}(B)$ and $\mathbf{P}(A\cup B)$. Thus, we get $\mathbf{P}(A\cup B)=\mathbf{P}(A)+\mathbf{P}(B)$.
  3. This is similar to finite additivity but needs a more involved argument. We leave it as an exercise for the interested reader. \qedhere

 

Exercise 51
Adapt the above proof to show that for a countable family of events $A_{k}$ in a common probability space (no disjointness assumed), we have $$ \mathbf{P}(\cup_{k}A_{k})\le \sum_{k}\mathbf{P}(A_{k}). $$
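The inequality can be strict when the events overlap. For instance, for a fair die ($\Omega=\{1,\ldots,6\}$, $p_{\omega}=\frac{1}{6}$), the events $A_{1}=\{1,2,3,4\}$ and $A_{2}=\{3,4,5,6\}$ satisfy $\mathbf{P}(A_{1}\cup A_{2})=1$, while $\mathbf{P}(A_{1})+\mathbf{P}(A_{2})=\frac{2}{3}+\frac{2}{3}=\frac{4}{3}$.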

 

Definition 52 (Limsup and liminf of sets)
If $A_{k}$, $k\ge 1$, is a sequence of subsets of $\Omega$, we define $$ \limsup A_{k}=\bigcap_{N=1}^{\infty}\bigcup_{k=N}^{\infty}A_{k}, \qquad \mbox{ and } \qquad \liminf A_{k}=\bigcup_{N=1}^{\infty}\bigcap_{k=N}^{\infty}A_{k}. $$ In words, $\limsup A_{k}$ is the set of all $\omega$ that belong to infinitely many of the $A_{k}$s, and $\liminf A_{k}$ is the set of all $\omega$ that belong to all but finitely many of the $A_{k}$s.

Two special cases are those of increasing and decreasing sequences of events, meaning $A_{1}\subseteq A_{2}\subseteq A_{3}\subseteq \ldots$ and $A_{1}\supseteq A_{2}\supseteq A_{3}\supseteq \ldots$, respectively. In these cases the limsup and liminf coincide (so we refer to the common set as the limit of the sequence of sets). It is $\cup_{k}A_{k}$ in the case of increasing events and $\cap_{k}A_{k}$ in the case of decreasing events.
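For example, take $\Omega=\{1,2,3,\ldots\}$. The events $A_{k}=\{1,2,\ldots,k\}$ are increasing with limit $\cup_{k}A_{k}=\Omega$, while the events $B_{k}=\{k,k+1,k+2,\ldots\}$ are decreasing with limit $\cap_{k}B_{k}=\emptyset$. For a non-monotone example, let $C_{k}=\{1\}$ for odd $k$ and $C_{k}=\{2\}$ for even $k$: each of the points $1$ and $2$ belongs to infinitely many of the $C_{k}$s but not to all but finitely many of them, so $\limsup C_{k}=\{1,2\}$ and $\liminf C_{k}=\emptyset$.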

 

Exercise 53
The events below are all contained in a discrete probability space. Use countable additivity of probability to show the following (a concrete instance of the first part is worked out after the exercise).
  1. If $A_{k}$ are increasing events with limit $A$, then $\mathbf{P}(A)$ is the increasing limit of $\mathbf{P}(A_{k})$.
  2. If $A_{k}$ are decreasing events with limit $A$, then $\mathbf{P}(A)$ is the decreasing limit of $\mathbf{P}(A_{k})$.
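For instance, take $\Omega=\{1,2,3,\ldots\}$ with $p_{k}=2^{-k}$ (think of $k$ as the toss on which the first head appears in repeated tosses of a fair coin). The events $A_{k}=\{1,2,\ldots,k\}$ increase to $A=\Omega$, and indeed $$ \mathbf{P}(A_{k})=\sum_{j=1}^{k}2^{-j}=1-2^{-k} $$ increases to $1=\mathbf{P}(A)$ as $k\rightarrow \infty$.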

We now summarize the basic rules of probability as follows.

The basic rules of probability :

  1. $\mathbf{P}(\emptyset)=0$, $\mathbf{P}(\Omega)=1$ and $0\le \mathbf{P}(A)\le 1$ for any event $A$.
  2. $\mathbf{P}\left(\bigcup\limits_{k}A_{k}\right)\le \sum\limits_{k} \mathbf{P}(A_{k})$ for any countable collection of events $A_{k}$.
  3. $\mathbf{P}\left(\bigcup\limits_{k}A_{k}\right)=\sum\limits_{k}\mathbf{P}(A_{k})$ if $A_{k}$ is a countable collection of pairwise disjoint events.

Chapter 7. Inclusion-exclusion formula