Definition 63
Let $A,B$ be two events in the same probability space.
  1. If $\mathbf{P}(B)\not=0$, we define the conditional probability of $A$ given $B$ as $$\mathbf{P}\left(A\ \pmb{\big|}\ B\right):=\frac{\mathbf{P}(A\cap B)}{\mathbf{P}(B)}.$$
  2. We say that $A$ and $B$ are independent if $\mathbf{P}(A\cap B)=\mathbf{P}(A)\mathbf{P}(B)$. If $\mathbf{P}(B)\not=0$, then $A$ and $B$ are independent if and only if $\mathbf{P}(A\ \pmb{\big|} \ B)=\mathbf{P}(A)$ (and similarly with the roles of $A$ and $B$ reversed). If $\mathbf{P}(B)=0$, then $A$ and $B$ are necessarily independent, since $A\cap B\subseteq B$ forces $\mathbf{P}(A\cap B)=0$ as well.
What do these notions mean intuitively? In real life, we keep updating probabilities based on the information we receive. For example, when playing cards, the chance that a randomly chosen card is an ace is $1/13$, but after a card has been drawn, the probability for the next card need not be the same: if the first card was seen to be an ace, then the chance of the second being an ace falls to $3/51$. This updated probability is called a conditional probability. Independence of two events $A$ and $B$ means that knowing whether or not $B$ occurred does not change the chance of occurrence of $A$. In other words, the conditional probability of $A$ given $B$ is the same as the unconditional (original) probability of $A$.

 

Example 64
Let $\Omega=\{(i,j){\; : \;} 1\le i,j\le 6\}$ with $p_{(i,j)}=\frac{1}{36}$. This is the probability space corresponding to a throw of two fair dice. Let $A=\{(i,j){\; : \;} i\mbox{ is odd}\}$, $B=\{(i,j){\; : \;} j \mbox{ is }1\mbox{ or }6\}$ and $C=\{(i,j){\; : \;} i+j=4\}$. Then $A\cap B=\{(i,j){\; : \;} i=1,3,\mbox{ or }5, \mbox{ and }j=1\mbox{ or }6\}$, and it is easy to see that $$ \mathbf{P}(A\cap B)=\frac{6}{36}=\frac{1}{6}, \qquad \mathbf{P}(A)=\frac{18}{36}=\frac{1}{2}, \qquad \mathbf{P}(B)=\frac{12}{36}=\frac{1}{3}. $$ In this case, $\mathbf{P}(A\cap B)=\mathbf{P}(A)\mathbf{P}(B)$ and hence $A$ and $B$ are independent. On the other hand, $$ \mathbf{P}(A\cap C)=\mathbf{P}\{(1,3),(3,1)\}=\frac{1}{18}, \qquad \mathbf{P}(C)=\mathbf{P}\{(1,3),(2,2),(3,1)\}=\frac{1}{12}. $$ Thus, $\mathbf{P}(A\cap C)\not=\mathbf{P}(A)\mathbf{P}(C)$ and hence $A$ and $C$ are not independent.

This agrees with the intuitive understanding of independence, since $A$ is an event that depends only on the first die and $B$ is an event that depends only on the second. Therefore, $A$ and $B$ ought to be independent. However, $C$ depends on both dice, and hence cannot be expected to be independent of $A$. Indeed, it is easy to see that $\mathbf{P}(C \ \pmb{\big|} \ A)=\frac{1}{9}$, which differs from $\mathbf{P}(C)=\frac{1}{12}$.
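These computations can be confirmed by brute-force enumeration of the $36$ outcomes. Below is a minimal Python sketch (the set names mirror the events above; it is an illustration, not part of the argument):

```python
from itertools import product

# All 36 equally likely outcomes (i, j) of a throw of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability of an event under the uniform measure on omega.
    return len(event) / len(omega)

A = {(i, j) for (i, j) in omega if i % 2 == 1}   # i is odd
B = {(i, j) for (i, j) in omega if j in (1, 6)}  # j is 1 or 6
C = {(i, j) for (i, j) in omega if i + j == 4}   # i + j = 4

print(prob(A & B), prob(A) * prob(B))  # 0.1666..., 0.1666...: A, B independent
print(prob(A & C), prob(A) * prob(C))  # 0.0555..., 0.0416...: A, C not independent
print(prob(A & C) / prob(A))           # P(C | A) = 0.1111... = 1/9
```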

 

Example 65
Let $\Omega=S_{52}$ with $p_{\pi}=\frac{1}{52!}$. Define the events $$ A=\{\pi{\; : \;} \pi_{1}\in \{10,20,30,40\}\}, \qquad B=\{\pi{\; : \;} \pi_{2}\in \{10,20,30,40\}\}. $$ Then $\mathbf{P}(A)=\mathbf{P}(B)=\frac{4}{52}=\frac{1}{13}$. However, $\mathbf{P}(B\ \pmb{\big|} \ A)=\frac{3}{51}$. One can also see that $\mathbf{P}(B \ \pmb{\big|} \ A^{c})=\frac{4}{51}$. In particular, $A$ and $B$ are not independent.

In words, $A$ (respectively $B$) could be the event that the first (respectively second) card is an ace. Then $\mathbf{P}(B)=4/52$ to start with. When we see the first card, we update the probability. If the first card was not an ace, we update it to $\mathbf{P}(B\ \pmb{\big|} \ A^{c})$ and if the first card was an ace, we update it to $\mathbf{P}(B\ \pmb{\big|} \ A)$.
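The same numbers can be estimated by simulation. Here is a short Monte Carlo sketch (a sanity check under the convention of Example 65 that the labels $10,20,30,40$ play the role of the aces):

```python
import random

aces = {10, 20, 30, 40}  # positions of the aces, as in Example 65
n_A = n_AB = n_Ac = n_AcB = 0

for _ in range(10**6):
    # An ordered draw of two distinct cards from a deck labeled 1..52.
    first, second = random.sample(range(1, 53), 2)
    if first in aces:
        n_A += 1
        n_AB += second in aces
    else:
        n_Ac += 1
        n_AcB += second in aces

print(n_AB / n_A, 3 / 51)    # estimate of P(B | A)   vs exact 0.0588...
print(n_AcB / n_Ac, 4 / 51)  # estimate of P(B | A^c) vs exact 0.0784...
```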

Caution : Independence should not be confused with disjointness! If $A$ and $B$ are disjoint, then $\mathbf{P}(A\cap B)=0$, and hence $A$ and $B$ are independent if and only if one of $\mathbf{P}(A)$ or $\mathbf{P}(B)$ equals $0$. Intuitively, if $A$ and $B$ are disjoint, then knowing that $A$ occurred gives us a lot of information about $B$ (namely, that it did not occur!), so independence is not to be expected.

 

Exercise 66
If $A$ and $B$ are independent, show that the following pairs of events are also independent. (A numerical sanity check, not a proof, is sketched after the list.)
  1. $A$ and $B^{c}$.
  2. $A^{c}$ and $B$.
  3. $A^{c}$ and $B^{c}$.
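As promised, here is a numerical sanity check (not a proof) of these three claims, using the independent events $A$ and $B$ of Example 64:

```python
from itertools import product

omega = set(product(range(1, 7), repeat=2))
A = {w for w in omega if w[0] % 2 == 1}   # first die odd (Example 64)
B = {w for w in omega if w[1] in (1, 6)}  # second die 1 or 6 (Example 64)
Ac, Bc = omega - A, omega - B

def indep(E, F):
    # Exact integer test of P(E n F) = P(E) P(F) on the uniform space.
    return len(E & F) * len(omega) == len(E) * len(F)

print(indep(A, B), indep(A, Bc), indep(Ac, B), indep(Ac, Bc))  # all True
```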

Total probability rule and Bayes' rule : Let $A_{1},\ldots ,A_{n}$ be pairwise disjoint and exhaustive events in a probability space. This means that $A_{i}\cap A_{j}=\emptyset$ for any $i\not= j$ and $A_{1}\cup A_{2}\cup \ldots \cup A_{n}=\Omega$. Assume also that $\mathbf{P}(A_{i}) > 0$ for all $i$. We refer to such a collection of events as a partition of the sample space.

Let $B$ be any other event.

  1. (Total probability rule). $\mathbf{P}(B)=\mathbf{P}(A_{1})\mathbf{P}(B\ \pmb{\big|} \ A_{1})+\ldots +\mathbf{P}(A_{n})\mathbf{P}(B\ \pmb{\big|} \ A_{n})$.
  2. (Bayes' rule). Assume that $\mathbf{P}(B) > 0$. Then, for each $k=1,2,\ldots ,n$, we have $$\mathbf{P}(A_{k}\ \pmb{\big|} \ B)=\frac{\mathbf{P}(A_{k})\mathbf{P}(B\ \pmb{\big|} \ A_{k})}{\mathbf{P}(A_{1})\mathbf{P}(B\ \pmb{\big|} \ A_{1})+\ldots +\mathbf{P}(A_{n})\mathbf{P}(B\ \pmb{\big|} \ A_{n})}.$$
The proofs follow directly from the definitions.
  1. The right hand side is equal to $$ \mathbf{P}(A_{1})\frac{\mathbf{P}(B\cap A_{1})}{\mathbf{P}(A_{1})}+\ldots +\mathbf{P}(A_{n})\frac{\mathbf{P}(B\cap A_{n})}{\mathbf{P}(A_{n})}=\mathbf{P}(B\cap A_{1})+\ldots +\mathbf{P}(B\cap A_{n}) $$ which is equal to $\mathbf{P}(B)$ since $A_{i}$ are pairwise disjoint and exhaustive.
  2. Without loss of generality take $k=1$. Note that $\mathbf{P}(A_{1}\cap B)=\mathbf{P}(A_{1})\mathbf{P}(B\ \pmb{\big|} \ A_{1})$. Hence $$\begin{align*} \mathbf{P}(A_{1}\ \pmb{\big|} \ B) &= \frac{\mathbf{P}(A_{1}\cap B)}{\mathbf{P}(B)} \\ &= \frac{\mathbf{P}(A_{1})\mathbf{P}(B\ \pmb{\big|} \ A_{1})}{\mathbf{P}(A_{1})\mathbf{P}(B\ \pmb{\big|} \ A_{1})+\ldots +\mathbf{P}(A_{n})\mathbf{P}(B\ \pmb{\big|} \ A_{n})} \end{align*}$$ where we used the total probability rule to get the denominator. \qedhere
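Both rules are also easy to check numerically. Here is a minimal sketch with a made-up three-event partition (all numbers are hypothetical):

```python
# Hypothetical partition A_1, A_2, A_3 and conditional probabilities P(B | A_k).
pA = [0.5, 0.3, 0.2]
pB_given_A = [0.1, 0.4, 0.9]

# Total probability rule: P(B) = sum_k P(A_k) P(B | A_k).
pB = sum(p * q for p, q in zip(pA, pB_given_A))
print(pB)  # 0.5*0.1 + 0.3*0.4 + 0.2*0.9 = 0.35

# Bayes' rule: P(A_k | B) for each k; the posteriors sum to 1.
posterior = [p * q / pB for p, q in zip(pA, pB_given_A)]
print(posterior, sum(posterior))
```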

 

Exercise 67
Suppose $A_{i}$ are events such that $\mathbf{P}(A_{1}\cap \ldots \cap A_{n}) > 0$. Then show that $$\mathbf{P}(A_{1}\cap \ldots \cap A_{n})=\mathbf{P}(A_{1})\mathbf{P}(A_{2}\ \pmb{\big|} \ A_{1})\mathbf{P}(A_{3}\ \pmb{\big|} \ A_{1}\cap A_{2})\ldots \mathbf{P}(A_{n}\ \pmb{\big|} \ A_{1}\cap \ldots \cap A_{n-1}).$$
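For instance (an illustration of how the identity is used, not a solution to the exercise), if $A_{i}$ is the event that the $i$-th card dealt from a well-shuffled deck is an ace, then $$\mathbf{P}(A_{1}\cap A_{2}\cap A_{3})=\mathbf{P}(A_{1})\mathbf{P}(A_{2}\ \pmb{\big|} \ A_{1})\mathbf{P}(A_{3}\ \pmb{\big|} \ A_{1}\cap A_{2})=\frac{4}{52}\cdot \frac{3}{51}\cdot \frac{2}{50}=\frac{1}{5525}.$$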

 

Example 68
Consider a rare disease $X$ that affects one in a million people. A medical test is used to test for the presence of the disease. The test is 99\% accurate in the following sense: if a person does not have the disease, the chance that the test shows positive is 1\%, and if the person has the disease, the chance that the test shows negative is also 1\%.

Suppose a person is tested for the disease and the test result is positive. What is the chance that the person has the disease $X$?

Let $A$ be the event that the person has the disease $X$. Let $B$ be the event that the test shows positive. The given data may be summarized as follows.

  1. $\mathbf{P}(A)=10^{-6}$. Of course $\mathbf{P}(A^{c})=1-10^{-6}$.
  2. $\mathbf{P}(B \ \pmb{\big|} \ A)=0.99$ and $\mathbf{P}(B\ \pmb{\big|} \ A^{c})=0.01$.
What we want to find is $\mathbf{P}(A\ \pmb{\big|} \ B)$. By Bayes' rule (the relevant partition is $A_{1}=A$ and $A_{2}=A^{c}$), $$ \mathbf{P}(A\ \pmb{\big|} \ B) = \frac{\mathbf{P}(B\ \pmb{\big|} \ A)\mathbf{P}(A)}{\mathbf{P}(B\ \pmb{\big|} \ A)\mathbf{P}(A)+\mathbf{P}(B\ \pmb{\big|} \ A^{c})\mathbf{P}(A^{c})} = \frac{0.99 \times 10^{-6} }{0.99\times 10^{-6}+0.01\times (1-10^{-6})} \approx 0.000099. $$ The test is quite an accurate one, but a person who tests positive still has a really low chance of actually having the disease! Of course, one should observe that the chance of having the disease is now approximately $10^{-4}$, which is considerably higher than $10^{-6}$.

A calculation-free understanding of this surprising looking phenomenon can be achieved as follows: Let everyone in the population undergo the test. If there are $10^{9}$ people in the population, then there are only $10^{3}$ people with the disease. The number of true positives is approximately $10^{3}\times 0.99\approx 10^{3}$ while the number of false positives is $(10^{9}-10^{3})\times 0.01\approx 10^{7}$. In other words, among all positives, the false positives are way more numerous than true positives.
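A short Python sketch confirms that the exact Bayes computation and the counting argument give the same answer (the population size $10^{9}$ is the one used in the text):

```python
# Bayes' rule, exactly as in the display above.
p_disease = 1e-6
num = 0.99 * p_disease
den = num + 0.01 * (1 - p_disease)
print(num / den)  # ~ 9.899e-05, i.e. about 1e-4

# Counting argument with a population of 1e9 people.
sick, healthy = 1e3, 1e9 - 1e3
true_pos, false_pos = 0.99 * sick, 0.01 * healthy
print(true_pos / (true_pos + false_pos))  # same value, ~ 9.899e-05
```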

The surprise here comes from not taking into account the relative sizes of the sub-populations with and without the disease. Here is another manifestation of exactly the same fallacious reasoning.

Question : A person $X$ is introverted, very systematic in thinking and somewhat absent-minded. You are told that he is a doctor or a mathematician. What would be your guess: doctor or mathematician?

As we saw in class, most people answer ''mathematician''. Even accepting the stereotype that a mathematician is more likely to have all these qualities than a doctor, this answer ignores the fact that there are perhaps a hundred times more doctors in the world than mathematicians! In fact, the situation is identical to the one in the example above, and the mistake is in confusing $\mathbf{P}(A\ \pmb{\big|} \ B)$ and $\mathbf{P}(B\ \pmb{\big|} \ A)$.

Chapter 11. Independence of three or more events