Earlier in the course we discussed the problem of how to test whether a ''psychic'' can make predictions better than a random guesser. This is a prototype of what are called testing problems. We start with this simple example and introduce various general terms and notions in the context of this problem.
One hypothesis (we call it the null hypothesis and denote it by $H_{0}$) is that the psychic is guessing randomly. The alternate hypothesis (denoted $H_{1}$) is that his/her guesses are better than random guessing. In itself this does not imply the existence of psychic powers; it could be that he/she has managed to see some of the cards, etc. Can we decide between the two hypotheses based on $X$, the number of correct guesses?
What we need is a rule for deciding which hypothesis is true. A rule for deciding between the hypotheses is called a test. The following are examples of rules (the only condition is that a rule must depend only on the data at hand).
We fix the desired level of significance, usually $\alpha=0.05$ or $0.1$, and only consider tests whose probability of type-I error (accepting $H_{1}$ when $H_{0}$ is in fact true) is at most $\alpha$. It may seem surprising that we take $\alpha$ to be so small. Indeed, the two hypotheses are not treated equally. Usually $H_{0}$ is the default option, representing traditional belief, and $H_{1}$ is a claim that must prove itself. As such, the burden of proof is on $H_{1}$.
To use an analogy with law, when a person is on trial, there are two hypotheses, one that he is guilty and the other that he is not guilty. According to the maxim ''innocent till proved guilty'', one is not required to prove his/her innocence. On the other hand, guilt must be proved. Thus the null hypothesis is ''not guilty'' and the alternative hypothesis is ''guilty''.
In our example of card-guessing, assuming random guessing, we have calculated the distribution of $X$ earlier. Let $p_{k}=\mathbf{P}\{X=k\}$ for $k=0,1,\ldots ,52$. Now consider a test of the form ''Accept $H_{1}$ if $X\ge k_{0}$ and reject otherwise''. Its level of significance is $$ \mathbf{P}_{H_{0} }\{\mbox{accept }H_{1}\} = \mathbf{P}_{H_{0} }\{X\ge k_{0}\} = \sum_{i=k_{0} }^{52}p_{i}. $$ For $k_{0}=0$, the right side is $1$, while for $k_{0}=52$ it is $1/52!$, which is tiny. Since the right side does not increase as $k_{0}$ increases, there is a first value of $k_{0}$ at which it becomes less than or equal to $\alpha$. We take that $k_{0}$ to be the threshold for cut-off.
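The search for the threshold can be sketched in a few lines of Python. Since the distribution of $X$ is not restated here, the sketch assumes that $X$ counts the fixed points of a uniformly random permutation of the 52 cards, so that $p_{k}=\frac{1}{k!}\sum_{j=0}^{52-k}\frac{(-1)^{j} }{j!}$ (this agrees with $p_{52}=1/52!$ above). The names `match_pmf` and `threshold` are ours, chosen only for illustration.

```python
from fractions import Fraction
from math import factorial

N = 52  # number of cards

def match_pmf(n):
    """Exact P{X = k}, k = 0..n, ASSUMING X is the number of fixed
    points of a uniformly random permutation of n cards."""
    # partial[m] = sum_{j=0}^{m} (-1)^j / j!
    partial = []
    s = Fraction(0)
    for j in range(n + 1):
        s += Fraction((-1) ** j, factorial(j))
        partial.append(s)
    return [partial[n - k] / factorial(k) for k in range(n + 1)]

def threshold(pmf, alpha):
    """Smallest k0 such that sum_{i >= k0} pmf[i] <= alpha."""
    tail = Fraction(1)  # P{X >= 0}
    for k0, p in enumerate(pmf):
        if tail <= alpha:
            return k0
        tail -= p  # now tail = P{X >= k0 + 1}
    return len(pmf)  # only reached if alpha is below every tail sum

p = match_pmf(N)
print(threshold(p, Fraction(1, 20)))  # alpha = 0.05 -> 4
print(threshold(p, Fraction(1, 10)))  # alpha = 0.1  -> 3
```

Exact rational arithmetic is used so that the comparison with $\alpha$ is not affected by rounding; floating point would do equally well at these levels of significance.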
In the same example of card-guessing, let $\alpha=0.01$. Let us also assume that the Poisson approximation (with mean $1$) holds. This means that $p_{j}\approx e^{-1}/j! $ for each $j$. Then, we are looking for the smallest $k_{0}$ such that $\sum_{j=k_{0} }^{\infty}e^{-1}/j! \le 0.01$. For $k_{0}=4$, this sum is about $0.019$, while for $k_{0}=5$ it is about $0.004$. Hence, we take $k_{0}=5$. In other words, accept $H_{1}$ if $X\ge 5$ and reject it if $X < 5$. If we took $\alpha=0.0001$, we would get $k_{0}=7$, and so on.
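These numbers are easy to check numerically. The following is a minimal sketch under the Poisson approximation with mean $1$; the helper names are hypothetical, and the infinite sum is truncated after 40 terms, which is harmless here because the terms decay like $1/j!$.

```python
from math import exp, factorial

def poisson1_tail(k0, terms=40):
    """Approximate sum_{j >= k0} e^{-1}/j!, truncated after `terms` terms."""
    return sum(exp(-1.0) / factorial(j) for j in range(k0, k0 + terms))

def poisson_threshold(alpha):
    """Smallest k0 whose Poisson(1) tail sum is at most alpha.
    Intended for ordinary significance levels (alpha > 0, not astronomically small)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    k0 = 0
    while poisson1_tail(k0) > alpha:
        k0 += 1
    return k0

for k0 in (4, 5, 6, 7):
    print(k0, poisson1_tail(k0))

print(poisson_threshold(0.01))    # -> 5
print(poisson_threshold(0.0001))  # -> 7
```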
Strength of evidence: Rather than merely say that we accepted $H_{1}$ or rejected it, it would be better to say how strong the evidence is in favour of the alternative hypothesis. This is captured by the $p$-value, a central concept of decision making. It is defined as the probability that data drawn from the null hypothesis would show closer agreement with the alternative hypothesis than the data we have at hand (read it five times!).
Before we compute it in our example, let us return to the analogy with law. Suppose a man is tried for murder. Recall that $H_{0}$ is that he is not guilty and $H_{1}$ is that he is guilty. Suppose his fingerprints were found in the house of the murdered person. Does it prove his guilt? It is some evidence in favour of it, but not necessarily strong. For example, if the accused was a friend of the murdered person, then he might be innocent but have left his fingerprints on his visits to his friend. However, if the accused is a total stranger, then one wonders why, if he was innocent, his fingerprints were found there. The evidence for guilt is stronger. If bloodstains are found on his shirt, the evidence would be even stronger! In saying this, we are asking ourselves questions like ''if he was innocent, how likely is it that his shirt is blood-stained?''. That is the $p$-value. The smaller the $p$-value, the stronger the evidence for the alternate hypothesis.
Now we return to our example. Suppose the observed value is $X_{\mbox{obs} }=4$. Then the $p$-value is $\mathbf{P}_{H_{0} }\{X\ge 4\}=p_{4}+\ldots +p_{52}\approx 0.019$. If the observed value were $X_{\mbox{obs} }=6$, then the $p$-value would be $p_{6}+\ldots +p_{52}\approx 0.00059$. Note that the computation of the $p$-value does not depend on the level of significance. It just depends on the given hypotheses and the chosen test.
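The definition of the $p$-value can be made concrete in two ways: by summing the $p_{k}$ directly, and by literally drawing data from the null hypothesis and counting how often it agrees with $H_{1}$ at least as well as the observed data. The sketch below does both. It assumes, as an illustration only, that under $H_{0}$ the number of correct guesses is the number of fixed points of a uniformly random permutation of 52 cards; the function names, the number of trials and the seed are all our own choices.

```python
import random
from math import factorial

N = 52  # number of cards

def p_value_exact(x_obs, n=N):
    """P_{H0}{X >= x_obs}, ASSUMING X counts fixed points of a random permutation."""
    def pmf(k):
        # p_k = (1/k!) * sum_{j=0}^{n-k} (-1)^j / j!
        return sum((-1) ** j / factorial(j) for j in range(n - k + 1)) / factorial(k)
    return sum(pmf(k) for k in range(x_obs, n + 1))

def p_value_simulated(x_obs, trials=20000, seed=0, n=N):
    """Monte Carlo estimate: fraction of random shuffles with at least x_obs matches."""
    rng = random.Random(seed)
    deck = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(deck)  # one draw of the data under H0
        matches = sum(1 for pos, card in enumerate(deck) if pos == card)
        if matches >= x_obs:
            hits += 1
    return hits / trials

print(p_value_exact(4))      # about 0.019
print(p_value_exact(6))      # about 0.00059
print(p_value_simulated(4))  # a noisy estimate of the first number
```

Neither function takes $\alpha$ as an argument, in line with the remark that the $p$-value does not depend on the level of significance. The simulated estimate is only useful when the $p$-value is not too small compared with one over the number of trials.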