At various times we have made statements such as ''heights follow the normal distribution'' or ''lifetimes of bulbs follow the exponential distribution''. Where do such claims come from? From years of analysing data, of course. This leads to an interesting question: can we actually test whether the lifetimes of bulbs follow an exponential distribution?

We start with a simple example: testing whether a die is fair. The hypotheses are $H_{0}:$ the die is fair, versus $H_{1}:$ the die is unfair. You may feel that the null and alternative hypotheses are reversed: is not fairness a special property that should prove itself? Yes and no. Here we are imagining a situation where we have some reason to think that the die is fair; for example, perhaps the die looks symmetric.

We throw the die $n$ times and record the observations $X_{1},\ldots ,X_{n}$. For $1\le j\le 6$, let $O_{j}$ be the number of times the face $j$ turns up; in symbols, $O_{j}=\sum_{i=1}^{n}{\mathbf 1}_{X_{i}=j}$. Let $E_{j}=\mathbf{E}[O_{j}]=\frac{n}{6}$ be the expected number of times we see the face $j$ (under the null hypothesis). Common sense says that if $H_{0}$ is true, then $O_{j}$ and $E_{j}$ must be rather close for each $j$. How do we measure the closeness? Karl Pearson introduced the test statistic $$ T:=\sum_{j=1}^{6}\frac{(O_{j}-E_{j})^{2} }{E_{j} }. $$ If the desired level of significance is $\alpha$, then the Pearson $\chi^{2}$-test says ''Reject $H_{0}$ if $T\ge \chi^{2}_{5}(\alpha)$''. The number of degrees of freedom here is $5$. In general, it is one less than the number of bins (i.e., one less than the number of terms summed to get $T$).
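Here is a minimal numerical sketch of this test, assuming numpy and scipy are available; the face counts below are made-up illustrative numbers, not real data.

\begin{verbatim}
# Pearson's chi-squared test for a fair die.
# The observed counts are made-up illustrative numbers.
import numpy as np
from scipy import stats

observed = np.array([18, 22, 16, 25, 19, 20])      # O_j for faces 1..6
n = observed.sum()                                  # number of throws
expected = np.full(6, n / 6)                        # E_j = n/6 under H_0

T = np.sum((observed - expected) ** 2 / expected)   # Pearson's statistic
alpha = 0.05
critical = stats.chi2.ppf(1 - alpha, df=5)          # chi^2_5(alpha)

print(f"T = {T:.3f}, critical value = {critical:.3f}")
print("Reject H_0" if T >= critical else "Do not reject H_0")
\end{verbatim}

(The library function scipy.stats.chisquare(observed) computes the same statistic and its $p$-value.)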

Some practical points : The $\chi^{2}$ test is really an asymptotic statement. For large $n$, the probability of wrongly rejecting a true $H_{0}$ is approximately $\alpha$; there is no such assurance for small $n$. Further, in performing the test, it is recommended that each bin have at least $5$ observations (i.e., $O_{j}\ge 5$); otherwise we club together bins with fewer entries. The number $5$ is a rule of thumb; the more, the better.

Fitting the Poisson distribution : We consider the famous data collected by Rutherford, Chadwick and Ellis on the number of radioactive disintegrations. For details see Feller's book (section VI.7) or \href{http://galton.uchicago.edu/~lalley/Courses/312/PoissonProcesses.pdf}{this website}.

The data consist of $X_{1},\ldots ,X_{2608}$, where $X_{k}$ is the number of particles detected by the counter in the $k^{\mbox{th} }$ time interval. The hypotheses are $$ H_{0}: F \mbox{ is a Poisson distribution}, \qquad H_{1}: F \mbox{ is not Poisson}. $$ Physical theory predicts that the distribution ought to be Poisson, and hence we have taken it as the null hypothesis. (Usually, when a new theory is proposed, it should prove itself and is put in the alternative hypothesis, but here we take it as the null.)

We define $O_{j}$ as the number of time intervals in which we see exactly $j$ particles; thus $O_{j}=\sum_{i=1}^{2608}{\mathbf 1}_{X_{i}=j}$. How do we find the expected numbers? If the null hypothesis had said that $F$ is the Poisson(1) distribution, we could use that to find the expected numbers. But $H_{0}$ only says Poisson($\lambda$) for an unspecified $\lambda$. This brings in a new feature.

First estimate $\lambda$; for example, $\hat{\lambda}=\bar{X}_{n}$ is both the MLE and the method of moments estimate. Then we use this to calculate the Poisson probabilities and the expected numbers: $E_{j}=2608\, e^{-\hat{\lambda} }\frac{\hat{\lambda}^{j} }{j!}$, and for the last bin the expected number is $2608\,\mathbf{P}\{X\ge 10\}$ under Poisson($\hat{\lambda}$). For the given data we find that $\hat{\lambda}=3.87$. The table is as follows.

\[ \begin{array}{||r|c|c|c|c|c|c|c|c|c|c|c||} \hline j & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & \ge 10 \\ \hline O_{j} & 57 & 203 & 383 & 525 & 532 & 408 & 273 & 139 & 45 & 27 & 16 \\ \hline E_{j} & 54.4 & 210.5 & 407.4 & 525.4 & 508.4 & 393.5 & 253.8 & 140.3 & 67.9 & 29.2 & 17.1 \\ \hline \end{array} \]

Two remarks: The original data would have consisted of several more bins for $j=11,12,\ldots$. These have been clubbed together to perform the $\chi^{2}$ test (instead of a minimum of $5$ per bin, they may have ensured that there are at least $10$ per bin). Also, the estimate $\hat{\lambda}=3.87$ was obtained before clubbing these bins. Indeed, if the data were presented only as the above table, there would be some ambiguity in how to find $\hat{\lambda}$, since one of the bins says ''$\ge 10$''.

Then we compute $$ T=\sum_{j=0}^{10}\frac{(O_{j}-E_{j})^{2} }{E_{j} } \approx 12.9, $$ where the last term of the sum corresponds to the bin ''$\ge 10$''. Where should we look this up in the $\chi^{2}$ table? Earlier we said that the degrees of freedom is one less than the number of bins. Here is the more general rule. $$ \mbox{Degrees of freedom of the }\chi^{2} = \mbox{ No. of bins }-1-\mbox{No. of parameters estimated from data}. $$ In our case we estimated one parameter, $\lambda$, hence the d.f. of the $\chi^{2}$ is $11-1-1=9$. Looking at the $\chi_{9}^{2}$ table, one sees that the $p$-value, i.e., the probability that a $\chi_{9}^{2}$ random variable exceeds $12.9$, is about $0.17$. This means that we would not reject the null hypothesis even at the $10\%$ level, let alone at the $5\%$ level.
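A sketch of this computation, assuming numpy and scipy are available; the observed counts are those of the table above and $\hat{\lambda}=3.87$ is taken from the text.

\begin{verbatim}
# Chi-squared goodness of fit for the Rutherford-Chadwick-Ellis counts.
# Observed counts are copied from the table; lambda-hat = 3.87 from the text.
import numpy as np
from scipy import stats

observed = np.array([57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 16])
n = observed.sum()                       # 2608 time intervals
lam = 3.87                               # estimated from the raw (unclubbed) data

# Expected counts: bins j = 0,...,9 use the Poisson pmf; the last bin is P(X >= 10).
probs = stats.poisson.pmf(np.arange(10), lam)
probs = np.append(probs, 1 - probs.sum())
expected = n * probs

T = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1 - 1               # 11 bins, minus 1, minus 1 estimated parameter
print(f"T = {T:.2f}, df = {df}, p-value = {stats.chi2.sf(T, df):.3f}")
\end{verbatim}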

Fitting a continuous distribution : The chi-squared test can be used to test goodness of fit for continuous distributions too; we only need some modifications. We must make bins of appropriate size, say $[a,a+h),[a+h,a+2h),\ldots ,[a+(k-1)h,a+kh)$ for suitable $h$ and $k$ (possibly adding unbounded bins at the two ends to catch the tails). Then we find the expected number in each bin using the null hypothesis (first estimating parameters if necessary), compute $T$ in the same way as before, and check against the $\chi^{2}$ table with the appropriate degrees of freedom. A sketch is given below; we omit further details.
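The following sketch, assuming numpy and scipy, illustrates the steps when the null hypothesis is an exponential distribution with unspecified rate; the data and the bin edges are purely illustrative.

\begin{verbatim}
# Chi-squared goodness of fit for a continuous null (here: exponential).
# The data and the bin edges are purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)       # pretend these are observed lifetimes

rate_hat = 1 / x.mean()                        # MLE of the rate, estimated from the data
edges = np.array([0, 1, 2, 3, 4, 6, np.inf])   # 6 bins; the last one catches the tail

observed, _ = np.histogram(x, bins=edges)
cdf = stats.expon.cdf(edges, scale=1 / rate_hat)
expected = len(x) * np.diff(cdf)               # E_j = n * P(bin j) under the fitted null

T = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1 - 1                     # 6 bins, minus 1, minus 1 estimated parameter
print(f"T = {T:.2f}, df = {df}, p-value = {stats.chi2.sf(T, df):.3f}")
\end{verbatim}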

The probability theorem behind the $\chi^{2}$-test for goodness of fit : Let $(W_{1},\ldots ,W_{k})$ have the multinomial distribution with parameters $n,(p_{1},\ldots ,p_{k})$. (In other words, place $n$ balls at random into $k$ bins, where each ball goes into the $i^{\mbox{th} }$ bin with probability $p_{i}$ and distinct balls are placed independently of each other.) The following proposition is the mathematics behind Pearson's test.

Proposition [Pearson] : Fix $k,p_{1},\ldots,p_{k}$. Let $T_{n}=\sum_{i=1}^{k}\frac{(W_{i}-np_{i})^{2} }{np_{i} }$. Then $T_{n}$ converges to a $\chi_{k-1}^{2}$ distribution in the sense that $\mathbf{P}\{T_{n}\le x\}\rightarrow \int_{0}^{x}f_{k-1}(u)du$ where $f_{k-1}$ is the density of $\chi_{k-1}^{2}$ distribution.

How does this help? Suppose $X_{1},\ldots ,X_{n}$ are i.i.d. random variables taking $k$ values (it does not matter what the values are, say $t_{1},t_{2},\ldots ,t_{k}$) with probabilities $p_{1},\ldots ,p_{k}$. For each $j$, let $W_{j}$ be the number of indices $i$ for which $X_{i}=t_{j}$. Clearly, $(W_{1},\ldots ,W_{k})$ has the multinomial distribution above. Therefore, for large $n$, the random variable $T_{n}$ defined above (which is exactly Pearson's $\chi^{2}$-statistic) has approximately the $\chi_{k-1}^{2}$ distribution. This explains the test.
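One can also check the proposition by simulation. The sketch below, assuming numpy and scipy, draws many multinomial samples (with illustrative values of $n$ and $p_{i}$) and compares the distribution of $T_{n}$ with that of $\chi^{2}_{k-1}$.

\begin{verbatim}
# Simulation check of Pearson's proposition: for multinomial counts (W_1,...,W_k),
# T_n is approximately chi^2 with k-1 degrees of freedom when n is large.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 1000, np.array([0.1, 0.2, 0.3, 0.4])    # k = 4 bins, illustrative probabilities
reps = 20000

W = rng.multinomial(n, p, size=reps)           # each row is one multinomial sample
T = ((W - n * p) ** 2 / (n * p)).sum(axis=1)   # Pearson statistic for each replication

# Compare the simulated distribution function of T_n with that of chi^2_{k-1}.
for x in (1.0, 3.0, 6.25, 10.0):
    print(f"P(T_n <= {x}): simulated {np.mean(T <= x):.3f}, "
          f"chi^2_3 {stats.chi2.cdf(x, df=len(p) - 1):.3f}")
\end{verbatim}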

Sketch of proof of the proposition : Start with the case $k=2$. Then $W_{1}\sim \mbox{Bin}(n,p_{1})$ and $W_{2}=n-W_{1}$. Thus, $T_{n}=\frac{(W_{1}-np_{1})^{2} }{np_{1}p_{2} }$ (recall that $p_{1}+p_{2}=1$ and check this!). By the central limit theorem, $(W_{1}-np_{1})/\sqrt{np_{1}q_{1} }$ is approximately a $N(0,1)$ random variable (where $q_{i}=1-p_{i}$), and its square therefore has approximately the $\chi_{1}^{2}$ distribution. Thus the proposition is proved for $k=2$.
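To spell out the check: since $W_{2}-np_{2}=(n-W_{1})-n(1-p_{1})=-(W_{1}-np_{1})$, we have $$ T_{n}=\frac{(W_{1}-np_{1})^{2} }{np_{1} }+\frac{(W_{2}-np_{2})^{2} }{np_{2} }=(W_{1}-np_{1})^{2}\Big(\frac{1}{np_{1} }+\frac{1}{np_{2} }\Big)=\frac{(W_{1}-np_{1})^{2} }{np_{1}p_{2} }, $$ using $p_{1}+p_{2}=1$ in the last step.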

When $k > 2$, set $\xi_{i}:=(W_{i}-np_{i})/\sqrt{np_{i} }$, so that $T_{n}=\sum_{i=1}^{k}\xi_{i}^{2}$. Each $\xi_{i}$ is approximately normal with mean $0$ and variance $q_{i}$, but the $\xi_{i}$ are not independent; in fact the covariance of $\xi_{i}$ and $\xi_{j}$ is $-\sqrt{p_{i}p_{j} }$. On the other hand, one can (with some clever linear algebra/matrix manipulation) write $\sum_{i=1}^{k}\xi_{i}^{2}$ as $\sum_{i=1}^{k-1}\eta_{i}^{2}$ where the $\eta_{i}$ are (approximately) independent $N(0,1)$ random variables. Thus we get the $\chi_{k-1}^{2}$ distribution.

Chapter 39. Tests for independence