Probability problems

by Manjunath Krishnapur

Note: These are problems I gave as homework in the many times I taught the first course in UG probability and statistics at IISc. They are taken from various sources, and there are also some that I made up.

 

Problem 1
(Feller, I.8.1) Among the digits 1,2,3,4,5 first one is chosen, and then a second selection is made among the remaining four digits. Assume that all twenty possible results have the same probability. Find the probability that an odd digit will be selected (a) the first time, (b) the second time, (c) both times.

 

Problem 2
(*) (Feller, II.10.3) In how many ways can two rooks of different colours be put on a chessboard so that they can take each other?

 

Problem 3
(Feller, II.10.5) The numbers $1,2,\ldots ,n$ are arranged in random order. Find the probability that the digits (a) 1 and 2, (b) 1, 2, and 3, appear as neighbours in the order named.

 

Problem 4
(Feller, II.10.8) What is the probability that among $k$ digits (a) 0 does not appear; (b) 1 does not appear; (c) neither 0 nor 1 appears; (d) at least one of the two digits $0$ or $1$ does not appear? Let $A$ and $B$ represent the events in (a) and (b). Express the other events in terms of $A$ and $B$.

 

Problem 5
(Feller, II.10.11) A man is given $n$ keys of which only one fits his door. He tries them successively (sampling without replacement). The number of trials required may be $1,2,\ldots ,n$. Show that each of these outcomes has probability $1/n$.

 

Problem 6
(*) (Feller, II.10.20) From a population of $N$ elements a sample of size $k$ is taken. Find the probability that none of $m$ prescribed elements will be included in the sample, assuming the sample to be (a) without replacement, (b) with replacement. Compare the numerical values for the two methods when (i) $N=100$, $m=k=3$, and (ii) $N=100$, $m=k=10$.
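[ Aside: Once you have derived the two probabilities, a few lines of code produce the numerical comparison asked for. Below is a minimal Python sketch; it assumes the standard answers $\binom{N-m}{k}/\binom{N}{k}$ (without replacement) and $((N-m)/N)^{k}$ (with replacement), so treat it as a check on your own derivation rather than as the solution.]

```python
from math import comb

def p_without(N, m, k):
    # sample of size k avoids all m prescribed elements, without replacement
    return comb(N - m, k) / comb(N, k)

def p_with(N, m, k):
    # sample of size k avoids all m prescribed elements, with replacement
    return ((N - m) / N) ** k

for N, m, k in [(100, 3, 3), (100, 10, 10)]:
    print(N, m, k, p_without(N, m, k), p_with(N, m, k))
```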

 

Problem 7
(Feller, II.10.28) A group of $2N$ boys and $2N$ girls is divided into two equal groups. Find the probability $p$ that each group contains an equal number of boys and girls. Estimate $p$ using Stirling's approximation.

 

Problem 8
(*) (Feller, II.10.39) If $r_{1}$ indistinguishable red balls and $r_{2}$ indistinguishable blue balls are placed into $n$ cells, find the number of distinguishable arrangements.

 

Problem 9
(Feller, II.10.40) If $r_{1}$ dice and $r_{2}$ coins are thrown, how many results can be distinguished?

 

Problem 10
In how many ways can two bishops be put on a chessboard so that they can take each other?

 

Problem 11
A deck of $n$ cards labelled $1,2,\ldots ,n$ is shuffled well. Find the probability that the digits (a) 1 and 2, (b) 1, 2, and 3, appear as neighbours in the order named.

 

Problem 12
A deck of $n$ cards labelled $1,2,\ldots ,n$ is shuffled well. Find the probability that the digits (a) 1 and 2, (b) 1, 2, and 3, appear as neighbours in the order named.

 

Problem 13
(Feller, II.12.1) Prove the following identities for $n\ge 2$. [ Convention: Let $n$ be a positive integer. Then $\binom{n}{y}=0$ if $y$ is not an integer or if $y > n$]. $$\begin{align*} 1-\binom{n}{1}+\binom{n}{2} -\ldots &=0 \\ \binom{n}{1}+2\binom{n}{2}+3\binom{n}{3}+\ldots &=n2^{n-1} \\ \binom{n}{1}-2\binom{n}{2}+3\binom{n}{3}-\ldots &=0 \\ 2\cdot 1\binom{n}{2}+3\cdot 2\binom{n}{3}+4\cdot 3\binom{n}{4}+\ldots &=n(n-1)2^{n-2} \end{align*}$$

 

Problem 14
(Feller, I.12.10) Prove that $$ \binom{n}{0}^{2}+\binom{n}{1}^{2}+\ldots +\binom{n}{n}^{2}=\binom{2n}{n}. $$

 

Problem 15
(Feller, I.12.20) Using Stirling's formula, prove that $\frac{1}{2^{2n} }\binom{2n}{n}\sim \frac{1}{\sqrt{\pi n} }$. [ Convention: $a_{n}\sim b_{n}$ is shorthand for $\lim\limits_{n\rightarrow \infty}\frac{a_{n} }{b_{n} }=1$].

 

Problem 16
(*) Two positive integers $m\le N$ are fixed. A box contains $N$ coupons labelled $1,2,\ldots ,N$. A sample of $m$ coupons is drawn.
  1. Write the probability space in the following two ways of drawing the sample.
    1. (Sampling without replacement). A coupon is drawn uniformly at random, then a second coupon is drawn uniformly at random, and so on, till we have $m$ coupons.
    2. (Sampling with replacement). A coupon is drawn uniformly at random, its number is noted and the coupon is replaced in the box. Then a coupon is drawn at random from the box, the number is noted, and the coupon is returned to the box. This is done $m$ times.
  2. Let $N=k+\ell$ where $k,\ell$ are positive integers. We think of $\{1,2,\ldots ,k\}$ as a sub-population of the whole population $\{1,2,\ldots ,N\}$. For each of the above two schemes of sampling (with and without replacement), calculate the probability that the sample of size $m$ contains no elements from the sub-population $\{1,2,\ldots ,k\}$.

 

Problem 17
(*) Write the probability spaces for the following experiments. Coins and dice may not be fair!
  1. A coin is tossed till we get a head followed immediately by a tail. Find the probability of the event that the total number of tosses is at least $N$.
  2. A die is thrown till we see the number $6$ turn up five times (not necessarily in succession). Find the probability that the number $1$ is never seen.
  3. A coin is tossed till the first time when the number of heads (strictly) exceeds the number of tails. What is the probability that the number of tosses is at least $5$?
  4. (Extra exercise for fun! Do not submit this part) In the previous experiment, find the probability that the number of tosses is more than $N$.

 

Problem 18
(*) A particular segment of the DNA in a woman is $ATTAGCGG$ and the corresponding segment in her husband is $CTAAGGCG$. Write the probability space for the same DNA segment in the future child of this man-woman pair. Assume that all possible combinations are equally likely, and ignore the possibility of mutation.

 

Problem 19
In a party there are $N$ unrelated people. Their birthdays are noted (ignore leap years and assume that a year has $365$ days). Find the probability of the event that no two of them have the same birthday. Get the numerical value for $N=20$ and $N=30$.
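[ Aside: a short Python sketch for the numerical part; it just multiplies out $\prod_{j=0}^{N-1}\frac{365-j}{365}$.]

```python
def p_all_distinct(n, days=365):
    # probability that n uniformly random birthdays are all distinct
    p = 1.0
    for j in range(n):
        p *= (days - j) / days
    return p

for n in (20, 30):
    print(n, p_all_distinct(n))
```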

 

Problem 20
A deck of $52$ cards is shuffled well and $3$ cards are dealt. Find the probability of the event that all three cards are from distinct suits.

 

Problem 21
Place $r_{1}$ indistinguishable blue balls and $r_{2}$ indistinguishable red balls into $m$ bins uniformly at random. Find the probability of the event that the first bin contains balls of both colors.

 

Problem 22
A coin with probability $p$ of turning up $H$ (assume $0 < p < 1$) is tossed till we get a $TH$ or a $HT$ (i.e., two consecutive tosses must be different, e.g., $TTH$ or $HHHT$). Find the probability of the event that at least $5$ tosses are required.

 

Problem 23
A drunken man returns home and tries to open the lock of his house from a bunch of $n$ keys, trying them at random till the door opens. Consider two cases:
  1. He is so drunk that he may try the same key several times.
  2. He is moderately drunk and remembers which keys he has already tried.
In both cases, find the probability of the event that he needs $n$ or more attempts to open the door.

 

Problem 24
Let $\mathbf{x}=(0,1,1,1,0,1)$ and $\mathbf{y}=(1,1,0,1,0,1)$. A new $6$-tuple ${\bf z}$ is created at random by choosing each $z_{i}$ to be $x_{i}$ or $y_{i}$ with equal chance, for $1\le i\le 6$ (A toy model for how two DNA sequences can recombine to give a new one). Find the probability of the event that ${\bf z}$ is identical to $\mathbf{x}$.

 

Problem 25
From a group of $W$ women and $M$ men, a team of $L$ people is chosen at random (of course $L\le W+M$). Find the probability of the event that the team consists of exactly $k$ women.

 

Problem 26
Place $r$ distinguishable balls in $m$ labelled bins in such a way that each bin contains at most one ball. All distinguishable arrangements are deemed equally likely (this is known as Fermi-Dirac statistics). Find the probability that the first bin is empty.

 

Problem 27
A box contains $2N$ coupons labelled $1,2,\ldots ,2N$. Draw $k$ coupons (assume $k\le N$) from the box one after another
  1. with replacement,
  2. without replacement.
Find the probability of the event that no even numbered coupon is in the sample.

 

Problem 28
In a class with 108 people, one student gets a joke by e-mail. He/she forwards it to one randomly chosen classmate. The recipient does the same - chooses a classmate at random (could be the sender too) and forwards it to him/her. The process goes on like this for $20$ steps and stops. What is the probability that the first person to get the mail does not get it again? What is the chance that no one gets the e-mail more than once?
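[ Aside: if you want to check your answer by simulation, here is a Monte Carlo sketch in Python. It assumes the convention that each holder forwards to one of the other $107$ classmates chosen uniformly (the previous sender is allowed); the trial count is an arbitrary choice.]

```python
import random

def simulate(n=108, steps=20, trials=100_000):
    first_ok = 0  # the first person never receives the mail again
    none_ok = 0   # no one receives the mail more than once
    for _ in range(trials):
        holder, received = 0, {0: 1}
        for _ in range(steps):
            nxt = random.randrange(n - 1)
            if nxt >= holder:
                nxt += 1  # uniform over the n-1 classmates of the current holder
            holder = nxt
            received[holder] = received.get(holder, 0) + 1
        first_ok += (received[0] == 1)
        none_ok += all(v == 1 for v in received.values())
    return first_ok / trials, none_ok / trials

print(simulate())
```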

 

Problem 29
(Feller, III.6.3) Find the probability that in five tossings a coin falls head at least three times in succession.

 

Problem 30
(*) (Feller, III.6.1) Ten pairs of shoes are in a closet. Four shoes are selected at random. Find the probability that there will be at least one pair among the four shoes selected.

 

Problem 31
(*) Let $A_{1},A_{2},A_{3},\ldots$ be events in a probability space. Write the following events in terms of $A_{1},A_{2},\ldots$ using the usual set operations (union, intersection, complement).
  1. An infinite number of the events $A_{i}$ occur.
  2. All except finitely many of the events $A_{i}$ occur.
  3. Exactly $k$ of the events $A_{i}$ occur.

 

Problem 32
(Feller, I.8.1) Let $A_{1},\ldots ,A_{n}$ be events in a probability space $(\Omega,p)$ and let $0\le m\le n$. Let $B_{m}$ be the event that at least $m$ of the events $A_{1},\ldots ,A_{n}$ occur. Mathematically, $$ B_{m}=\bigcup_{ 1\le i_{1} < i_{2} < \ldots < i_{m}\le n} (A_{i_{1} }\cap A_{i_{2} }\cap \ldots \cap A_{i_{m} }). $$ Show that $$\begin{align*} \mathbf{P}(B_{m}) &= S_{m}-\binom{m}{1}S_{m+1}+\binom{m+1}{2}S_{m+2}-\binom{m+2}{3}S_{m+3}+\ldots \\ \end{align*}$$ where $ S_{k}=\sum\limits_{1\le i_{1} < i_{2} < \ldots < i_{k}\le n}\mathbf{P}(A_{i_{1} }\cap A_{i_{2} }\cap \ldots \cap A_{i_{k} }). $

 

Problem 33
(*) Recall the problem of matching two shuffled decks of cards, but with $n$ cards in each deck, so that $\Omega_{n}=S_{n}\times S_{n}$ and $p_{(\pi,{\sigma})}=\frac{1}{(n!)^{2} }$ for each $(\pi,{\sigma})\in \Omega_{n}$. Let $A_{m}$ be the event that there are exactly $m$ matches between the two decks. (Strictly speaking, we should write $A_{n,m}$, since $A_{n,m}\subseteq \Omega_{n}$, but for ease of notation we omit the subscript $n$. Similarly, it would be appropriate to write $p_{n}$ and $\mathbf{P}_{n}$ for the probabilities, but again, we simplify the notation when there is no risk of confusion.)
  1. For fixed $m\ge 0$, show that $\mathbf{P}(A_{m})\rightarrow e^{-1}\frac{1}{m!}$ as $n\rightarrow \infty$.
  2. Assume that the approximations above are valid for $n=52$ and $m\le 10$. Find the probability that there are at least $10$ matches.
[ Remark: You may use the result of the previous problem to solve this one].

 

Problem 34
Place $r_{n}$ distinguishable balls in $n$ distinguishable urns. Let $A_{n}$ be the event that at least one urn is empty. (As in the previous comment, it would be appropriate here to write $\mathbf{P}_{n}(A_{n})$ as the probability spaces are changing, but we keep the notation simple and write $\mathbf{P}(A_{n})$.)
  1. If $r_{n}=n^{2}$, show that $\mathbf{P}(A_{n})\rightarrow 0$ as $n\rightarrow \infty$.
  2. If $r_{n}=Cn$ for some fixed constant $C$, show that $\mathbf{P}(A_{n})\rightarrow 1$ as $n\rightarrow \infty$.
  3. Can you find an increasing function $f(\cdot)$ such that if $r_{n}=f(n)$, then $\mathbf{P}(A_{n})$ does not converge to $0$ or $1$? [ Hint: First try $r_{n}=n^{\alpha}$ for some $\alpha$, not necessarily an integer].

 

Problem 35
A box contains $N$ coupons labelled $1,2,\ldots ,N$. Draw $m_{N}$ coupons at random, with replacement, from the box. Let $A_{N}$ be the event that every coupon from the box has appeared at least once in the sample.
  1. If $m_{N}=N^{2}$, show that $\mathbf{P}(A_{N})\rightarrow 1$ as $N\rightarrow \infty$.
  2. If $m_{N}=CN$ for some fixed constant $C$, show that $\mathbf{P}(A_{N})\rightarrow 0$ as $N\rightarrow \infty$.
  3. Can you find an increasing function $f(\cdot)$ such that if $m_{N}=f(N)$, then $\mathbf{P}(A_{N})$ does not converge to $0$ or $1$? [ Hint: See if you can relate this problem to the previous one].

 

Problem 36
Let $A_{1},\ldots ,A_{n}$ be events in a common probability space. Let $B$ be the event that at least two of the $A_{i}$s occur. Prove that \[\begin{aligned} \mathbf{P}(B)= S_{2}-2S_{3}+3S_{4}-\ldots +(-1)^{n}(n-1)S_{n} \end{aligned}\] where $S_{k}=\sum_{1\le i_{1} < i_{2} < \ldots < i_{k}\le n}\mathbf{P}\{A_{i_{1} }\cap \ldots \cap A_{i_{k} }\}$ for $1\le k\le n$. Not for submission: More generally, you may show that the probability that at least $\ell$ of the $A_{i}$s occur is equal to $$\begin{align}\label{eq:inclexclforatleastell}\tag{1} S_{\ell}-\binom{\ell}{1}S_{\ell+1}+\binom{\ell+1}{2}S_{\ell+2}-\binom{\ell+2}{3}S_{\ell+3}+\ldots \end{align}$$

 

Problem 37
Continuing with the notation of the previous problem, assume the formula given in \eqref{eq:inclexclforatleastell} holds. If $C$ is the event that exactly $\ell$ of the events $A_{i}$ occur, then show that $$\begin{align}\label{ex:inclexclexactlyellmany}\tag{2} \mathbf{P}(C)=S_{\ell}-\binom{\ell+1}{\ell}S_{\ell+1}+\binom{\ell+2}{\ell}S_{\ell+2}-\ldots +(-1)^{n-\ell}\binom{n}{\ell}S_{n}. \end{align}$$ [ Hint: If you want to prove this directly, without using \eqref{eq:inclexclforatleastell}, that is also okay.]

 

Problem 38
If $r$ distinguishable balls are placed at random into $m$ labelled bins, write an expression for the probability that each bin contains at least two balls.

 

Problem 39
A deck of cards is dealt to four players ($13$ cards each). Find the probability that at least one of the players has two or more aces.

 

Problem 40
Let $p$ be the probability that in a gathering of 2500 people, there is some day of the year that is not the birthday of anyone in the gathering. Make reasonable assumptions and argue that $0.3\le p\le 0.4$.
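[ Aside: one way to sanity-check the claimed bounds numerically. If $A_{k}$ is the event that day $k$ is nobody's birthday, the Bonferroni inequalities give $S_{1}-S_{2}\le p\le S_{1}$; the sketch below evaluates both bounds under the uniform-birthday assumption. This is a numerical check, not the argument the problem asks you to write.]

```python
from math import comb

n, days = 2500, 365
S1 = days * ((days - 1) / days) ** n           # sum over k of P(A_k)
S2 = comb(days, 2) * ((days - 2) / days) ** n  # sum over j < k of P(A_j and A_k)
print(S1 - S2, "<= p <=", S1)                  # roughly 0.31 <= p <= 0.39
```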

 

Problem 41
Consider the problem of a psychic guessing the order of a deck of shuffled cards. Assume complete randomness of the guesses. Use the formula in \eqref{ex:inclexclexactlyellmany} to derive an expression for the probability that the number of correct guesses is exactly $\ell$, for $0\le \ell\le 52$. Use meaningful approximations to these probabilities and give numerical values (to 3 decimal places) of the probabilities for $\ell=0,1,2,\ldots ,6$.

 

Problem 42
Place $r_{m}$ distinguishable balls in $m$ distinguishable bins. Let $A_{m}$ be the event that at least one bin is empty. (Here it would be appropriate to write $\mathbf{P}_{m}(A_{m})$ as the probability spaces are changing, but we keep the notation simple and write $\mathbf{P}(A_{m})$.)
  1. If $r_{m}=m^{2}$, show that $\mathbf{P}(A_{m})\rightarrow 0$ as $m\rightarrow \infty$.
  2. If $r_{m}=Cm$ for some fixed constant $C$, show that $\mathbf{P}(A_{m})\rightarrow 1$ as $m\rightarrow \infty$.
  3. Can you find an increasing function $f(\cdot)$ such that if $r_{m}=f(m)$, then $\mathbf{P}(A_{m})$ does not converge to $0$ or $1$? [ Hint: First try $r_{m}=m^{\alpha}$ for some $\alpha$, not necessarily an integer].

 

Problem 43
(*) A random experiment is described and a random variable observed. In each case, write the probability space, the random variable and the pmf of the random variable.
  1. Two fair dice are thrown. The sum of the two top faces is noted.
  2. Deal thirteen cards from a shuffled deck and count (a) the number of red cards (i.e., diamonds or hearts), (b) the number of kings, (c) the number of diamonds.

 

Problem 44
(*) Place $r$ distinguishable balls in $m$ distinguishable bins at random. Count the number of balls in the first bin.
  1. Write the probability space and the random variable described above.
  2. Find the probability mass function of the number of balls in the first bin.
  3. Find the expected value of the number of balls in the first bin.

 

Problem 45
(*) Find $\mathbf{E}[X]$ and $\mathbf{E}[X^{2}]$ for the following random variables.
  1. $X\sim \mbox{Geo}(p)$.
  2. $X\sim \mbox{Hypergeo}(N_{1},N_{2},m)$.

 

Problem 46
Let $X$ be a non-negative integer-valued random variable with CDF $F(\cdot)$. Show that $\mathbf{E}[X]=\sum_{k=0}^{\infty}(1-F(k))$.

 

Problem 47
A coin has probability $p$ of falling head. Fix an integer $m\ge 1$. Toss the coin till the $m^{\mbox{{th} }}$ head occurs. Let $X$ be the number of tosses required.
  1. Show that $X$ has pmf $$ f(k) = \binom{k-1}{m-1}p^{m}(1-p)^{k-m}, \hspace{4mm} k=m,m+1,m+2,\ldots. $$
  2. Find $\mathbf{E}[X]$ and $\mathbf{E}[X^{2}]$.
[ Note: When $m=1$, you should get the Geometric distribution with parameter $p$. We say that $X$ has negative-binomial distribution. Some books define $Y:=X-m$ (the number of tails till you get $m$ heads) to be a negative binomial random variable. Then, $Y$ takes values $0,1,2,\ldots$.]

 

Problem 48
For a pmf $f(\cdot)$, the mode is defined as any point at which $f$ attains its maximal value (i.e., $t$ is a mode if $f(t)\ge f(s)$ for any $s$). For each of the following distributions, find the mode(s) of the distribution and the value of the pmf at the modes.
  1. $\mbox{Bin}(n,p)$.
  2. $\mbox{Pois}(\lambda)$.
  3. $\mbox{Geo}(p)$.

 

Problem 49
Use MATLAB for the following exercise.
  1. Plot the pmf of the Binomial, Poisson and Geometric distributions for various values of the parameters. Observe the plots to say where the maximum is attained, how the shape changes with the parameters, etc.
  2. Simulate random numbers (the number of samples can be $50$ or $100$, etc.) from the same distributions and plot their histograms. Visually compare the histograms with the plots of the pmf.
  3. Consider the ''real-life'' data given in Feller's book (chapter 6) and plot their histograms. Compare with the plot of the pmf for the appropriate distribution with appropriate choice of parameters.
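[ Aside: the exercise asks for MATLAB; for those working in Python instead, here is a minimal sketch of parts 1 and 2 using numpy/scipy/matplotlib. The parameter values are arbitrary choices.]

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n, p, lam = 20, 0.3, 4.0  # arbitrary parameter choices

# Part 1: plot the Binomial(n, p) pmf
k = np.arange(n + 1)
plt.stem(k, stats.binom.pmf(k, n, p))
plt.title("Binomial(20, 0.3) pmf")
plt.show()

# Part 2: histogram of 100 simulated Poisson values against the pmf
samples = stats.poisson.rvs(lam, size=100)
k = np.arange(samples.max() + 1)
plt.hist(samples, bins=np.arange(-0.5, samples.max() + 1.5), density=True, alpha=0.5)
plt.plot(k, stats.poisson.pmf(k, lam), "o")
plt.title("Poisson(4): histogram of samples vs pmf")
plt.show()
```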

 

Problem 50
Let $A,B$ be events with positive probability in a common probability space. We have seen in class that $\mathbf{P}(A\big| B)$ and $\mathbf{P}(B\big| A)$ are not to be confused.
  1. Show that $\mathbf{P}(A\big| B)=\mathbf{P}(B\big| A)$ if and only if $\mathbf{P}(A)=\mathbf{P}(B)$.
  2. Show that $\mathbf{P}(A\big| B) > \mathbf{P}(A)$ if and only if $\mathbf{P}(B\big| A) > \mathbf{P}(B)$. That is, if occurrence of $B$ makes $A$ more likely than it was before, then the occurrence of $A$ makes $B$ more likely than it was.

 

Problem 51
There are $10$ bins and the $k$th bin contains $k$ black and $11-k$ white balls. A bin is chosen uniformly at random. Then a ball is chosen uniformly at random from the chosen bin.
  1. Find the conditional probability that the chosen ball is black, given that the $k$th bin was chosen. Use this to compute the (unconditional) probability that the chosen ball is white.
  2. Given that the chosen ball is black, what is the probability that the $k$th bin was chosen?

 

Problem 52
A fair die is thrown $n$ times. For $1\le k\le n-1$, let $A_{k}$ be the event that the $k$th throw and the $(k+1)$st throw yield the same result. Are $A_{1},\ldots ,A_{n-1}$ independent? Are they pairwise independent?

 

Problem 53
Suppose $r$ distinguishable balls are placed in $m$ labelled bins at random. Each ball has probability $p_{k}$ of going into the $k$th bin, where $p_{1}+\ldots +p_{m}=1$. Let $X_{k}$ be the number of balls that go into the $k$th bin.
  1. Find the pmf of $X_{1}$.
  2. Find the pmf of the random variable $X_{1}+X_{2}$.

 

Problem 54
Two fair dice are thrown and let $X$ be the total of the two numbers that show up. Find the pmf of $X$. What is the most likely value of $X$?

 

Problem 55
Two dice (not necessarily identical, and not necessarily fair) are thrown and let $X$ be the total of the two numbers that turn up. Can you design the two dice so that $X$ is equally likely to be any of the numbers $2,3,\ldots ,12$?

 

Problem 56
A coin has probability $p$ of falling head. Assume $0 < p < 1$ and fix an integer $m\ge 1$. Toss the coin till the $m^{\mbox{{th} }}$ head occurs. Let $X$ be the number of tosses required. Show that $X$ has pmf $$ f(k) = \binom{k-1}{m-1}p^{m}(1-p)^{k-m}, \hspace{4mm} k=m,m+1,m+2,\ldots. $$ Find the CDF of $X$. [ Note: When $m=1$, this is the Geometric distribution with parameter $p$. We say that $X$ has negative-binomial distribution. Some books define $Y:=X-m$ (the number of tails till you get $m$ heads) to be a negative binomial random variable. Then, $Y$ takes values $0,1,2,\ldots$.]

 

Problem 57
A box contains $n$ coupons with one number on each coupon. We do not know the numbers but we know that they are distinct. Coupons are drawn one after another from the box, without replacement (i.e., after choosing a coupon at random, it is not put back into the box before drawing the next coupon). If the $k$th number drawn is larger than all the previous numbers, what is the chance that it is the largest of the $n$ numbers?

 

Problem 58
Let $X$ be a random variable with distribution (CDF) $F$ and density $f$.
  1. Find the distribution and density of the random variable $2X$.
  2. Find the distribution and density of the random variable $X+5$.
  3. Find the distribution and density of the random variable $-X$.
  4. Find the distribution and density of the random variable $1/X$.

 

Problem 59
Let $X$ be a random variable with $\mbox{Gamma}(\nu,\lambda)$ distribution. Let $F$ be the CDF of $X$. When $\nu$ is a positive integer, show that for $t\ge 0$, $$ F(t) = 1-e^{-\lambda t}\sum_{k=0}^{\nu-1}\frac{\lambda^{k}t^{k} }{k!}. $$ [ Note: Observe that this quantity is the same as $\mathbf{P}(N\ge \nu)$ where $N$ is a Poisson random variable with parameter $\lambda t$. There is a connection here but we cannot discuss it now].

 

Problem 60
For a pdf $f(\cdot)$, the mode is defined as any point at which $f$ attains its maximal value (i.e., $t$ is a mode if $f(t)\ge f(s)$ for any $s$). For each of the following distributions, find the mode(s) of the distribution and the value of the pdf at the modes.
  1. $N(\mu,{\sigma}^{2})$.
  2. $\mbox{Exp}(\lambda)$.
  3. $\mbox{Gamma}(\nu,1)$.

 

Problem 61
(*) Let $F$ be a CDF. For each $0 < q < 1$, the $q$-quantile(s) of $F$ is any number $t\in \mathbb{R}$ such that $F(s)\le q$ if $s < t$ and $F(s)\ge q$ if $s > t$.
  1. If $F$ is the CDF of $\mbox{Exp}(\lambda)$ distribution, find its $q$-quantile(s).
  2. If $F$ is the $N(0,1)$ distribution, use the normal tables to find the unique $q$-quantile for the following values of $q$: $0.01,0.1,0.25,0.5,0.75,0.9,0.99$.
  3. If $F$ is the $\mbox{Geo}(0.02)$ distribution, find a $q$-quantile for $q=0.01,0.25, 0.5,0.75,0.99$.

 

Problem 62
(*) Give explicit description of how you would simulate random variables from the following distributions.
  1. The standard Cauchy distribution with density $f(x)=\frac{1}{\pi(1+x^{2})}$ for $x\in \mathbb{R}$.
  2. The $\mbox{Beta}(1/2,1/2)$ distribution with density $\frac{1}{\pi\sqrt{x(1-x)} }$.
  3. (Do not need to submit this) Draw $100$ random numbers from either of these densities (on MATLAB or any other program that gives uniform random numbers) using the above procedure and draw the histograms. Compare the histograms to the plot of the densities.
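[ Aside: part 3 in Python, assuming you have derived the inverse CDFs in parts 1 and 2. The inverse-CDF expressions used below ($\tan(\pi(u-\frac12))$ for the Cauchy and $\sin^{2}(\pi u/2)$ for $\mbox{Beta}(1/2,1/2)$) are stated here without proof; verifying them is the content of the first two parts.]

```python
import numpy as np
import matplotlib.pyplot as plt

u = np.random.rand(100)
cauchy = np.tan(np.pi * (u - 0.5))      # inverse CDF of the standard Cauchy
beta_half = np.sin(np.pi * u / 2) ** 2  # inverse CDF of Beta(1/2, 1/2)

# histogram of the Beta(1/2,1/2) sample against its density
plt.hist(beta_half, bins=20, density=True)
x = np.linspace(0.01, 0.99, 200)
plt.plot(x, 1 / (np.pi * np.sqrt(x * (1 - x))))
plt.show()
```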

 

Problem 63
In each of the following situations, the distribution of the random variable $X$ is given. Find the distribution of $Y$ (it is enough to find the density of $Y$).
  1. $X\sim \mbox{Unif}[0,1]$ and $Y=\sin^{-1}(X)$.
  2. $X\sim \mbox{Unif}[0,1]$ and $Y=\cos^{-1}(X)$.
  3. $X\sim N(0,1)$ and $Y=X^{2}$.
[ Note: We define $\sin^{-1}$ to take values in $[-\pi/2,\pi/2]$ and $\cos^{-1}$ to take values in $[0,\pi]$. In the third part, observe that $f(x)=x^{2}$ is not a one-one function, so the formula given in the notes does not apply directly].

For the next problem: If $X$ is a random variable with density $f(x)$, then its expected value is defined as $\mathbf{E}[X]=\int_{-\infty}^{+\infty} xf(x) dx$ (this integral is defined only if $\int_{-\infty}^{+\infty} |x|f(x) dx$ is finite). We shall study this notion in greater detail in class, but for now take it as an exercise in integration. More generally, if $T:\mathbb{R}\rightarrow \mathbb{R}$, then $\mathbf{E}[T(X)]=\int_{-\infty}^{+\infty} T(x)f(x) dx$.
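For example, if $X\sim \mbox{Unif}[0,1]$, so that $f(x)=1$ on $[0,1]$ and $0$ elsewhere, then $$\mathbf{E}[X]=\int_{0}^{1}x\, dx=\frac{1}{2}, \qquad \mathbf{E}[X^{2}]=\int_{0}^{1}x^{2}\, dx=\frac{1}{3}.$$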

 

Problem 64
Find $\mathbf{E}[X]$ and $\mathbf{E}[X^{2}]$ for the following cases.
  1. $X\sim N(\mu,{\sigma}^{2})$.
  2. $X\sim \mbox{Gamma}(\nu,\lambda)$. Note the answers for the particular case of $\mbox{Exp}(\lambda)$.
  3. $X\sim \mbox{Beta}(p,q)$. Note the answers for the particular case of $\mbox{Unif}[0,1]$.

 

Problem 65
What is the mode of the
  1. $\mbox{Pois}(\lambda)$ distribution?
  2. $\mbox{Hypergeometric}(M,W,K)$ distribution?
(Mode means the point(s) where the pmf (or pdf) attains its maximal value).

 

Problem 66
  1. Let $f(k)=\frac{1}{k(k+1)}$ for integer $k\ge 1$. Show that $f$ is a pmf and find the corresponding CDF.
  2. Let $\alpha > 0$ and set $F(x)=1-\frac{1}{x^{\alpha} }$ for $x\ge 1$ and $F(x)=0$ for $x < 1$. Show that $F$ is a CDF and find the corresponding density function. (This is known as the Pareto distribution).

 

Problem 67
Give explicit description of how you would simulate random variables from the following distributions.
  1. The standard Cauchy distribution with density $f(x)=\frac{1}{\pi(1+x^{2})}$ for $x\in \mathbb{R}$.
  2. The $\mbox{Beta}(1/2,1/2)$ distribution with density $\frac{1}{\pi\sqrt{x(1-x)} }$.
  3. (Do not need to submit this) Draw $100$ random numbers from either of these densities (on MATLAB or any other program that gives uniform random numbers) using the above procedure and draw the histograms. Compare the histograms to the plot of the densities.

 

Problem 68
Let $X$ be a random variable with distribution function $F$. Let $a > 0$ and $b\in \mathbb{R}$ and define $Y=aX+b$.
  1. What is the CDF of $Y$?
  2. If $X$ has a density $f$, find the density of $Y$.

 

Problem 69
  1. Let $X\sim \mbox{Exp}(\lambda)$. Fix $s,t > 0$ and compute the conditional probability of the event $X > t+s$ given that $X > s$.
  2. Let $\nu$ be a positive integer. Show that the CDF of $\mbox{Gamma}(\nu,\lambda)$ distribution is given by \[\begin{aligned} F(x)=1-e^{-\lambda x}\sum_{k=0}^{\nu-1}\frac{\lambda^{k} }{k!}x^{k}. \end{aligned}\]

 

Problem 70
Let $U\sim \mbox{Uniform}[0,1]$. Find the density and distribution functions of
  1. $U^{p}$ (where $p > 0$),
  2. $U/(1-U)$,
  3. $\log(1/U)$,
  4. $\frac{2}{\pi}\arcsin(U)$.

 

Problem 71
Let $X\sim N(0,1)$. Find the density of
  1. $aX+b$ (where $a,b\in \mathbb{R}$),
  2. $X^{2}$ ,
  3. $X^{3}$,
  4. $e^{X}$.

 

Problem 72
In a game, there are three closed boxes, in exactly one of which there is a prize. The player is asked to pick one of the three boxes. The organizer (who knows where the prize is) opens one of the other two boxes and shows that it is empty. Now the player has two choices: she can stick to her first choice or switch to the other closed box. What should she do? This is known as the Monty Hall paradox. The word ''paradox'' is used to convey the strong feeling that many have that the probabilities of the two boxes are $1/2$ and $1/2$, since there is always one empty box out of the other two and showing it gives no information. To make the problem well-defined, one has to specify how the organizer chooses the empty box. If the player's first choice is empty, then exactly one of the other two boxes is empty and the organizer has no choice but to show that one. If the player's first choice is correct, assume that the organizer (secretly) tosses a fair coin to choose which of the other two boxes to show. With this specification, show that the probability that the prize is in the original choice is $1/3$ (in other words, if you switch, the chances are higher, namely $2/3$).

 

Problem 73
If $F$ is a CDF, show that it can have at most countably many discontinuity points.

 

Problem 74
(*) Let $A,B$ be two events in a common probability space. Write the joint distributions (joint pmf) of the following random variables.
  1. $X={\mathbf 1}_{A}$ and $Y={\mathbf 1}_{B}$.
  2. $X={\mathbf 1}_{A\cap B}$ and $Y={\mathbf 1}_{A\cup B}$.

 

Problem 75
Let $a > 0,b > 0$ and $ab > c^{2}$. Let $(X,Y)$ have the bivariate normal distribution with density $$f(x,y)=\frac{\sqrt{ab-c^{2} }}{2\pi}e^{-\frac{1}{2}\left[a(x-\mu)^{2}+b(y-\nu)^{2}+2c(x-\mu)(y-\nu)\right]}.$$ Show that the marginal distributions are one-dimensional normal and find the parameters. For what values of the parameters are $X$ and $Y$ independent?

 

Problem 76
(*) Fix $r > 0$. Let $(X,Y)$ be a random vector with density $$\begin{align*} f(x,y)=\begin{cases} \frac{1}{\pi r^{2} } & \mbox{ if }x^{2}+y^{2}\le r^{2}, \\ 0 & \mbox{ otherwise}. \end{cases} \end{align*}$$ This models the experiment of drawing a point at random from a disk of radius $r$ centered at $(0,0)$.
  1. Find the marginal densities of $X$ and $Y$ (i.e., find the density of $X$ and find the density of $Y$ separately).
  2. Can you solve the same problem if the point is drawn uniformly from the ellipse $\{(x,y){\; : \;} \frac{x^{2} }{a^{2} }+\frac{y^{2} }{b^{2} }\le 1\}$?

 

Problem 77
(*) Let ${\bf X}=(X_{1},\ldots ,X_{n-1})$ be a Multinomial random variable with parameters $r,n,p_{1},\ldots,p_{n}$ where $r,n$ are positive integers and $p_{i}$ are non-negative numbers that sum to $1$. This means that ${\bf X}$ has pmf $$ f(k_{1},\ldots ,k_{n-1})=\frac{r!}{k_{1}!k_{2}!\ldots k_{n-1}!(r-k_{1}-\ldots -k_{n-1})!}p_{1}^{k_{1} }\ldots p_{n-1}^{k_{n-1} }p_{n}^{r-k_{1}-\ldots -k_{n-1} } $$ if $k_{i}\ge 0$ are integers that add to at most $r$.
  1. Let $m\le n$. Show that the distribution of $(X_{1},\ldots ,X_{m-1})$ is Multinomial with parameters $r,m,\tilde{p}_{1},\ldots ,\tilde{p}_{m}$ where $\tilde{p}_{i}=p_{i}$ for $i\le m-1$ and $\tilde{p}_{m}=p_{m}+\ldots +p_{n}$.
  2. Show that the distribution of $X_{k}$ is $\mbox{Bin}(r,p_{k})$.
  3. (Do not need to submit this) Let $k_{1} < k_{2} < \ldots < k_{m}=n$. Define $Y_{1}=X_{1}+\ldots +X_{k_{1}-1}$, $Y_{2}=X_{k_{1} }+\ldots +X_{k_{2}-1}$, ... $Y_{m}=X_{k_{m-1} }+\ldots +X_{k_{m}-1}$. What is the distribution of $(Y_{1},\ldots ,Y_{m})$?
[ Note: Remember the balls-in-bins interpretation of the Multinomial. Based on it, try to guess the answers before you start calculating anything!]

 

Problem 78
Let $r$ balls be placed in $m$ bins at random. Let $X_{k}$ be the number of balls in the $k^{\mbox{th} }$ bin. Recall that $(X_{1},\ldots ,X_{m})$ has a multinomial distribution. Find the joint distribution of $(X_{1},X_{2})$ and the marginal distribution of $X_{1}$ and of $X_{2}$.

 

Problem 79
(*) [ Submit only parts (1) and (2)]
  1. Let $X$ and $Y$ be independent integer-valued random variables with pmf $f$ and $g$ respectively. That is, $\mathbf{P}\{X=k\}=f(k)$ and $\mathbf{P}\{Y=k\}=g(k)$ for every $k\in \mathbb{Z}$. Then, show that $X+Y$ has the pmf $h$ given by $h(k)=\sum_{n\in \mathbb{Z}}f(n)g(k-n)$ for each $k\in \mathbb{Z}$.
  2. Let $X\sim \mbox{Pois}(\lambda)$ and $Y\sim\mbox{Pois}(\mu)$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim \mbox{Pois}(\lambda+\mu)$.
  3. Let $X\sim \mbox{Bin}(n,p)$ and $Y\sim\mbox{Bin}(m,p)$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim \mbox{Bin}(n+m,p)$.
  4. Let $X\sim \mbox{Geo}(p)$ and $Y\sim \mbox{Geo}(p)$ and assume that $X$ and $Y$ are independent. Show that $X+Y$ has negative binomial distribution and find the parameters.

 

Problem 80
(*) [ Submit only parts (1) and (2)]
  1. Let $X$ and $Y$ be independent random variables with densities $f(x)$ and $g(y)$ respectively. Use the change of variable formula to show that $X+Y$ has the density $h(u)$ given by $h(u)=\int_{-\infty}^{\infty}f(s)g(u-s)ds$.
  2. Let $X,Y$ be independent $\mbox{Unif}[-1,1]$ random variables. Find the density of $X+Y$.
  3. Let $X\sim \mbox{Gamma}(\mu,\lambda)$ and $Y\sim\mbox{Gamma}(\nu,\lambda)$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim \mbox{Gamma}(\mu+\nu,\lambda)$.
  4. Let $X\sim N(\mu_{1},{\sigma}_{1}^{2})$ and $Y\sim N(\mu_{2},{\sigma}_{2}^{2})$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim N(\mu_{1}+\mu_{2},{\sigma}_{1}^{2}+{\sigma}_{2}^{2})$.

 

Problem 81
Find examples of discrete probability spaces and events $A,B,C$ so that the following happen.
  1. The events $A,B,C$ are pairwise independent but not mutually independent.
  2. $\mathbf{P}(A\cap B\cap C)=\mathbf{P}(A)\mathbf{P}(B)\mathbf{P}(C)$ but $A,B,C$ are not independent.

 

Problem 82
(*) In each of the following cases, $X$ and $Y$ are independent random variables with the given distributions. You are asked to find the distribution of $X+Y$ using the convolution formula (when you encounter a named distribution, do identify it!).
  1. $X\sim \mbox{Gamma}(\nu,\lambda)$ and $Y\sim \mbox{Gamma}(\nu',\lambda)$.
  2. $X\sim N(\mu_{1},{\sigma}_{1}^{2})$ and $Y\sim N(\mu_{2},{\sigma}_{2}^{2})$.
  3. $X\sim \mbox{Pois}(\lambda)$ and $Y\sim \mbox{Pois}(\lambda')$.
  4. $X\sim \mbox{Geo}(p)$ and $Y\sim \mbox{Geo}(p)$.
[ Note: Submit 2, 3, 4 only]

 

Problem 83
Use the change of variable formula to solve the following problems.
  1. Let $X\sim \mbox{Pois}(\lambda)$ and $Y\sim \mbox{Pois}(\lambda')$ be independent. Let $Z=X+Y$. Show that the conditional distribution of $X$ given $Z=m$ is $\mbox{Bin}(m,\frac{\lambda}{\lambda+\lambda'})$.
  2. Let $X\sim \mbox{Gamma}(\nu,\lambda)$ and $Y\sim \mbox{Gamma}(\nu',\lambda)$ be independent. Show that $X/(X+Y)$ has a Beta distribution and find its parameters.
  3. Let $X\sim N(0,1)$ and $Y\sim N(0,1)$ be independent. Show that $X/Y$ has Cauchy distribution.

 

Problem 84
(*) Let $(X,Y)$ be a bivariate normal with density $$ \frac{\sqrt{ab-c^{2} }}{2\pi}e^{-\frac{1}{2}(a(x-\mu)^{2}+b(y-\nu)^{2}+2c(x-\mu)(y-\nu))} $$ where $a,b, ab-c^{2}$ are all positive and $\mu,\nu$ are any real numbers.
  1. Show that $\mathbf{E}[X]=\mu$, $\mathbf{E}[Y]=\nu$, $\mbox{Var}(X)={\sigma}_{1,1}$, $\mbox{Var}(Y)={\sigma}_{2,2}$ and $\mbox{ Cov}(X,Y)={\sigma}_{1,2}$ where the matrix $\Sigma$ (called the covariance matrix of $(X,Y)$) is defined as $$ \Sigma=\left[\begin{array}{cc} {{\sigma}_{1,1} } & {{\sigma}_{1,2} } \\ {{\sigma}_{2,1} } & {{\sigma}_{2,2} } \end{array} \right] := \left[\begin{array}{cc} a & c \\ c & b \end{array} \right]^{-1}. $$
  2. Find the conditional density of $Y$ given $X$. When are $X$ and $Y$ independent? (''When'' means under what conditions on the parameters $a,b,c,\mu,\nu$ or in terms of ${\sigma}_{1,1},{\sigma}_{2,2},{\sigma}_{1,2},\mu,\nu$?).

 

Problem 85
In these problems, use the change of variable formula in one dimension to verify the following relations within the families of distributions we have defined.
  1. If $X\sim \mbox{Exp}(\lambda)$, show that $\lambda X\sim \mbox{Exp}(1)$. More generally, if $X\sim \mbox{Gamma}(\nu,\lambda)$, show that $\lambda X\sim \mbox{Gamma}(\nu,1)$.
  2. If $X\sim N(\mu,{\sigma}^{2})$, show that $\frac{X-\mu}{{\sigma}}\sim N(0,1)$.

 

Problem 86
Let $X_{1},X_{2},X_{3}$ be independent random variables, each having $\mbox{Ber}_{\pm}(1/2)$ distribution. This means $\mathbf{P}(X_{1}=1)=\mathbf{P}(X_{1}=-1)=\frac{1}{2}$.
  1. Let $Y_{1}=X_{2}X_{3}$, $Y_{2}=X_{1}X_{3}$ and $Y_{3}=X_{1}X_{2}$. Show that $Y_{1},Y_{2},Y_{3}$ are pairwise independent (i.e., any two of them are independent) but are not independent.
  2. Can you find three events $A,B,C$ in some probability space such that they are pair-wise independent but not independent?

 

Problem 87
  1. Let $A_{1},\ldots ,A_{n}$ be independent with $\mathbf{P}(A_{i})=p$ for all $i$. Find the probability that (a) none of the events $A_{1},\ldots ,A_{n}$ occur, (b) all of the events $A_{1},\ldots ,A_{n}$ occur.
  2. Let $X_{1},X_{2}$ be independent random variables, both having $\mbox{Exp}(\lambda)$ distribution. Let $Z=\min\{X_{1},X_{2}\}$. Show that $Z\sim \mbox{Exp}(2\lambda)$. What if we take the minimum of $n$ independent exponential random variables?

 

Problem 88
Let $A,B$ be two events in a common probability space. Write the joint distributions (joint pmf) of the following random variables.
  1. $X={\mathbf 1}_{A}$ and $Y={\mathbf 1}_{B}$.
  2. $X={\mathbf 1}_{A\cap B}$ and $Y={\mathbf 1}_{A\cup B}$.

 

Problem 89
  1. Let $X\sim \mbox{Exp}(\lambda)$. For any $t,s > 0$, show that $\mathbf{P}\{X > t+s\ \pmb{\big|} \ X > t\}=\mathbf{P}\{X > s\}$. (This is called the memoryless property of the exponential distribution).
  2. Show that if a non-negative random variable $Y$ has memoryless property (i.e., $\mathbf{P}\{Y > t+s\ \pmb{\big|} \ Y > t\}=\mathbf{P}\{Y > s\}$ for all $s,t > 0$), then $Y$ must have exponential distribution.

 

Problem 90
  1. Let $X$ and $Y$ be independent integer-valued random variables with pmf $f$ and $g$ respectively. That is, $\mathbf{P}\{X=k\}=f(k)$ and $\mathbf{P}\{Y=k\}=g(k)$ for every $k\in \mathbb{Z}$. Then, show that $X+Y$ has the pmf $h$ given by $h(k)=\sum_{n\in \mathbb{Z}}f(n)g(k-n)$ for each $k\in \mathbb{Z}$.
  2. Let $X\sim \mbox{Pois}(\lambda)$ and $Y\sim\mbox{Pois}(\mu)$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim \mbox{Pois}(\lambda+\mu)$.

 

Problem 91
Continuation of the previous problem.
  1. Let $X\sim \mbox{Bin}(n,p)$ and $Y\sim\mbox{Bin}(m,p)$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim \mbox{Bin}(n+m,p)$.
  2. Let $X\sim \mbox{Geo}(p)$ and $Y\sim \mbox{Geo}(p)$ and assume that $X$ and $Y$ are independent. Show that $X+Y$ has negative binomial distribution and find the parameters.

 

Problem 92
  1. Let $X$ and $Y$ be independent random variables with densities $f(x)$ and $g(y)$ respectively. Use the change of variable formula to show that $X+Y$ has the density $h(u)$ given by $h(u)=\int_{-\infty}^{\infty}f(s)g(u-s)ds$.
  2. Let $X,Y$ be independent $\mbox{Unif}[-1,1]$ random variables. Find the density of $X+Y$.

 

Problem 93
Continuation of the previous problem.
  1. Let $X\sim \mbox{Gamma}(\mu,\lambda)$ and $Y\sim\mbox{Gamma}(\nu,\lambda)$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim \mbox{Gamma}(\mu+\nu,\lambda)$.
  2. Let $X\sim N(\mu_{1},{\sigma}_{1}^{2})$ and $Y\sim N(\mu_{2},{\sigma}_{2}^{2})$ and assume that $X$ and $Y$ are independent. Show that $X+Y\sim N(\mu_{1}+\mu_{2},{\sigma}_{1}^{2}+{\sigma}_{2}^{2})$.

 

Problem 94
Let $(X,Y)$ have the bivariate normal distribution with density $$f(x,y)=\frac{\sqrt{ab-c^{2} }}{2\pi}e^{-\frac{1}{2}\left[ax^{2}+by^{2}+2cxy\right]}.$$ Assume that $a > 0$, $b > 0$, $ab-c^{2} > 0$ so that this is a valid density.
  1. Show that the marginal distributions are one-dimensional normal and find the parameters.
  2. For what values of the parameters are $X$ and $Y$ independent?

 

Problem 95
A few more exercises in change of variable formula.
  1. If $X,Y$ are independent $N(0,1)$ random variables, show that $X/Y$ has the Cauchy distribution (with density $\frac{1}{\pi(1+x^{2})}$).
  2. If $X\sim \mbox{Gamma}(\alpha,1)$, $Y\sim \mbox{Gamma}(\beta,1)$ are independent, then show that $X+Y$ and $X/(X+Y)$ are independent, $X+Y\sim \mbox{Gamma}(\alpha+\beta,1)$ and $X/(X+Y)\sim \mbox{Beta}(\alpha,\beta)$.
  3. If $X,Y$ are independent $N(0,1)$ random variables, show that $X^{2}+Y^{2}$ has $\mbox{Exp}(1/2)$ distribution.

 

Problem 96
Find the means and variances of $X$ in each of the following cases. $$ (a) X\sim \mbox{Bin}(n,p). \qquad (b) X\sim \mbox{Pois}(\lambda). \qquad (c) X\sim \mbox{Geo}(p). $$

 

Problem 97
Find the means and variances of $X$ in each of the following cases. $$ (a) X\sim \mbox{N}(\mu,{\sigma}^{2}). \qquad (b) X\sim \mbox{Gamma}(\nu,\lambda). \qquad (c) X\sim \mbox{Beta}(\nu_{1},\nu_{2}). \qquad (d) X\sim \mbox{Unif}[a,b]. $$

 

Problem 98
  1. Let $\xi\sim \mbox{Exp}(\lambda)$. For any $t,s\ge 0$, show that $\mathbf{P}\{\xi > t+s\ \pmb{\big|} \ \xi > t\}=\mathbf{P}\{\xi > s\}$. (This is called the memoryless property of the exponential distribution).
  2. Show that if a non-negative random variable $\xi$ has memoryless property (i.e., $\mathbf{P}\{\xi > t+s\ \pmb{\big|} \ \xi > t\}=\mathbf{P}\{\xi > s\}$), then $\xi$ must have exponential distribution.

 

Problem 99
Let $X$ be a non-negative random variable with CDF $F(t)$.
  1. Show that $\mathbf{E}[X]=\int\limits_{0}^{\infty}(1-F(t)) dt$ and more generally $\mathbf{E}[X^{p}]=\int\limits_{0}^{\infty}pt^{p-1}(1-F(t))dt$.
  2. If $X$ is non-negative integer valued, then $\mathbf{E}[X]=\sum_{k=1}^{\infty}\mathbf{P}\{X\ge k\}$.

 

Problem 100
(*) Find all possible joint distributions of $(X,Y)$ such that $X\sim\mbox{Ber}(1/2)$ and $Y\sim \mbox{Ber}(1/2)$. Find the correlation for each such joint distribution.

 

Problem 101
Let $(X,Y)$ have the bivariate normal distribution with density $$f(x,y)=\frac{\sqrt{ab-c^{2} }}{2\pi}e^{-\frac{1}{2}\left[a(x-\mu)^{2}+b(y-\nu)^{2}+2c(x-\mu)(y-\nu)\right]}.$$
  1. Find the marginal distributions of $X$ and of $Y$.
  2. Find means and variances of $X$ and $Y$ and the covariance and correlation of $X$ with $Y$. Under what conditions on the parameters are $X$ and $Y$ independent?
[ Note: It is very useful to introduce the matrix $\Sigma=\left[\begin{array}{cc} a & c \\ c & b \end{array} \right]^{-1}=\left[\begin{array}{cc} {\frac{b}{\Delta} } & {-\frac{c}{\Delta} } \\ {-\frac{c}{\Delta} } & {\frac{a}{\Delta} } \end{array} \right]$, where $\Delta=ab-c^{2}$, which is called the covariance matrix of $(X,Y)$. The answers can be written in terms of the entries of $\Sigma$.]

 

Problem 102
Let $r$ balls be placed in $m$ bins at random. Let $X_{k}$ be the number of balls in the $k^{\mbox{th} }$ bin. Recall that $(X_{1},\ldots ,X_{m})$ has a multinomial distribution.
  1. Find the joint distribution of $(X_{1},X_{2})$ and the marginal distribution of $X_{1}$ and of $X_{2}$.
  2. Find the means, variances, covariance and correlation of $X_{1}$ and $X_{2}$.
  3. Let $Y$ be the number of empty bins. Find the mean and variance of $Y$. [ Hint: Write $Y$ as ${\mathbf 1}_{A_{1} }+\ldots +{\mathbf 1}_{A_{m} }$ where $A_{k}$ is the event that the $k^{\mbox{th} }$ bin is empty].

 

Problem 103
(*) A box contains $N$ coupons where the number $w_{k}$ is written on the $k^{\mbox{th} }$ coupon. Let $\mu=\frac{1}{N}\sum_{k=1}^{N}w_{k}$ be the ''population mean'' and let ${\sigma}^{2}=\frac{1}{N}\sum_{k=1}^{N}(w_{k}-\mu)^{2}$ be the ''population variance''. A sample of size $m$ is drawn from the population, and the values seen are $X_{1},\ldots ,X_{m}$. The sample mean $\bar{X}_{m}=(X_{1}+\ldots +X_{m})/m$ is formed. Find the mean and variance of $\bar{X}_{m}$ in the following two cases.
  1. The samples are drawn with replacement (i.e., draw a coupon, note the number, put the coupon back in the box, and draw again ...).
  2. The samples are drawn without replacement.

 

Problem 104
Find the expectation and variance for a random variable with the following distributions.
  1. (a) $\mbox{Bin}(n,p)$, (b) $\mbox{Geo}(p)$, (c) $\mbox{Pois}(\lambda)$, (d) $\mbox{Hypergeo}(N_{1},N_{2},m)$.
  2. (a) $N(\mu,{\sigma}^{2})$, (b) $\mbox{Gamma}(\nu,\lambda)$, (c) $\mbox{Beta}(p,q)$.
[ Note: Although the computations are easy, the answers you get are worth remembering as they occur in various situations.]

 

Problem 105
(*) Place $r$ balls in $n$ bins uniformly at random. Let $X_{k}$ be the number of balls in the $k^{\mbox{th} }$ bin. Find $\mathbf{E}[X_{k}]$, $\mbox{Var}(X_{k})$ and $\mbox{ Cov}(X_{k},X_{\ell})$ for $1\le k,\ell\le n$. [ Hint: First do the case when $r=1$. Then think how to use that to get the general case].

 

Problem 106
Suppose $X,Y,Z$ are i.i.d. random variables with each having marginal density $f(t)$.
  1. Find $\mathbf{E}\left[\frac{X}{X+Y+Z}\right]$ (assume that it exists).
  2. Find $\mathbf{P}(X < Y > Z)$.

 

Problem 107
Consider the following integrals $$ \int\limits_{0}^{1}\frac{4}{1+x^{2} }dx = \pi, \qquad \int\limits_{0}^{1}\frac{1}{\sqrt{x(1-x)} }dx=\pi. $$ For each integral, use Monte Carlo integration with $100$, $1000$ and $10000$ samples from the uniform distribution to find approximations of $\pi$. Compare the approximations to the true value $3.1416\ldots$.
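[ Aside: a minimal Python sketch of the Monte Carlo estimate $\frac{1}{n}\sum_{i=1}^{n}g(U_{i})$ for $\int_{0}^{1}g$. You may notice that the second estimate fluctuates much more: $\int_{0}^{1}\frac{dx}{x(1-x)}$ diverges, so that integrand has infinite variance under the uniform distribution.]

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
for n in (100, 1000, 10000):
    u = rng.random(n)
    est1 = np.mean(4 / (1 + u ** 2))          # estimates the first integral
    est2 = np.mean(1 / np.sqrt(u * (1 - u)))  # estimates the second integral
    print(n, est1, est2)
```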

 

Problem 108
Recall the coupon collector problem. A box contains $n$ coupons labelled $1,2,\ldots ,n$. Coupons are drawn at random from the box, repeatedly and with replacement. Let $T_{n}$ be the number of draws needed till each of the coupons has appeared at least once.
  1. Show that $\mathbf{E}[T_{n}]\sim n\log n$ (this just means $\frac{1}{n\log n}\mathbf{E}[T_{n}]\rightarrow 1$).
  2. Show that $\mbox{Var}(T_{n})\le 2n^{2}$.
  3. Show that $\mathbf{P}\left( \left| \frac{T_{n} }{n\log n}-1\right| > \delta\right) \rightarrow 0$ for any $\delta > 0$.
[ Hint: Consider the number of draws needed to get the first new coupon, the further number of draws needed to get the next coupon and so on].
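[ Aside: a quick simulation makes part 3 plausible before you prove it; the ratio $T_{n}/(n\log n)$ should drift towards $1$. The sample sizes below are arbitrary choices.]

```python
import numpy as np

def collect(n, rng):
    # draw coupons with replacement until all n labels are seen; return the draw count
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(int(rng.integers(n)))
        draws += 1
    return draws

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    avg = np.mean([collect(n, rng) for _ in range(200)])
    print(n, avg / (n * np.log(n)))
```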

 

Problem 109
(**) Recall the coupon collector problem where coupons are drawn repeatedly (with replacement) from a box containing coupons labelled $1,2,\ldots ,N$. Let $T_{N}$ be the number of draws made till all the coupons are seen.
  1. Find $\mathbf{E}[T_{N}]$ and $\mbox{Var}(T_{N})$.
  2. Use Chebyshev's inequality to show that for any $\delta > 0$, as $N\rightarrow \infty$ we have $$\mathbf{P}\{(1-\delta)N\log N\le T_{N}\le (1+\delta)N\log N\}\rightarrow 1.$$

 

Problem 110
(*) Recall the problem of a psychic guessing cards. Consider a shuffled deck of $n$ cards and a psychic is supposed to guess the order of cards. Let $M_{n}$ be the number of correct guesses.
  1. Assuming random guessing by the psychic, show that $\mathbf{E}[M_{n}]=1$ and $\mbox{Var}(M_{n})=1$. [ Hint: Write $M_{n}$ as $X_{1}+\ldots +X_{n}$ where $X_{k}$ is the indicator of the event that the $k^{\mbox{th} }$ card is guessed correctly].
  2. Consider a variant of the game where the cards are dealt one by one and before each card is dealt, the psychic guesses what card it is going to be. In this case find $\mathbf{E}[M_{n}]$ and $\mbox{Var}(M_{n})$.

 

Problem 111
(**) Let $X$ be a random variable. Let $f(a)=\mathbf{E}[|X-a|]$ (makes sense if the first moment exists) and $g(a)=\mathbf{E}[(X-a)^{2}]$ (makes sense if the second moment exists).
  1. Show that $g$ is minimized uniquely at $a=\mathbf{E}[X]$.
  2. Show that the minimizers of $f$ are precisely the medians of $X$ (recall that a number $b$ is a median of $X$ if $\mathbf{P}\{X\ge b\}\ge \frac{1}{2}$ and $\mathbf{P}\{X\le b\}\ge \frac{1}{2}$).

 

Problem 112
(**) Let $X$ be a non-negative random variable. Read the discussion following the problem to understand the significance of this problem.
  1. Suppose $X_{n}$ takes the values $n^{2}$ and $0$ with probabilities $1/n$ and $1-(1/n)$, respectively. Compare $\mathbf{P}\{X_{n} > 0\}$ and $\mathbf{E}[X_{n}]$ for large $n$.
  2. Show the second moment inequality (aka Paley-Zygmund inequality): $\mathbf{P}\{X > 0\}\ge (\mathbf{E}[X])^{2}/\mathbf{E}[X^{2}]$.
[ Discussion: Markov's inequality tells us that the tail probability $\mathbf{P}\{X\ge t\}$ can be bounded from above using $\mathbf{E}[X]$. In particular, $\mathbf{P}\{X\ge r\mathbf{E}[X]\}\le \frac{1}{r}$. A natural question is whether there is a lower bound for the tail probability in terms of the expected value. In other words, if the mean is large, must the random variable be large with significant probability? The first part shows that the answer is `No' in general. The second part shows that the answer is `Yes', provided we have control on the second moment $\mathbf{E}[X^{2}]$ from above. Notice why the inequality does not give any useful bound in the first part of the problem (what happens to the second moment of $X_{n}$?)]

Hoeffding's inequality: Using Chebyshev's inequality we got a bound of ${\sigma}^{2}/nt^{2}$ for the probability that the sample mean deviates from the population mean by more than $t$. This is very general. If we make more assumptions about our random variables, we can give better bounds. The following exercise illustrates this.

 

Problem 113
(Optional!) Let $X_{1},\ldots ,X_{n}$ be i.i.d. $\mbox{Ber}_{\pm}(1/2)$. That is, $\mathbf{P}\{X_{k}=+1\}=\mathbf{P}\{X_{k}=-1\}=\frac{1}{2}$. Let $\bar{X}_{n}=\frac{1}{n}(X_{1}+\ldots +X_{n})$. Show that $$ \mathbf{P}\{|\bar{X}_{n}| > t\}\le 2e^{-nt^{2}/2} $$ by following these steps.
  1. Show that $\mathbf{P}\{\bar{X}_{n} > t\}\le e^{-\theta t}\left(\frac{e^{\theta/n }+e^{-\theta/n} }{2}\right)^{n}$ for any $\theta > 0$.
  2. Prove the inequality $e^{x}+e^{-x}\le 2 e^{x^{2}/2}$ for any $x > 0$.
  3. Use the first two parts to show that $\mathbf{P}\{\bar{X}_{n} > t\}\le e^{-nt^{2}/2}$ (you must make an appropriate choice of $\theta$ depending on $t$).
  4. Now consider $|\bar{X}_{n}|$ and break $\mathbf{P}\{|\bar{X}_{n}| > t\}$ into two summands to get the desired inequality.
[ Note: Here $\mu=0$ and ${\sigma}^{2}=1$, and hence Chebyshev's inequality only gives the bound $ \mathbf{P}\{|\bar{X}_{n}| > t\}\le \frac{1}{nt^{2} }$. Do you see that Hoeffding's inequality is better?]
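[ Aside: to see concretely how much stronger Hoeffding's bound is, tabulate both bounds for a few values of $n$ at, say, $t=0.1$ (the choices of $n$ and $t$ here are arbitrary).]

```python
import math

t = 0.1
for n in (100, 1000, 10000):
    chebyshev = 1 / (n * t ** 2)               # Chebyshev bound for P(|mean| > t)
    hoeffding = 2 * math.exp(-n * t ** 2 / 2)  # Hoeffding bound
    print(n, chebyshev, hoeffding)
```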

 

Problem 114
Find the expectation and variance for a random variable with the following distributions.
  1. $\mbox{Bin}(n,p)$,
  2. $\mbox{Geo}(p)$,
  3. $\mbox{Pois}(\lambda)$,
  4. $\mbox{Hypergeo}(N_{1},N_{2},m)$.
[ Note: Although the computations are easy, the answers you get are worth remembering as they occur in various situations.]

 

Problem 115
Find the expectation and variance for a random variable with the following distributions.
  1. $N(\mu,{\sigma}^{2})$,
  2. $\mbox{Gamma}(\nu,\lambda)$,
  3. $\mbox{Beta}(p,q)$.
[ Note: Although the computations are easy, the answers you get are worth remembering as they occur in various situations.]

 

Problem 116
Place $r$ balls in $n$ bins uniformly at random. Let $X_{k}$ be the number of balls in the $k^{\mbox{th} }$ bin. Find $\mathbf{E}[X_{k}]$, $\mbox{Var}(X_{k})$ and $\mbox{ Cov}(X_{k},X_{\ell})$ for $1\le k,\ell\le n$. [ Hint: First do the case when $r=1$. Then think how to use that to get the general case].

 

Problem 117
Let $X$ be a non-negative random variable with CDF $F(t)$.
  1. Show that $\mathbf{E}[X]=\int_{0}^{\infty}(1-F(t)) dt$ and more generally $\mathbf{E}[X^{p}]=\int_{0}^{\infty}pt^{p-1}(1-F(t))dt$. [ Hint: In showing this, you may assume that $X$ has a density if you like, but it is not necessary for the above formulas to hold true]
  2. If $X$ is non-negative integer valued, then $\mathbf{E}[X]=\sum_{k=1}^{\infty}\mathbf{P}\{X\ge k\}$.

 

Problem 118
A deck consists of cards labelled $1,2,\ldots ,N$. The deck is shuffled well. Let $X$ be the label on the first card and let $Y$ be the label on the second card. Find the means and variances of $X$ and $Y$ and the covariance of $X$ and $Y$.

 

Problem 119
A box contains $N$ coupons labelled $1,2,\ldots ,N$. A sample of size $m$ is drawn from the population and the sample average $\bar{X}_{m}$ is computed. Find the mean and standard deviation of $\bar{X}_{m}$ in both the following cases.
  1. The $m$ coupons are drawn with replacement.
  2. The $m$ coupons are drawn without replacement (in this case, assume $m\le N$).

 

Problem 120
Let $X\sim N(0,1)$. Although it is not possible to get an exact expression for the CDF of $X$, show that for any $t > 0$, \[\begin{aligned} \mathbf{P}\{X\ge t\}\le \frac{1}{\sqrt{2\pi} }\frac{e^{-t^{2}/2} }{t} \end{aligned}\] which shows that the tail of the CDF decays rapidly. [ Hint: Use the idea used in the proof of Markov's inequality]

The following problem is for the more mathematically minded students. You may safely skip this.

 

Problem 121
The coupon collector problem. A box contains $n$ coupons labelled $1,2,\ldots ,n$. Coupons are drawn at random from the box, repeatedly and with replacement. Let $T_{n}$ be the number of draws needed till each of the coupons has appeared at least once.
  1. Show that $\mathbf{E}[T_{n}]\sim n\log n$ (this just means $\frac{1}{n\log n}\mathbf{E}[T_{n}]\rightarrow 1$).
  2. Show that $\mbox{Var}(T_{n})\le 2n^{2}$.
  3. Show that $\mathbf{P}\left( \left| \frac{T_{n} }{n\log n}-1\right| > \delta\right) \rightarrow 0$ for any $\delta > 0$.
[ Hint: Consider the number of draws needed to get the first new coupon, the further number of draws needed to get the next coupon and so on].

 

Problem 122
Let $X_{1},X_{2},\ldots$ be i.i.d. with the $\mbox{Uniform}[1,2]$ distribution. Let $S=X_{1}+\ldots +X_{100}$. Give approximate quantiles at levels $0.01$, $0.25$, $0.5$, $0.75$, $0.99$ for $S$. Use the CLT and normal distribution tables.
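[ Aside: here $\mathbf{E}[S]=150$ and $\mbox{Var}(S)=100/12$, so the CLT approximation is $S\approx N(150,100/12)$. If you want to check your table lookups with software, a sketch in Python (scipy's norm.ppf plays the role of the table):]

```python
import math
from scipy import stats

mu, sd = 150.0, math.sqrt(100 / 12)  # E[S] and sd(S) for 100 i.i.d. Unif[1,2]
for q in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(q, mu + sd * stats.norm.ppf(q))
```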

 

Problem 123
Let $X_{1},\ldots ,X_{n}$ be i.i.d. samples from a parametric family of distributions. In each of the following cases, find the MLE for the unknown parameter(s) and find the bias.
  1. $X_{i}$ are i.i.d. $\mbox{Ber}(p)$ where $p$ is unknown.
  2. $X_{i}$ are i.i.d. $N(\mu,{\sigma}^{2})$ where $\mu,{\sigma}^{2}$ are unknown.

 

Problem 124
Let $X_{1},\ldots ,X_{n}$ be i.i.d. samples from a parametric family of distributions. In each of the following cases, find the MLE for the unknown parameter(s) and calculate the bias.
  1. $X_{i}$ are i.i.d. $\mbox{Geo}(p)$ where $p$ is unknown.
  2. $X_{i}$ are i.i.d. $\mbox{Unif}[a,b]$ where $a,b$ are unknown.

 

Problem 125
Let $(X_{1},Y_{1}),\ldots ,(X_{n},Y_{n})$ be i.i.d. samples from a bivariate distribution. Let $\tau=\mbox{Cov}(X_{1},Y_{1})$. Let $r_{n}=\frac{1}{n}\sum_{k=1}^{n}(X_{k}-\bar{X}_{n})(Y_{k}-\bar{Y}_{n})$ be the sample covariance.
  1. Show that $r_{n}$ is a biased estimate for $\tau$ and find the bias.
  2. Modify the estimate $r_{n}$ to get an unbiased estimate of $\tau$.
[ Remark: It is often convenient, here and elsewhere, to realise that $\tau=\mathbf{E}[X_{1}Y_{1}]-\mathbf{E}[X_{1}]\mathbf{E}[Y_{1}]$ and $r_{n}=(\frac{1}{n}\sum_{k=1}^{n}X_{k}Y_{k})-\bar{X}_{n}\bar{Y}_{n}$.]

 

Problem 126
Let $X_{1},X_{2},\ldots ,X_{n}$ be i.i.d. random variables from a distribution $F$. Let $M_{n}$ be a median of $X_{1},\ldots ,X_{n}$. Assume that the distribution $F$ has a unique median, that is, there is a unique number $m$ such that $F(m)=\frac12$. For any $\delta > 0$, show that $\mathbf{P}\{|M_{n}-m|\ge \delta\}\rightarrow 0$ as $n\rightarrow \infty$. [ Remark: The above statement justifies using the sample median to estimate the population median, in the sense that at least for large sample sizes, the two are close. Similar justification for using the sample mean to estimate the expected value came from the law of large numbers.]

The following problem is only for those mathematically minded.

 

Problem 127
Let $X_{1},X_{2},\ldots $ be i.i.d. $\mbox{Pois}(\lambda)$ random variables. Work out the exact distribution of $X_{1}+\ldots +X_{n}$ and use it to show the central limit theorem in this case. That is, show that for any $a < b$, \[\begin{aligned} \mathbf{P}\left\{a\le \frac{\sqrt{n}(\bar{X}_{n}-\lambda)}{\sqrt{\lambda} }\le b\right\}\longrightarrow \mathbf{P}\{a\le Z\le b\} \end{aligned}\] where $Z\sim N(0,1)$. [ Remark: This is analogous to the two cases of CLT that we showed in class, for exponential and for Bernoulli random variables].

The following problem shows that in certain situations, sums of random variables are approximately Poisson distributed. This gives a hint as to why Poisson distribution arises in many contexts. The question may be ignored safely from the exam point of view.

 

Problem 128
Let $X_{n,1},X_{n,2},\ldots X_{n,n}$ be i.i.d. $\mbox{Ber}(p_{n})$ random variables. Let $S_{n}=X_{n,1}+\ldots +X_{n,n}$. If $np_{n}\rightarrow \lambda$ (a finite positive number), show that $S_{n}$ has approximately $\mbox{Pois}(\lambda)$ distribution in the sense that for any $k\in \mathbb{N}$, \[\begin{aligned} \mathbf{P}\{S_{n}=k\} \rightarrow e^{-\lambda}\frac{\lambda^{k} }{k!}. \end{aligned}\] [ Remark: In contrast, if $np_{n}\rightarrow \infty$, deduce from CLT that $S_{n}$ has approximately a normal distribution, i.e., \[\begin{aligned} \mathbf{P}\left\{ a\le \frac{S_{n}-\mathbf{E}[S_{n}]}{\sqrt{\mbox{Var}(S_{n})} }\le b\right\} \rightarrow \mathbf{P}\{a\le Z\le b\} \end{aligned}\] for any $a < b$.]