Cumulative distribution functions will also be referred to simply as distribution functions or distributions. We start by describing two large classes of CDFs. There are CDFs that belong to neither of these classes, but for practical purposes they may be ignored (for now).

  1. (CDFs with pmf). Let $f$ be a pmf, i.e., let $t_{1},t_{2},\ldots$ be a countable subset of reals and let $f(t_{i})$ be non-negative numbers such that $\sum_{i}f(t_{i})=1$. Define $F:\mathbb{R}\rightarrow \mathbb{R}$ by $$ F(t) := \sum_{i: t_{i}\le t}f(t_{i}). $$ Then $F$ is a CDF. Indeed, we have seen that it is the CDF of a discrete random variable. A special feature of this CDF is that it increases only in jumps (in more precise language, if $F$ is continuous on an interval $[s,t]$, then $F(s)=F(t)$).
  2. (CDFs with pdf). Let $f:\mathbb{R}\rightarrow\mathbb{R}_{+}$ be a function (it is convenient to assume that it is piecewise continuous) such that $\int_{-\infty}^{+\infty}f(u)du=1$. Such a function is called a probability density function, or pdf for short. Define $F:\mathbb{R}\rightarrow \mathbb{R}$ by \[\begin{aligned} F(t) :=\int_{-\infty}^{t}f(u) du. \end{aligned}\] Again, $F$ is a CDF. Indeed, $F$ has the increasing property (if $t > s$, then $F(t)-F(s)=\int_{s}^{t}f(u)du$, which is non-negative because $f(u)$ is non-negative for all $u$), and its limits at $\pm \infty$ are as they should be (why?). As for right-continuity, $F$ is in fact continuous. Moreover, $F$ is differentiable except at the points where $f$ is discontinuous, and $F'(t)=f(t)$ at points where $f$ is continuous. Both constructions are illustrated numerically in the sketch below.
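A minimal numerical sketch of the two constructions above (the atoms, masses, and the triangular pdf are our own illustrative choices, not from the text):

```python
import numpy as np

# --- CDF from a pmf: F(t) = sum of f(t_i) over atoms t_i <= t ---
points = np.array([0.0, 1.0, 2.5])   # atoms t_1, t_2, t_3 (illustrative)
masses = np.array([0.2, 0.5, 0.3])   # f(t_i): non-negative, summing to 1

def cdf_from_pmf(t):
    """Jump CDF: total mass of the atoms at or below t."""
    return masses[points <= t].sum()

# --- CDF from a pdf: F(t) = integral of f(u) du over (-inf, t] ---
def pdf(u):
    """Illustrative triangular density on [0, 2]; integrates to 1."""
    return np.where((u >= 0) & (u <= 2), 1 - np.abs(u - 1), 0.0)

def cdf_from_pdf(t, lower=-5.0, n=100_000):
    """Approximate the integral by a midpoint Riemann sum on [lower, t]."""
    du = (t - lower) / n
    u = lower + du * (np.arange(n) + 0.5)
    return float(pdf(u).sum() * du)

print(cdf_from_pmf(1.0))  # 0.2 + 0.5 = 0.7 (the atom at t itself is included)
print(cdf_from_pdf(1.0))  # ~0.5, by symmetry of the triangle
```

Note how the first CDF moves only in jumps while the second is continuous.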

 

Remark 92
We understand the pmf: if $X$ has pmf $f$, then $f(t_{i})$ is just the probability that $X$ takes the value $t_{i}$. How should we interpret the pdf? If $X$ has pdf $f$, then, as we already remarked, the CDF is continuous and hence $\mathbf{P}\{X=t\}=0$ for every $t$. Therefore $f(t)$ cannot be interpreted as $\mathbf{P}\{X=t\}$ (in fact, a pdf can take values greater than $1$, so it cannot be a probability!).

To interpret $f(a)$, take a small positive number $\delta$ and look at $$ F(a+\delta)-F(a) = \int\limits_{a}^{a+\delta}f(u) du \approx \delta f(a). $$ In other words, $f(a)$ measures the chance of the random variable taking values near $a$: the higher the pdf, the greater the chance of taking values near that point.
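This approximation is easy to check numerically. A small sketch (our own), using the exponential distribution with $\lambda=2$, whose closed-form $F$ and $f$ appear in Example 94 below:

```python
import math

lam, a = 2.0, 1.0                          # rate and base point (our choices)
F = lambda t: 1 - math.exp(-lam * t)       # CDF of Exp(lam), from Example 94
f = lambda t: lam * math.exp(-lam * t)     # pdf of Exp(lam)

for delta in (0.1, 0.01, 0.001):
    approx = (F(a + delta) - F(a)) / delta
    print(delta, approx, f(a))  # approx -> f(a) = 2 e^{-2} ≈ 0.2707
```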

Among distributions with pmf, we have seen the Binomial, Poisson, Geometric and Hypergeometric families of distributions. Now we give many important examples of distributions (CDFs) with densities.

 

Example 93
Uniform distribution on the interval $[a,b]$, denoted Unif($[a,b]$), where $a < b$: this is the distribution with density and distribution function given by $$ \mbox{PDF: } f(t) = \begin{cases}\frac{1}{b-a} & \mbox{if } t\in(a,b) \\ 0 & \mbox{otherwise,} \end{cases}\qquad \mbox{CDF: } F(t) = \begin{cases}0 & \mbox{if } t\le a \\ \frac{t-a}{b-a} & \mbox{if }t\in (a,b) \\ 1 & \mbox{if }t\ge b.\end{cases} $$
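As a quick empirical sanity check (our own sketch, not part of the text), one can draw many uniform samples and compare the empirical CDF with $\frac{t-a}{b-a}$:

```python
import random

a, b, n = 2.0, 5.0, 100_000   # illustrative endpoints and sample size
samples = [random.uniform(a, b) for _ in range(n)]

for t in (2.5, 3.5, 4.9):
    empirical = sum(s <= t for s in samples) / n
    print(t, empirical, (t - a) / (b - a))  # the two columns should be close
```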

 

Example 94
Exponential distribution with parameter $\lambda$, denoted Exp($\lambda$), where $\lambda > 0$: this is the distribution with density and distribution function given by $$ \mbox{PDF: } f(t) = \begin{cases}\lambda e^{-\lambda t}& \mbox{if } t > 0 \\ 0 & \mbox{otherwise,} \end{cases}\qquad \mbox{CDF: } F(t) = \begin{cases}0 & \mbox{if } t\le 0 \\ 1-e^{-\lambda t} & \mbox{if }t > 0.\end{cases} $$

 

Example 95
Normal distribution with parameters $\mu,{\sigma}^{2}$, denoted N($\mu,{\sigma}^{2}$), where $\mu\in \mathbb{R}$ and ${\sigma}^{2} > 0$: this is the distribution with density and distribution function given by $$ \mbox{PDF: } \varphi_{\mu,{\sigma}^{2} }(t) = \frac{1}{{\sigma}\sqrt{2\pi} }e^{-\frac{1}{2{\sigma}^{2} }(t-\mu)^{2} }\qquad \mbox{CDF: } \Phi_{\mu,{\sigma}^{2} }(t) = \int\limits_{-\infty}^{t}\varphi_{\mu,{\sigma}^{2} }(u)du. $$ There is no closed form expression for the CDF. It is standard notation to write $\varphi$ and $\Phi$ for the normal density and CDF when $\mu=0$ and ${\sigma}^{2}=1$; N($0,1$) is called the standard normal distribution. By a change of variable one can check that $\Phi_{\mu,{\sigma}^{2} }(t)=\Phi\left(\frac{t-\mu}{{\sigma}}\right)$.
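The change-of-variable identity can be confirmed numerically. A sketch (our own, with illustrative $\mu,\sigma$) that integrates the N($\mu,\sigma^2$) density directly and compares against $\Phi$ computed via the standard identity $\Phi(x)=\frac{1}{2}\left(1+\operatorname{erf}(x/\sqrt{2})\right)$:

```python
import math

mu, sigma = 1.0, 2.0   # illustrative parameters

def phi_mu_sigma(t):
    """N(mu, sigma^2) density."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cdf_by_integration(t, lower=-40.0, n=200_000):
    """Midpoint Riemann sum for the integral defining Phi_{mu,sigma^2}(t)."""
    du = (t - lower) / n
    return sum(phi_mu_sigma(lower + du * (k + 0.5)) for k in range(n)) * du

for t in (-1.0, 1.0, 3.0):
    print(t, cdf_by_integration(t), Phi((t - mu) / sigma))  # should agree
```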

We said that the normal CDF has no simple expression, but is it even clear that it is a CDF?! In other words, is the proposed density a true pdf? Clearly $\varphi(t)=\frac{1}{\sqrt{2\pi} }e^{-t^{2}/2}$ is non-negative. We need to check that its integral is $1$.

 

Lemma 96
Fix $\mu\in \mathbb{R}$ and ${\sigma} > 0$ and let $\varphi(t)=\frac{1}{{\sigma}\sqrt{2\pi} }e^{-\frac{1}{2{\sigma}^{2} }(t-\mu)^{2} }$. Then, $\int\limits_{-\infty}^{\infty}\varphi(t) dt =1$.

It suffices to check the case $\mu=0$ and ${\sigma}^{2}=1$ (why?). Computing the integral directly is quite non-trivial. Let $I=\int_{-\infty}^{\infty} \varphi(t)dt$. We introduce the two-variable function $h(t,s):=\varphi(t)\varphi(s)=(2\pi)^{-1}e^{-(t^{2}+s^{2})/2}$. On the one hand, $$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}h(t,s)dtds = \left(\int_{-\infty}^{+\infty}\varphi(t)dt\right) \left(\int_{-\infty}^{+\infty}\varphi(s)ds\right)=I^{2}. $$ On the other hand, using polar coordinates $t=r\cos\theta$, $s=r\sin \theta$, we see that $$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}h(t,s)dtds =\int_{0}^{\infty}\int_{0}^{2\pi}(2\pi)^{-1}e^{-r^{2}/2}rd\theta dr = \int_{0}^{\infty}re^{-r^{2}/2}dr =1 $$ since $\frac{d}{dr}e^{-r^{2}/2}=-re^{-r^{2}/2}$. Thus $I^{2}=1$, and hence $I=1$ (the possibility $I=-1$ is ruled out since $I$ is the integral of a non-negative function).
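The key step above is the radial integral $\int_{0}^{\infty}re^{-r^{2}/2}dr=1$. A quick numerical confirmation (a sketch, truncating the integral at $r=40$, our choice):

```python
import math

hi, n = 40.0, 400_000   # truncation point and number of midpoint cells
dr = hi / n
total = 0.0
for k in range(n):
    r = dr * (k + 0.5)
    total += r * math.exp(-r * r / 2)
print(total * dr)  # ~1.0, confirming the radial integral
```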

 

Example 97
Gamma distribution with shape parameter $\nu$ and scale parameter $\lambda$, denoted Gamma($\nu,\lambda$), where $\nu > 0$ and $\lambda > 0$: this is the distribution with density and distribution function given by $$ \mbox{PDF: } f(t) = \begin{cases}\frac{1}{\Gamma(\nu)}\lambda^{\nu} t^{\nu-1}e^{-\lambda t}& \mbox{if } t > 0 \\ 0 & \mbox{otherwise,} \end{cases}\qquad \mbox{CDF: } F(t) = \begin{cases}0 & \mbox{if } t\le 0 \\ \int_{0}^{t}f(u)du & \mbox{if }t > 0.\end{cases} $$ Here $\Gamma(\nu):=\int_{0}^{\infty}t^{\nu-1}e^{-t}dt$. First let us check that $f$ is a density, that is, that it integrates to $1$. To see this, make the change of variable $\lambda t=u$ to get $$ \int_{0}^{\infty}\lambda^{\nu}e^{-\lambda t}t^{\nu-1}dt = \int_{0}^{\infty}e^{-u}u^{\nu-1}du = \Gamma(\nu). $$ Thus, $\int_{0}^{\infty} f(t)dt=1$.
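The change-of-variable computation is easy to test numerically. A sketch (with illustrative $\nu,\lambda$ and the integral truncated at $t=60$, our choices), using math.gamma as the reference value:

```python
import math

nu, lam = 2.5, 3.0       # illustrative shape and lambda
hi, n = 60.0, 300_000    # truncation point and midpoint cells
dt = hi / n
total = 0.0
for k in range(n):
    t = dt * (k + 0.5)
    total += lam ** nu * t ** (nu - 1) * math.exp(-lam * t)
print(total * dt, math.gamma(nu))  # both ~ Gamma(2.5) ≈ 1.3293
```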

When $\nu=1$, we get back the exponential distribution; thus, the Gamma family subsumes the exponential distributions. For positive integer values of $\nu$, one can actually write an expression for the CDF of Gamma($\nu,\lambda$) as (this is a homework problem) $$ F_{\nu,\lambda}(t)=1-e^{-\lambda t}\sum\limits_{k=0}^{\nu-1}\frac{(\lambda t)^{k} }{k!}. $$ Once the expression is given, it is easy to check it by induction (and integration by parts). A curious observation is that the right hand side is exactly $\mathbf{P}(N\ge \nu)$ where $N\sim \mbox{Pois}(\lambda t)$. This in fact indicates a deep connection between the Poisson and the Gamma distributions.

The Gamma function: The function $\Gamma:(0,\infty)\rightarrow \mathbb{R}$ defined by $\Gamma(\nu)=\int_{0}^{\infty}e^{-t}t^{\nu-1}dt$, also known as Euler's Gamma function, is a very important function that occurs all over mathematics and physics. There is no simpler expression for it, although one can find it explicitly for special values of $\nu$. One of its most important properties is that $\Gamma(\nu+1)=\nu\Gamma(\nu)$. To see this, integrate by parts: $$ \Gamma(\nu+1)=\int_{0}^{\infty}e^{-t}t^{\nu}dt = -e^{-t}t^{\nu}\left.\vphantom{\hbox{\Large (}}\right|_{0}^{\infty}+\nu\int_{0}^{\infty}e^{-t}t^{\nu-1}dt = \nu \Gamma(\nu). $$ Starting with $\Gamma(1)=1$ (direct computation) and using this relation repeatedly, one sees that $\Gamma(\nu)=(\nu-1)!$ for positive integer values of $\nu$. Thus, the Gamma function interpolates the factorial function (which is defined only for positive integers). Can we compute it for any other $\nu$? The answer is yes, but only for special values of $\nu$. For example, \[\begin{aligned} \Gamma(1/2)= \int_{0}^{\infty}x^{-1/2}e^{-x}dx = \sqrt{2}\int_{0}^{\infty}e^{-y^{2}/2}dy \end{aligned}\] by substituting $x=y^{2}/2$. The last integral was computed above in the context of the normal distribution and equals $\sqrt{\pi/2}$. Hence $\Gamma(1/2)=\sqrt{\pi}$. From this, using again the relation $\Gamma(\nu+1)=\nu\Gamma(\nu)$, we can compute $\Gamma(3/2)=\frac{1}{2}\sqrt{\pi}$, $\Gamma(5/2)=\frac{3}{4}\sqrt{\pi}$, etc.
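A numerical sketch (our own, with illustrative parameters) of the closed-form CDF above for an integer shape, checked against direct integration of the Gamma density:

```python
import math

nu, lam, t = 4, 1.5, 2.0   # integer shape; lam and t are illustrative

# closed form from the display above; it is exactly P(N >= nu), N ~ Pois(lam*t)
closed = 1 - math.exp(-lam * t) * sum(
    (lam * t) ** k / math.factorial(k) for k in range(nu)
)

# direct midpoint integration of the Gamma(nu, lam) density over [0, t]
n = 200_000
du = t / n
total = 0.0
for k in range(n):
    u = du * (k + 0.5)
    total += lam ** nu * u ** (nu - 1) * math.exp(-lam * u)
numeric = total * du / math.gamma(nu)

print(closed, numeric)  # the two values should agree (~0.3528)
```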

Yet another useful fact about the Gamma function is its asymptotics as $\nu\rightarrow\infty$, given by Stirling's approximation: $\frac{\Gamma(\nu+1)}{\nu^{\nu+\frac{1}{2} }e^{-\nu}\sqrt{2\pi} }\rightarrow 1$ as $\nu\rightarrow \infty$.
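One can watch this convergence numerically. A sketch working in log scale (via math.lgamma) so that large $\nu$ does not overflow:

```python
import math

for nu in (10, 100, 1_000, 10_000):
    # log of Gamma(nu+1) minus log of nu^{nu+1/2} e^{-nu} sqrt(2*pi)
    log_ratio = math.lgamma(nu + 1) - (
        (nu + 0.5) * math.log(nu) - nu + 0.5 * math.log(2 * math.pi)
    )
    print(nu, math.exp(log_ratio))  # the ratio tends to 1 as nu grows
```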

A small digression: It was Euler's idea to observe that $n!=\int_{0}^{\infty}x^{n}e^{-x}dx$ and that on the right side $n$ could be replaced by any real number greater than $-1$. But this was his second approach to defining the Gamma function. His first approach was as follows. Fix a positive integer $n$. Then for any positive integer $\ell\ge 1$, we may write \[\begin{aligned} n!=\frac{(n+\ell)!}{(n+1)(n+2)\ldots (n+\ell)} = \frac{\ell!(\ell+1)\ldots (\ell+n)}{(n+1)\ldots (n+\ell)} = \frac{\ell! \ell^{n} }{(n+1)\ldots (n+\ell)}\cdot\frac{(\ell+1)\ldots (\ell+n)}{\ell^{n} }. \end{aligned}\] The second factor approaches $1$ as $\ell\rightarrow \infty$. Hence, \[\begin{aligned} n!=\lim_{\ell\rightarrow \infty}\frac{\ell! \ell^{n} }{(n+1)\ldots (n+\ell)}. \end{aligned}\] Euler then showed (by a rather simple argument that we skip) that the limit on the right exists if we replace $n$ by any complex number other than $\{-1,-2,-3,\ldots \}$ (negative integers are a problem, as they make the denominator zero). Thus, he extended the factorial function to all complex numbers except the negative integers! It is a fun exercise to check that this agrees with the definition by the integral given earlier. In other words, for $\nu > -1$, we have \[\begin{aligned} \lim_{\ell\rightarrow \infty}\frac{\ell! \ell^{\nu} }{(\nu+1)\ldots (\nu+\ell)}=\int_{0}^{\infty}x^{\nu}e^{-x}dx. \end{aligned}\]
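Euler's limit formula converges rather slowly, but the convergence is visible numerically. A sketch for the non-integer value $\nu=1/2$ (our choice), done in log scale to avoid overflow:

```python
import math

nu = 0.5   # a non-integer test value
for ell in (10, 100, 10_000):
    log_val = (
        math.lgamma(ell + 1)          # log of ell!
        + nu * math.log(ell)          # log of ell^nu
        - sum(math.log(nu + k) for k in range(1, ell + 1))  # log of (nu+1)...(nu+ell)
    )
    print(ell, math.exp(log_val), math.gamma(nu + 1))  # -> Gamma(3/2) ≈ 0.8862
```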

 

Example 98
Beta distributions: Let $\alpha,\beta > 0$. The Beta distribution with parameters $\alpha,\beta$, denoted Beta($\alpha,\beta$), is the distribution with density and distribution function given by \[\begin{aligned} \mbox{PDF: } f(t) = \begin{cases}\frac{1}{B(\alpha,\beta)}t^{\alpha-1}(1-t)^{\beta-1}& \mbox{if } t\in(0,1) \\ 0 & \mbox{otherwise,} \end{cases}\qquad \mbox{CDF: } F(t) = \begin{cases}0 & \mbox{if } t\le 0 \\ \int_{0}^{t}f(u)du & \mbox{if }t\in(0,1) \\ 1 &\mbox{if }t\ge 1.\end{cases} \end{aligned}\] Here $B(\alpha,\beta):=\int_{0}^{1}t^{\alpha-1}(1-t)^{\beta-1}dt$. Again, for special values of $\alpha,\beta$ (e.g., positive integers), one can find the value of $B(\alpha,\beta)$, but in general there is no simple expression. However, it can be expressed in terms of the Gamma function!

 

Proposition 99
For any $\alpha,\beta > 0$, we have $B(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.
For $\beta=1$ we see that $B(\alpha,1)=\int_{0}^{1}t^{\alpha-1}dt=\frac{1}{\alpha}$, which is also equal to $\frac{\Gamma(\alpha)\Gamma(1)}{\Gamma(\alpha+1)}$, as required. Similarly (or by the symmetry relation $B(\alpha,\beta)=B(\beta,\alpha)$), we see that $B(1,\beta)$ also has the desired expression.

Now, for any real $\alpha > 1$ and $\beta > 0$, we can integrate by parts and get $$\begin{align*} B(\alpha,\beta)&=\int_{0}^{1}t^{\alpha-1}(1-t)^{\beta-1}dt \\ &= -\frac{1}{\beta}t^{\alpha-1}(1-t)^{\beta}\left.\vphantom{\hbox{\Large (}}\right|_{0}^{1} + \frac{\alpha-1}{\beta}\int_{0}^{1}t^{\alpha-2}(1-t)^{\beta}dt \\ &= \frac{\alpha-1}{\beta}B(\alpha-1,\beta+1). \end{align*}$$ Note that the boundary term vanishes: at $t=1$ because $\beta > 0$, and at $t=0$ because $\alpha > 1$. When $\alpha$ is a positive integer, we repeat this $\alpha-1$ times and get $$ B(\alpha,\beta)=\frac{(\alpha-1)(\alpha-2)\ldots 1}{\beta(\beta+1)\ldots (\beta+\alpha-2)}B(1,\beta+\alpha-1). $$ But we already checked that $B(1,\beta+\alpha-1)=\frac{\Gamma(1)\Gamma(\alpha+\beta-1)}{\Gamma(\alpha+\beta)}$, from which we get $$ B(\alpha,\beta) = \frac{(\alpha-1)(\alpha-2)\ldots 1}{\beta(\beta+1)\ldots (\beta+\alpha-2)}\frac{\Gamma(1)\Gamma(\alpha+\beta-1)}{\Gamma(\alpha+\beta)} =\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} $$ by the recursion property of the Gamma function. Thus we have proved the proposition when $\alpha$ is a positive integer. By symmetry the same is true when $\beta$ is a positive integer (and $\alpha$ takes any value). We do not prove the proposition for general $\alpha,\beta > 0$ here.
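Although the proof above covers only integer parameters, the identity is easy to test numerically for non-integer $\alpha,\beta$. A midpoint-quadrature sketch with illustrative values:

```python
import math

# Non-integer illustrative parameters; both > 1, so the integrand is bounded
# and simple midpoint quadrature behaves well.
alpha, beta = 2.5, 1.3
n = 200_000
dt = 1.0 / n
B = 0.0
for k in range(n):
    t = dt * (k + 0.5)
    B += t ** (alpha - 1) * (1 - t) ** (beta - 1)
B *= dt
print(B, math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta))
```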

 

Example 100
The standard Cauchy distribution: this is the distribution with density and distribution function given by $$ \mbox{PDF: } f(t) = \frac{1}{\pi(1+t^{2})}\qquad \mbox{CDF: } F(t) = \frac{1}{2}+\frac{1}{\pi}\tan^{-1}t. $$ One can also make a parametric family of Cauchy distributions with parameters $\lambda > 0$ and $a\in \mathbb{R}$, denoted Cauchy($a,\lambda$) and having density and CDF $$ f(t)=\frac{\lambda}{\pi(\lambda^{2}+(t-a)^{2})}\qquad F(t)=\frac{1}{2}+\frac{1}{\pi}\tan^{-1}\left(\frac{t-a}{\lambda}\right). $$
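Here differentiating the stated CDF recovers the density exactly, since $\frac{d}{dt}\tan^{-1}t=\frac{1}{1+t^{2}}$. A quick finite-difference sketch of this fact (test points are our own choices):

```python
import math

f = lambda t: 1 / (math.pi * (1 + t * t))   # standard Cauchy pdf
F = lambda t: 0.5 + math.atan(t) / math.pi  # standard Cauchy CDF

h = 1e-6
for t in (-2.0, 0.0, 1.5):
    derivative = (F(t + h) - F(t - h)) / (2 * h)  # central difference
    print(t, derivative, f(t))                    # should match
```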

 

Remark 101
Does every CDF come from a pdf? Not necessarily. For example, any CDF that is not continuous cannot come from a pdf (this rules out the CDFs of discrete distributions such as the Binomial, Poisson, Geometric, etc.). In fact, even continuous CDFs may not have densities (there is a good example manufactured out of the $1/3$-Cantor set, but that would take us out of the topic now). However, suppose $F$ is a continuous CDF that is differentiable except at finitely many points and whose derivative is a continuous function. Then $f(t):=F'(t)$ defines a pdf which, by the fundamental theorem of Calculus, satisfies $F(t)=\int_{-\infty}^{t}f(u)du$.

Chapter 17. Simulation