In this section we discuss entropy, a concept of fundamental importance in physics, mathematics and information theory. This section was not covered in class and may be safely omitted.

 

Definition 145
Let $X$ be a random variable that takes values in ${\mathcal A}=\{a_{1},\ldots ,a_{k}\}$ such that $\mathbf{P}(X=a_{i})=p_{i}$. The entropy of $X$ is defined as $$ H(X) := -\sum\limits_{i=1}^{k} p_{i}\log p_{i}, $$ with the convention $0\log 0=0$. If $X$ is a real-valued random variable with density $f$, its entropy is defined as $$ H(X):= -\int f(t)\log f(t)\, dt. $$
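To make the discrete definition concrete, here is a minimal Python sketch (not from the notes); the function name entropy and the sample pmfs are illustrative choices, and natural logarithms are used, which is assumed to be the notes' convention.

import math

def entropy(p):
    # H = -sum p_i log p_i (natural log), with the convention 0 log 0 = 0
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]), math.log(4))   # uniform on 4 symbols: H = log 4
print(entropy([0.7, 0.1, 0.1, 0.1]))                    # a skewed pmf has smaller entropy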

 

Example 146
Let $X\sim\mbox{Ber}(p)$. Then $H(X)=p\log(1/p) +(1-p)\log(1/(1-p))$.

 

Example 147
Let $X\sim \mbox{Geo}(p)$, i.e., $\mathbf{P}(X=k)=pq^{k}$ for $k\ge 0$, where $q=1-p$. Then $H(X)=-\sum\limits_{k=0}^{\infty}(\log p+k\log q)\, pq^{k} = -\log p -\frac{q}{p}\log q$, since $\sum\limits_{k=0}^{\infty} k\, pq^{k}=\mathbf{E}[X]=\frac{q}{p}$.
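As a sanity check on this closed form, the following sketch (an illustration, not part of the notes) truncates the series at 10,000 terms, which makes the tail negligible for the values of $p$ used, and compares it with $-\log p-\frac{q}{p}\log q$.

import math

def geometric_entropy_series(p, terms=10_000):
    # -sum_k (log p + k log q) p q^k, truncated at `terms`
    q = 1.0 - p
    return -sum(p * q**k * (math.log(p) + k * math.log(q)) for k in range(terms))

for p in (0.2, 0.5, 0.8):
    q = 1.0 - p
    print(p, geometric_entropy_series(p), -math.log(p) - (q / p) * math.log(q))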

 

Example 148
Let $X\sim \mbox{Exp}(\lambda)$ with density $f(t)=\lambda e^{-\lambda t}$. Then $H(X)=-\int_{0}^{\infty} (\log \lambda -\lambda t)\,\lambda e^{-\lambda t}\,dt=1-\log \lambda$.
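The value $1-\log\lambda$ can be verified numerically; the sketch below (illustrative, not from the notes) uses a midpoint rule on $[0,50]$, a cutoff beyond which the integrand is negligible for the rates used.

import math

def exponential_entropy_numeric(lam, cutoff=50.0, n=200_000):
    # midpoint-rule approximation of -int_0^cutoff f log f dt for f(t) = lam e^{-lam t}
    dt = cutoff / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        f = lam * math.exp(-lam * t)
        total -= f * math.log(f) * dt
    return total

for lam in (0.5, 1.0, 2.0):
    print(lam, exponential_entropy_numeric(lam), 1.0 - math.log(lam))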

 

Example 149
Let $X\sim N(\mu,{\sigma}^{2})$ with density $f(t)=\frac{1}{\sigma\sqrt{2\pi}}e^{-(t-\mu)^{2}/2\sigma^{2}}$. Then $H(X)=\int \left(\log(\sigma\sqrt{2\pi})+\frac{(t-\mu)^{2}}{2\sigma^{2}}\right) f(t)\, dt=\log(\sigma\sqrt{2\pi})+\frac{1}{2}=\frac{1}{2}\log(2\pi e\sigma^{2})$.
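A numerical check of $\frac{1}{2}\log(2\pi e\sigma^{2})$, again by a midpoint rule; the interval of $\pm 12\sigma$ around the mean and the grid size are illustrative choices.

import math

def normal_entropy_numeric(mu, sigma, half_width=12.0, n=200_000):
    # midpoint-rule approximation of -int f log f over [mu - 12 sigma, mu + 12 sigma]
    a = mu - half_width * sigma
    dt = 2 * half_width * sigma / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        f = math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total -= f * math.log(f) * dt
    return total

for sigma in (0.5, 1.0, 3.0):
    print(sigma, normal_entropy_numeric(0.0, sigma), 0.5 * math.log(2 * math.pi * math.e * sigma ** 2))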

Entropy is a measure of randomness. For example, among the $\mbox{Ber}(p)$ distributions, the entropy is maximized at $p=1/2$ and minimized at $p=0\mbox{ or }1$. It quantifies the intuitive feeling that $\mbox{Ber}(1/2)$ is more random than $\mbox{Ber}(1/4)$.
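A small scan over $p$ (illustrative step size $0.01$) supports this: the maximum is attained at $p=1/2$, where the entropy equals $\log 2$.

import math

def bernoulli_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log(1 / p) + (1 - p) * math.log(1 / (1 - p))

ps = [i / 100 for i in range(101)]
best = max(ps, key=bernoulli_entropy)
print(best, bernoulli_entropy(best), math.log(2))      # maximum at p = 1/2, value log 2
print(bernoulli_entropy(0.25), bernoulli_entropy(0.5)) # Ber(1/4) is less random than Ber(1/2)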

 

Lemma 150
  1. If $|{\mathcal A}|=k$, then $0\le H(X)\le \log k$. $H(X)=0$ if and only if $X$ is degenerate and $H(X)=\log k$ if and only if $X\sim \mbox{Unif}({\mathcal A})$.
  2. Let $f:{\mathcal A}\rightarrow {\mathcal B}$ and let $Y=f(X)$. Then $H(Y)\le H(X)$.
  3. Let $X$ take values in ${\mathcal A}$ and $Y$ take values in ${\mathcal B}$ and let $Z=(X,Y)$. Then $H(Z)\le H(X)+H(Y)$ with equality if and only if $X$ and $Y$ are independent (see the numerical sketch below).
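Here is a small numerical sketch of part 3; the joint table below is an arbitrary illustrative choice, not an example from the notes.

import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

joint = [[0.20, 0.10, 0.05],
         [0.05, 0.25, 0.35]]               # P(X = i, Y = j), an illustrative joint pmf
px = [sum(row) for row in joint]            # marginal of X
py = [sum(col) for col in zip(*joint)]      # marginal of Y

print(entropy([p for row in joint for p in row]), entropy(px) + entropy(py))  # H(X,Y) <= H(X)+H(Y)

indep = [a * b for a in px for b in py]     # independent coupling with the same marginals
print(entropy(indep), entropy(px) + entropy(py))                              # equality in this case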

Gibbs measures: Let ${\mathcal A}$ be a countable set and let ${\mathcal H}:{\mathcal A}\rightarrow \mathbb{R}$ be a given function. For any $E\in \mathbb{R}$, consider the set ${\mathcal P}_{E}$ of all probability mass functions on ${\mathcal A}$ under which ${\mathcal H}$ has expected value $E$. In other words, $$ {\mathcal P}_{E}:=\{{\bf p}=\left(p_{i}\right)_{i\in {\mathcal A}}{\; : \;} \sum_{i\in {\mathcal A}} p_{i}{\mathcal H}(i)=E\}. $$ Writing ${\mathcal H}_{\min}:=\inf_{i\in{\mathcal A}}{\mathcal H}(i)$ and ${\mathcal H}_{\max}:=\sup_{i\in{\mathcal A}}{\mathcal H}(i)$, the set ${\mathcal P}_{E}$ is non-empty if and only if ${\mathcal H}_{\min}\le E\le {\mathcal H}_{\max}$.

 

Lemma 151
Assume that ${\mathcal H}_{\min}< E< {\mathcal H}_{\max}$. Then there is a unique pmf in ${\mathcal P}_{E}$ with maximal entropy and it is given by $$ p_{\beta}(i)=\frac{1}{Z_{\beta} }e^{-\beta {\mathcal H}(i)} $$ where $Z_{\beta}=\sum\limits_{i\in {\mathcal A}}e^{-\beta {\mathcal H}(i)}$ and the value of $\beta$ is chosen to satisfy $-\frac{1}{Z_{\beta} }\frac{\partial Z_{\beta} }{\partial \beta}=E$, equivalently $-\frac{\partial}{\partial\beta}\log Z_{\beta}=E$.
This maximizing pmf is called the Boltzmann-Gibbs distribution. An analogous theorem holds for densities.
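The following Python sketch illustrates the lemma on a finite alphabet. The energies ${\mathcal H}(i)$ and the target $E$ are illustrative choices; $\beta$ is found by bisection, using the fact that the mean energy under $p_{\beta}$ is decreasing in $\beta$, and the entropy of $p_{\beta}$ is compared with that of another pmf in ${\mathcal P}_{E}$.

import math

energies = [0.0, 1.0, 2.0, 3.0]            # illustrative values of H(i)
E = 1.2                                    # illustrative target mean energy

def gibbs(beta):
    w = [math.exp(-beta * h) for h in energies]
    Z = sum(w)
    return [x / Z for x in w]

def mean_energy(p):
    return sum(pi * h for pi, h in zip(p, energies))

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

lo, hi = -50.0, 50.0                       # mean energy decreases in beta on this range
for _ in range(200):                       # bisection for the beta with mean energy E
    mid = (lo + hi) / 2
    if mean_energy(gibbs(mid)) > E:
        lo = mid
    else:
        hi = mid
beta = (lo + hi) / 2

p_beta = gibbs(beta)
print(beta, mean_energy(p_beta))           # the constraint sum p_i H(i) = E holds
other = [0.6, 0.0, 0.0, 0.4]               # another pmf in P_E (its mean energy is also 1.2)
print(entropy(p_beta), entropy(other))     # the Gibbs pmf has the larger entropy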

 

Example 152
Let ${\mathcal A}=\{1,2,\ldots,n\}$ and ${\mathcal H}(i)=1$ for all $i$. Let $E=1$ so that ${\mathcal P}_{E}$ is the same as all pmfs on ${\mathcal A}$. Clearly $p_{\beta}(i)=\frac{1}{n}$ for all $i\le n$. Indeed, we know that the maximal entropy is attained by the uniform distribution.

 

Example 153
Let ${\mathcal A}=\{0,1,2,\ldots\}$ and let ${\mathcal H}(i)=i$ for all $i$. Fix any $E > 0$. The Boltzmann-Gibbs distribution is given by $p_{\beta}(i)=\frac{1}{Z_{\beta} }e^{-\beta i}$. This is just the Geometric distribution with $q=e^{-\beta}$ chosen so that the mean is $E$, i.e., $e^{-\beta}=\frac{E}{1+E}$.
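The sketch below checks this numerically for an illustrative value of $E$; truncating the pmf at 10,000 terms is an assumption that makes the tail negligible.

import math

E = 2.5                    # illustrative mean
q = E / (1 + E)            # q = e^{-beta}, chosen so that the mean q/p equals E
p = 1 - q

pmf = [p * q ** k for k in range(10_000)]                 # truncated Geo(p) pmf
print(sum(pmf))                                           # ~ 1
print(sum(k * pk for k, pk in enumerate(pmf)))            # ~ E
print(-sum(x * math.log(x) for x in pmf if x > 0))        # its entropy
print(-math.log(p) - (q / p) * math.log(q))               # closed form from Example 147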

 

Example 154
Let us blindly apply the lemma to densities.
  1. ${\mathcal A}=\mathbb{R}_{+}$ and ${\mathcal H}(x)=\lambda x$. The Boltzmann-Gibbs density is $f_{\beta}(x)=\frac{1}{Z_{\beta}}e^{-\beta\lambda x}$, i.e., an Exponential density: the Exponential distribution maximizes entropy among densities on $\mathbb{R}_{+}$ with a prescribed mean.
  2. ${\mathcal A}=\mathbb{R}$ and ${\mathcal H}(x)=x^{2}$. The Boltzmann-Gibbs density is $f_{\beta}(x)=\frac{1}{Z_{\beta}}e^{-\beta x^{2}}$, i.e., a centered Gaussian: $N(0,\sigma^{2})$ maximizes entropy among densities on $\mathbb{R}$ with second moment $\sigma^{2}$ (see the numerical sketch after this list).
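For the second case, the sketch below compares the entropy of the standard Gaussian with that of a Laplace density having the same second moment (an illustrative competitor, not from the notes); the Gaussian comes out larger, as the lemma predicts.

import math

def numeric_entropy(f, a, b, n=200_000):
    # midpoint-rule approximation of -int_a^b f log f
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        ft = f(t)
        if ft > 0:
            total -= ft * math.log(ft) * dt
    return total

gauss = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)        # E[X^2] = 1
scale = 1 / math.sqrt(2)                                               # Laplace scale with E[X^2] = 2*scale^2 = 1
laplace = lambda t: math.exp(-abs(t) / scale) / (2 * scale)

print(numeric_entropy(gauss, -12.0, 12.0), 0.5 * math.log(2 * math.pi * math.e))
print(numeric_entropy(laplace, -30.0, 30.0), 1 + math.log(2 * scale))  # strictly smaller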
