Let $\mathbf{X}=(X_{1},\ldots ,X_{m})$ be a random vector with density $f(t_{1},\ldots ,t_{m})$. Let $T:\mathbb{R}^{m}\rightarrow \mathbb{R}^{m}$ be a one-one, continuously differentiable function (these assumptions can be relaxed in several ways, as remarked later).

Let $\mathbf{Y}=T(\mathbf{X})$. In co-ordinates we may write $\mathbf{Y}=(Y_{1},\ldots ,Y_{m})$ where $Y_{1}=T_{1}(X_{1},\ldots ,X_{m}),\ \ldots,\ Y_{m}=T_{m}(X_{1},\ldots ,X_{m})$ and $T_{i}:\mathbb{R}^{m}\rightarrow \mathbb{R}$ are the components of $T$.

Question : What is the joint density of $Y_{1},\ldots ,Y_{m}$?

The change of variable formula : In the setting described above, the joint density of $Y_{1},\ldots ,Y_{m}$ is given by $$ g(\mathbf{y})=f\left(T^{-1}\mathbf{y}\right)\left|\,J[T^{-1}](\mathbf{y})\,\right| $$ where $J[T^{-1}](\mathbf{y})$ is the Jacobian determinant of the function $T^{-1}$ at the point $\mathbf{y}=(y_{1},\ldots ,y_{m})$.
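It may help to note that in the special case $m=1$ (with $T$ one-one and differentiable), this reduces to the familiar one-variable formula: if $Y=T(X)$, then $$ g(y)=f\left(T^{-1}(y)\right)\left|\left(T^{-1}\right)'(y)\right|. $$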

Justification : We shall not prove this formula, but give an imprecise yet convincing justification that can be made into a proof. There are two factors on the right. The first one, $f(T^{-1}\mathbf{y})$, is easy to understand: if $\mathbf{Y}$ is to be close to $\mathbf{y}$, then $\mathbf{X}$ must be close to $T^{-1}\mathbf{y}$. The second factor, involving the Jacobian determinant, accounts for the change of volume. Let us explain by analogy with mass density, a more familiar quantity.

Consider a solid cube with non-uniform density. If you rotate it, the density at any point is the same as the original density at a different point (the one which came to the current position). Instead of rotating, suppose we expand the cube uniformly so that the center stays where it is and each side becomes twice as long. What happens to the density at the center? It goes down by a factor of $8$. This is simply because of volume change: the same mass spreads over a larger volume. More generally, we can have non-uniform expansion: we may cool some parts of the cube and heat others, to varying degrees. What happens to the density? At each point, the density changes by a factor given by the Jacobian determinant of the transformation.

Now for a slightly more mathematical justification. We use the language of two variables ($m=2$), but the same reasoning works for any $m$. Fix two points $\mathbf{x}=(x_{1},x_{2})$ and $\mathbf{y}=(y_{1},y_{2})$ such that $\mathbf{y}=T(\mathbf{x})$ (and hence $\mathbf{x}=T^{-1}(\mathbf{y})$). The density of $\mathbf{Y}$ at $\mathbf{y}$ is given by $$\begin{align*} g(\mathbf{y})&\approx \frac{1}{\mbox{area}(\mathcal N)}\mathbf{P}\{\mathbf{Y}\in \mathcal N\} \end{align*}$$ where $\mathcal N$ is a small neighbourhood of the point $\mathbf{y}$ (for example, a disk of small radius $\delta$ centered at $\mathbf{y}$). By the one-one nature of $T$ and the relationship $\mathbf{Y}=T(\mathbf{X})$, we see that $$\mathbf{P}\{\mathbf{Y}\in \mathcal N\}=\mathbf{P}\{\mathbf{X}\in T^{-1}(\mathcal N)\}$$ where $T^{-1}(\mathcal N)$ is the image of $\mathcal N$ under the map $T^{-1}$. Now, $T^{-1}(\mathcal N)$ is a small neighbourhood of $\mathbf{x}$ (if $\mathcal N$ is a disk, then $T^{-1}(\mathcal N)$ is an approximate ellipse) and hence, by the same interpretation of density, we see that $$\begin{align*} \mathbf{P}\{\mathbf{X}\in T^{-1}(\mathcal N)\}&\approx \mbox{area}(T^{-1}(\mathcal N))f(\mathbf{x}). \end{align*}$$ Putting the three displayed equations together, we arrive at the formula $$ g(\mathbf{y})\approx f(\mathbf{x})\frac{\mbox{area}(T^{-1}(\mathcal N))}{\mbox{area}(\mathcal N)}. $$ Thus the problem boils down to how areas change under transformations. A linear map $S(\mathbf{y})=A\mathbf{y}$, where $A$ is a $2\times 2$ matrix, changes the area of any region by a factor of $|\det(A)|$, i.e., $\mbox{area}(S(\mathcal R))=|\det(A)|\mbox{area}(\mathcal R)$.

The differentiability of $T$ means that in a small neighbourhood of $\mathbf{y}$, the mapping $T^{-1}$ looks like a linear map, $T^{-1}(\mathbf{y}+\mathbf h)\approx \mathbf{x}+DT^{-1}(\mathbf{y}){\bf h}$. Therefore, the areas of small neighbourhoods of $\mathbf{y}$ change by a factor equal to $|\det(DT^{-1}(\mathbf{y}))|$ which is the Jacobian determinant. In other words, $\mbox{area}(T^{-1}(\mathcal N))\approx |JT^{-1}(\mathbf{y})|\mbox{area}(\mathcal N)$. Consequently $g(\mathbf{y})=f(T^{-1}\mathbf{y})|JT^{-1}(\mathbf{y})|$.
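As a quick sanity check, consider the purely linear case (a special case we add for illustration): if $T(\mathbf{x})=A\mathbf{x}$ for an invertible $m\times m$ matrix $A$, then $T^{-1}(\mathbf{y})=A^{-1}\mathbf{y}$ and $J[T^{-1}](\mathbf{y})=\det(A^{-1})=1/\det(A)$ at every point, so the formula gives $$ g(\mathbf{y})=\frac{1}{|\det(A)|}\,f\left(A^{-1}\mathbf{y}\right), $$ in agreement with the fact that $A$ scales all areas (volumes) by $|\det(A)|$.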

Enlarging the applicability of the change of variable formula : The change of variable formula is applicable in greater generality than we stated above.

  1. Firstly, $T$ does not have to be defined on all of $\mathbb{R}^{m}$. It is sufficient if it is defined on the range of $\mathbf{X}$ (i.e., if $f(t_{1},\ldots ,t_{m})=0$ for $(t_{1},\ldots ,t_{m})\in \mathbb{R}^{m}\setminus A$, then it is enough if $T$ is defined on $A$).
  2. Even within the range of $\mathbf{X}$, we can allow $T$ to be undefined, provided $\mathbf{X}$ has zero probability of falling in the set where it is undefined. For example, this set can consist of finitely many points, or a line (if $m\ge 2$), or a plane (if $m\ge 3$), etc.
  3. Similarly, the differentiability of $T$ is required only on a subset outside of which $\mathbf{X}$ has probability $0$ of falling.
  4. The one-one property of $T$ is important, but there are special cases that can be dealt with by a slight modification: for example, if $T(x)=x^{2}$ or $T(x_{1},x_{2})=(x_{1}^{2},x_{2}^{2})$, we can split the space into parts on each of which $T$ is one-one (a sample computation is worked out below the list).
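To illustrate the last point, take $Y=X^{2}$ where $X$ has density $f$. On each of $(-\infty,0)$ and $(0,\infty)$ the map $T(x)=x^{2}$ is one-one, with inverses $-\sqrt{y}$ and $\sqrt{y}$ respectively. Applying the change of variable formula on each piece and adding the two contributions gives $$ g(y)=\frac{1}{2\sqrt{y} }\left(f(\sqrt{y})+f(-\sqrt{y})\right) \hspace{2mm}\mbox{ for }y > 0. $$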

 

Example 116
Let $X_{1},X_{2}$ be independent Exp($\lambda$) random variables. Let $T(x_{1},x_{2})=(x_{1}+x_{2},\frac{x_{1} }{x_{1}+x_{2} })$. This is well-defined on $\mathbb{R}_{+}^{2}$ (and note that $\mathbf{P}\{(X_{1},X_{2})\in \mathbb{R}_{+}^{2}\}=1$) and its range is $\mathbb{R}_{+}\times (0,1)$. The inverse function is $T^{-1}(y_{1},y_{2})=(y_{1}y_{2},y_{1}(1-y_{2}))$. Its Jacobian determinant is $$ J[T^{-1}](y_{1},y_{2})=\det\left[\begin{array}{cc}y_{2} & y_{1}\\ 1-y_{2} & -y_{1} \end{array}\right]=-y_{1}. $$ $(X_{1},X_{2})$ has density $f(x_{1},x_{2})=\lambda^{2}e^{-\lambda(x_{1}+x_{2})}$ for $x_{1},x_{2} > 0$ (henceforth it will be a convention that the density is zero except where we specify it). Hence, the random variables $Y_{1}=X_{1}+X_{2}$ and $Y_{2}=\frac{X_{1} }{X_{1}+X_{2} }$ have joint density $$ g(y_{1},y_{2})=f(y_{1}y_{2},y_{1}(1-y_{2}))|J[T^{-1}](y_{1},y_{2})| = \lambda^{2}e^{-\lambda(y_{1}y_{2}+y_{1}(1-y_{2}))}y_{1}=\lambda^{2}y_{1}e^{-\lambda y_{1} } $$ for $y_{1} > 0$ and $y_{2}\in (0,1)$.

In particular, we see that $Y_{1}=X_{1}+X_{2}$ has density $h_{1}(t)=\int_{0}^{1}\lambda^{2}te^{-\lambda t}ds = \lambda^{2}te^{-\lambda t}$ (for $t > 0$) which means that $Y_{1}\sim \mbox{Gamma}(2,\lambda)$. Similarly, $Y_{2}=\frac{X_{1} }{X_{1}+X_{2} }$ has density $h_{2}(s)=\int_{0}^{\infty} \lambda^{2}t e^{-\lambda t}dt =1$ (for $s\in (0,1)$) which means that $Y_{2}$ has $\mbox{Unif}(0,1)$ distribution. In fact, $Y_{1}$ and $Y_{2}$ are also independent since $g(u,v)=h_{1}(u)h_{2}(v)$.
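Such conclusions are easy to check by simulation. Here is a minimal sketch in Python (assuming numpy is available; the value of $\lambda$ is an arbitrary illustrative choice) comparing sample statistics of $Y_{1}$ and $Y_{2}$ with the Gamma$(2,\lambda)$ and Unif$(0,1)$ predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                                  # arbitrary rate parameter for illustration
n = 100_000

# Exp(lam) samples; numpy's exponential is parametrized by scale = 1/lam.
x1 = rng.exponential(scale=1/lam, size=n)
x2 = rng.exponential(scale=1/lam, size=n)

y1 = x1 + x2                               # should be Gamma(2, lam)
y2 = x1 / (x1 + x2)                        # should be Unif(0, 1)

print(y1.mean(), 2/lam)                    # Gamma(2, lam) has mean 2/lam
print(y2.mean(), 0.5)                      # Unif(0, 1) has mean 1/2
print(np.corrcoef(y1, y2)[0, 1])           # near 0, consistent with independence
```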

 

Exercise 117
Let $X_{1}\sim \mbox{Gamma}(\nu_{1},\lambda)$ and $X_{2}\sim \mbox{Gamma}(\nu_{2},\lambda)$ (note that the parameter $\lambda$ is the same for both) and assume that they are independent. Find the joint distribution of $X_{1}+X_{2}$ and $\frac{X_{1} }{X_{1}+X_{2} }$. A numerical check of your answer is sketched below.
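A minimal simulation sketch for checking whatever answer you obtain (assuming numpy; the parameter values are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
nu1, nu2, lam = 2.0, 3.0, 1.0             # arbitrary illustrative parameters
n = 100_000

# Gamma(nu, lam) samples; numpy's gamma takes shape nu and scale = 1/lam.
x1 = rng.gamma(shape=nu1, scale=1/lam, size=n)
x2 = rng.gamma(shape=nu2, scale=1/lam, size=n)

s = x1 + x2                                # X1 + X2
ratio = x1 / (x1 + x2)                     # X1 / (X1 + X2)

# Compare histograms of s and ratio with your candidate densities;
# the sample correlation is a quick (partial) check of independence.
print(np.corrcoef(s, ratio)[0, 1])
```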

 

Example 118
Suppose we are given that $X_{1}$ and $X_{2}$ are independent and each has $\mbox{Exp}(\lambda)$ distribution. What is the distribution of the random variable $X_{1}+X_{2}$?

The change of variable formula works for transformations from $\mathbb{R}^{m}$ to $\mathbb{R}^{m}$ whereas here we have two random variables $X_{1},X_{2}$ and our interest is in one random variable $X_{1}+X_{2}$. To use the change of variable formula, we must introduce an auxiliary variable. For example, we take $Y_{1}=X_{1}+X_{2}$ and $Y_{2}=X_{1}/(X_{1}+X_{2})$. Then as in the first example, we find the joint density of $(Y_{1},Y_{2})$ using the change of variable formula and then integrate out the second variable to get the density of $Y_{1}$.

Let us emphasize the point that if our interest is only in $Y_{1}$, then we have a lot of freedom in choosing the auxiliary variable. The only condition is that from $Y_{1}$ and $Y_{2}$ we should be able to recover $X_{1}$ and $X_{2}$. Let us repeat the computation using $Y_{1}=X_{1}+X_{2}$ and $Y_{2}=X_{2}$. Then, $T(x_{1},x_{2})=(x_{1}+x_{2},x_{2})$ maps $\mathbb{R}_{+}^{2}$ onto $Q:=\{(y_{1},y_{2}){\; : \;} y_{1} > y_{2} > 0\}$ in a one-one manner. The inverse function is $T^{-1}(y_{1},y_{2})=(y_{1}-y_{2},y_{2})$. It is easy to see that $JT^{-1}(y_{1},y_{2})=1$ (check!). Hence, by the change of variable formula, the density of $(Y_{1},Y_{2})$ is given by $$\begin{align*} g(y_{1},y_{2}) &= f(y_{1}-y_{2},y_{2})\cdot 1 \\ &= \lambda^{2}e^{-\lambda(y_{1}-y_{2})}e^{-\lambda y_{2} } \hspace{2mm}(\mbox{if }y_{1} > y_{2} > 0) \\ &= \lambda^{2}e^{-\lambda y_{1} }{\mathbf 1}_{y_{1} > y_{2} > 0}. \end{align*}$$ To get the density of $Y_{1}$, we integrate out the second variable. The density of $Y_{1}$ is $$\begin{align*} h(u) &= \int\limits_{-\infty}^{\infty}\lambda^{2}e^{-\lambda u}{\mathbf 1}_{u > y_{2} > 0}\, dy_{2} \\ &= \lambda^{2}e^{-\lambda u}\int\limits_{0}^{u}dy_{2} \\ &= \lambda^{2}u e^{-\lambda u} \end{align*}$$ which agrees with what we found before.
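The same choice of auxiliary variable works in general and is worth recording (a standard fact, stated here for completeness): if $X_{1},X_{2}$ are independent with densities $f_{1},f_{2}$, then $Y_{1}=X_{1}+X_{2}$, $Y_{2}=X_{2}$ gives $g(y_{1},y_{2})=f_{1}(y_{1}-y_{2})f_{2}(y_{2})$, and integrating out $y_{2}$ shows that $X_{1}+X_{2}$ has the convolution density $$ h(u)=\int_{-\infty}^{\infty}f_{1}(u-v)f_{2}(v)\,dv. $$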

 

Example 119
Suppose $R\sim \mbox{Exp}(\lambda)$ and $\Theta\sim \mbox{Unif}(0,2\pi)$ and the two are independent. Define $X=\sqrt{R}\cos(\Theta)$ and $Y=\sqrt{R}\sin(\Theta)$. We want to find the distribution of $(X,Y)$. For this, we first write the joint density of $(R,\Theta)$, which is given by $$ f(r,\theta)=\frac{1}{2\pi}\lambda e^{-\lambda r} \hspace{2mm}\mbox{ for }r > 0,\theta\in (0,2\pi). $$ Define the transformation $T:\mathbb{R}_{+}\times (0,2\pi)\rightarrow \mathbb{R}^{2}$ by $T(r,\theta)=(\sqrt{r}\cos\theta,\sqrt{r}\sin \theta)$. The image of $T$ contains all $(x,y)\in \mathbb{R}^{2}$ with $y\not=0$ (the omitted points lie on the line $y=0$, which will not matter). For $y\not=0$ the inverse is $T^{-1}(x,y)=(x^{2}+y^{2},\arctan(y/x))$ where $\arctan(y/x)$ is defined so as to take values in $(0,\pi)$ when $y > 0$ and to take values in $(\pi,2\pi)$ when $y < 0$. Thus $$ JT^{-1}(x,y) = \det\left[\begin{array}{cc} 2x & 2y \\ \frac{-y}{x^{2}+y^{2} } & \frac{x}{x^{2}+y^{2} } \end{array} \right] =2. $$ Therefore, $(X,Y)$ has joint density $$ g(x,y) = 2f(x^{2}+y^{2},\arctan(y/x)) = \frac{\lambda}{\pi} e^{-\lambda(x^{2}+y^{2})}. $$ This is for $(x,y)\in \mathbb{R}^{2}$ with $y\not=0$, but as we have remarked earlier, the value of a pdf in $\mathbb{R}^{2}$ on a line does not matter, so we may define $g(x,y)$ as above for all $(x,y)$ (the main point is that the CDF does not change). Since $g(x,y)$ separates into a function of $x$ and a function of $y$, the variables $X,Y$ are independent $N(0,\frac{1}{2\lambda})$.

 

Remark 120
Relationships between random variables derived by the change of variable formula can be used for simulation too. For instance, the CDF of $N(0,1)$ is not explicit, and hence simulating from that distribution directly is difficult (one must resort to numerical methods). However, we can easily simulate it as follows. Simulate an $\mbox{Exp}(1/2)$ random variable $R$ (easy, as the distribution function can be inverted) and an independent $\mbox{Unif}(0,2\pi)$ random variable $\Theta$. Then set $X=\sqrt{R}\cos(\Theta)$ and $Y=\sqrt{R}\sin(\Theta)$. By Example 119, these are two independent $N(0,1)$ random numbers. Here it should be noted that the numbers in $(0,1)$ produced by a random number generator are supposed to be independent uniform random numbers (otherwise, it is not acceptable as a random number generator).
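A minimal sketch of this recipe in Python (assuming numpy; the function and variable names are our own) — this is the classical Box-Muller method:

```python
import numpy as np

def standard_normal_pairs(n, rng):
    """Generate n pairs of independent N(0,1) samples using the
    transformation of Remark 120 (the Box-Muller method)."""
    u1 = rng.random(n)                 # Unif(0,1)
    u2 = rng.random(n)                 # Unif(0,1), independent of u1
    r = -2.0 * np.log(u1)              # Exp(1/2) by inverting its CDF
    theta = 2.0 * np.pi * u2           # Unif(0, 2*pi)
    return np.sqrt(r) * np.cos(theta), np.sqrt(r) * np.sin(theta)

rng = np.random.default_rng(0)
x, y = standard_normal_pairs(100_000, rng)
print(x.mean(), x.std())               # should be close to 0 and 1
```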
