---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.11.5
kernelspec:
display_name: Python 3
language: python
name: python3
---
# Direct-sampling Monte Carlo integration
After last week's recap of probability theory and random variables, it is time to apply this knowledge to understand the applicability and shortcomings of Monte Carlo methods. Before heading in that direction, let us add one more method for sampling random variables to our toolbox.
## Acceptance-Rejection sampling
Suppose we wish to sample a continuous random variable $X$ with probability density function $f_X(x)$. Last week we saw that one way to achieve this is via the {ref}`inversion-method`, by applying the inverse cumulative distribution function $F_X^{-1}$ to a uniform random variable $U$ on $(0,1)$. But this requires an efficient way to calculate $F_X^{-1}(p)$ for any real number $p\in(0,1)$. In many cases a closed-form expression for $F_X$ is not available, let alone for its inverse, so one has to resort to costly numerical approximations. Applying the inversion method to joint distributions of several random variables becomes even more challenging. We also discussed {ref}`change-of-variables` as a method to generate particular distributions, but discovering suitable transformations requires some luck and ingenuity.
A more generally applicable method, which may or may not be efficient, is **acceptance-rejection sampling**. Suppose there is another random variable $Z$ with density $f_Z(x)$ that we already know how to sample efficiently and whose density is not too dissimilar from $f_X(x)$. More precisely, suppose we can find a $c \geq 1$ such that $f_X(x) \leq c\,f_Z(x)$ for all $x\in\mathbb{R}$. Then we can sample $X$ using the following simple algorithm, where `sample_z` is a function sampling $Z$ and `accept_probability` is the function $x\to f_X(x) / (c f_Z(x))$.
```{code-cell} ipython3
import numpy as np
rng = np.random.default_rng()
def sample_acceptance_rejection(sample_z, accept_probability):
    # Propose samples of Z until one is accepted, which happens
    # with probability f_X(x) / (c f_Z(x)) for a proposal x.
    while True:
        x = sample_z()
        if rng.random() < accept_probability(x):
            return x
```
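To see the algorithm in action, here is a small self-contained illustration (the target density is our own choice, not from the text): we sample the semicircle density $f_X(x) = \frac{2}{\pi}\sqrt{1-x^2}$ on $(-1,1)$ from a uniform proposal with $f_Z(x) = \frac{1}{2}$, for which $c = \frac{4}{\pi}$ works and the acceptance probability simplifies to $\sqrt{1-x^2}$.

```{code-cell} ipython3
# Target: semicircle density f_X(x) = (2/pi) sqrt(1 - x^2) on (-1,1).
# Proposal: uniform on (-1,1), f_Z(x) = 1/2, with c = 4/pi, so the
# acceptance probability is f_X(x) / (c f_Z(x)) = sqrt(1 - x^2).
import numpy as np
rng = np.random.default_rng()

def sample_acceptance_rejection(sample_z, accept_probability):
    # (definition from above, repeated so this cell runs on its own)
    while True:
        x = sample_z()
        if rng.random() < accept_probability(x):
            return x

samples = [sample_acceptance_rejection(lambda: 2*rng.random() - 1,
                                       lambda x: np.sqrt(1 - x*x))
           for _ in range(10000)]
print(np.mean(samples), np.var(samples))
```

The semicircle density has mean $0$ and variance $1/4$, which the sample moments should reproduce up to statistical error.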
Why does this work? It is useful to assign a geometric interpretation to this sampling procedure (see the figure below). Let $U$ be uniform random in $(0,1)$ independent of $X$. The point $(X,Y) = (X,U f_X(X))$ is then uniform in the region $S_X=\{ (x,y) : x\in \mathbb{R},\,0< y< f_X(x)\}$ enclosed by the graph of $f_X$ and the $x$-axis. To see this, we note that $(x,u) \to (x,y) = (x,u f_X(x))$ is a differentiable and invertible mapping (at least where $f_X(x)\neq 0$) and we can therefore relate the joint densities by change of variables: $f_X(x) \mathrm{d}x \mathrm{d}u =\mathrm{d}x\mathrm{d}y$ for $(x,y) \in S_X$, so $f_{X,Y} = \mathbf{1}_{S_X}$. One way to sample $(X,Y)$ is by sampling from a larger but simpler region containing $S_X$ and rejecting any sample that is not in $S_X$. If we have another random variable $Z$ and constant $c\geq 1$ satisfying $f_X(x) \leq c\,f_Z(x)$, then a natural region is $c\,S_Z = \{(x,cy) : x\in\mathbb{R},\, 0 < y < f_Z(x)\}$, which contains $S_X$ and whose points can be sampled uniformly as $(Z, c\,U' f_Z(Z))$ with $U'$ uniform in $(0,1)$. Such a point falls inside $S_X$ precisely when $c\,U' f_Z(Z) < f_X(Z)$, which happens with probability $f_X(Z)/(c\,f_Z(Z))$. This is exactly the acceptance probability used in the algorithm above, so the accepted samples have density $f_X$.
## Monte Carlo integration
Suppose now that we wish to estimate the expectation $\mathbb{E}[X]$ of a random variable $X$ that we know how to sample. Given i.i.d. copies $X_1,X_2,\ldots$ of $X$, the natural estimator is the **sample mean** $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. If $X$ has finite variance $\sigma_X^2$, Chebyshev's inequality bounds the error of this estimator,
$$
\mathbb{P}\left( |\bar{X}_n - \mathbb{E}[X]| > \epsilon \right) \leq \frac{\sigma_X^2}{n\epsilon^2} \qquad \text{for }\epsilon>0, \,n\geq 1.
$$
Equivalently
$$
\mathbb{P}\left( \bar{X}_n - \frac{\sigma_X}{\sqrt{\delta n}} \leq \mathbb{E}[X] \leq \bar{X}_n + \frac{\sigma_X}{\sqrt{\delta n}}\right) \geq 1- \delta \qquad \text{for any }\delta > 0, \,n\geq 1.
$$
For example, after an experiment with $n$ samples we can say that with probability **at least** $0.682$ the mean $\mathbb{E}[X]$ will lie within distance $1.77\,\sigma_X/\sqrt{n}$ of our estimate $\bar{X}_n$ (take $\delta = 0.318$, so that $1/\sqrt{\delta} \approx 1.77$).
This bound holds for any $n$, even $n=1$, but is quite conservative when $n$ is large, because it does not take into account that the error approaches a normal distribution. In that limit the {ref}`central-limit-theorem` gives a tighter bound, because we know
$$
\mathbb{P}( |\bar{X}_n - \mathbb{E}[X]| > \frac{\sigma_X}{\sqrt{n}}\epsilon ) \quad\xrightarrow{n\to\infty}\quad \mathbb{P}( | \mathcal{N}| > \epsilon ) = \frac{2}{\sqrt{2\pi}}\int_{\epsilon}^\infty e^{-x^2/2}\mathrm{d}x.
$$
This gives the more accurate **1$\sigma$-confidence level**, stating that in the limit of large $n$ with **precisely** $\mathbb{P}(|\mathcal{N}|\leq 1) = 0.682\ldots$ probability we have that
$$\bar{X}_n - \frac{\sigma_X}{\sqrt{n}} \,\leq\, \mathbb{E}[X] \,\leq\, \bar{X}_n + \frac{\sigma_X}{\sqrt{n}}.\qquad(1\sigma\text{ or }68.2\%\text{ confidence when CLT holds})$$
So knowing that the central limit theorem holds allows us to decrease the error interval by a factor $1.77$ in this case.
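As a quick numerical sanity check (our own illustration, not part of the derivation), we can verify the $68.2\%$ coverage empirically for uniform samples, where $\sigma_X = 1/\sqrt{12}$ is known exactly:

```{code-cell} ipython3
import numpy as np
rng = np.random.default_rng()

# Fraction of experiments in which the 1-sigma interval around the
# sample mean of n uniform samples contains the true mean 1/2.
n, trials = 1000, 4000
sigma = 1/np.sqrt(12)          # standard deviation of a uniform variable
covered = 0
for _ in range(trials):
    xbar = rng.random(n).mean()
    covered += abs(xbar - 0.5) <= sigma/np.sqrt(n)
print(covered/trials)          # typically close to 0.682
```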
Of course, we are cheating a bit here, because generally we do not know the variance $\sigma_X^2$, so it has to be estimated as well. An **unbiased estimator** for the variance is given by **sample variance**
$$
s^2_n = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2,
$$
which differs from the usual variance of the sample sequence $X_1,\ldots,X_n$ by a factor $n/(n-1)$. This additional factor ensures that
$$
\mathbb{E}[s^2_n] = \sigma^2_X,
$$
as you may check by expanding the parentheses and using linearity of expectation. A variant of the weak law of large numbers tells us that $s_n$ converges in probability to $\sigma_X$ as $n\to\infty$. We may thus substitute the sample standard deviation $s_n$ for $\sigma_X$ in the error estimate without affecting the central limit theorem. Hence the **1$\sigma$-confidence level** can also be written as
$$
\bar{X}_n - \frac{s_n}{\sqrt{n}} \,\leq\, \mathbb{E}[X] \,\leq\, \bar{X}_n + \frac{s_n}{\sqrt{n}}.\qquad(1\sigma\text{ or }68.2\%\text{ confidence when CLT holds})
$$
The result of an experiment is thus generally reported as $\mathbb{E}[X] = \bar{X}_n \pm \frac{s_n}{\sqrt{n}}$.
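The code cells below rely on a helper `estimate_expectation_one_pass(sampler, n)` returning the pair $(\bar{X}_n, s_n/\sqrt{n})$. The exact implementation is not shown in the text, so the following is a sketch that accumulates the mean and the sum of squared deviations in a single pass over the samples, using Welford's online algorithm:

```{code-cell} ipython3
import numpy as np

def estimate_expectation_one_pass(sampler, n):
    """Estimate E[X] from n calls to sampler(); return (mean, standard error)."""
    mean, sum_sq = 0.0, 0.0           # running mean and sum of squared deviations
    for i in range(1, n + 1):
        x = sampler()
        delta = x - mean
        mean += delta / i             # update of the running mean
        sum_sq += delta * (x - mean)  # Welford update of the squared deviations
    sample_variance = sum_sq / (n - 1)
    return mean, np.sqrt(sample_variance / n)
```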
Let's consider an example that is directly related to the direct-sampling pebble game from the first chapter: take the function $g(x) = \sqrt{1-x^2}$ on $(0,1)$ and estimate $\pi/4$ by the Monte Carlo integral
$$
\frac{\pi}{4} = \int_{-\infty}^{\infty} \sqrt{1-x^2}\, \mathbf{1}_{0<x<1}\,\mathrm{d}x = \mathbb{E}\left[\sqrt{1-U^2}\right],
$$
where $U$ is uniform in $(0,1)$. This example generalizes naturally to higher dimensions: integrating $g(x_1,\ldots,x_d) = \sqrt{1-x_1^2-\cdots-x_d^2}$ (set to zero outside the unit ball) over the unit hypercube $(0,1)^d$ yields the volume of the unit ball in $d+1$ dimensions divided by $2^{d+1}$, which for $d=1$ reproduces $\pi/4$. Let us compare SciPy's deterministic quadrature routine `nquad` with Monte Carlo integration as the dimension grows.
```{code-cell} ipython3
from scipy.integrate import nquad

def sphere_function(*x):
    norm = sum(y*y for y in x)
    return 0 if norm > 1 else np.sqrt(1-norm)

for d in range(1,5):
    print("In dimension {}:".format(d))
    result_quad = nquad(sphere_function,[[0.0,1.0] for _ in range(d)],opts={'epsabs':[1e-10,1e-8,1e-4,1e-1][d-1]},full_output=True)
    print("    quadrature integration (n = {}): {} ± {}".format(result_quad[2]['neval'],result_quad[0],result_quad[1]))
    num_mc = [2000,10000,100000,100000][d-1]
    result_mc = estimate_expectation_one_pass(lambda : sphere_function(*rng.random(d)),num_mc)
    print("    Monte Carlo integration (n = {}): {} ± {}".format(num_mc,*result_mc))
```
Beyond 4 dimensions SciPy's quadrature starts struggling to provide any sensible answer, while the Monte Carlo integration is only starting to stretch its legs.
### Variance reduction: importance sampling
It is all well and good that an error of $O(n^{-1/2})$ can be competitive with other approaches (especially in higher dimensions), but if the multiplicative factor $\sigma_X$ in front is huge we are still in trouble. Writing $X = g(Y)$ for a random variable $Y$ with density $f_Y$, if $g(x)$ is close to zero in a large part of the domain, then intuitively we expect the Monte Carlo integration to be wasteful, because most of the samples will contribute little to the sample mean $\bar{X}_n$. To make this quantitative we can calculate the variance as
$$
\sigma_X^2 = \int_{-\infty}^\infty \left(g(y)-\mathbb{E}[X]\right)^2 f_Y(y)\mathrm{d}y.
$$
It vanishes when $g(y)$ is constant (and thus equal to $\mathbb{E}[X]$) and can be very large if $g(y)$ is very localized. The situation can be improved by changing the distribution of the random variables used. In particular, if $Z$ is a random variable with density $f_Z(y)$ that vanishes only when $f_Y(y)$ vanishes, then we can write
$$
\mathbb{E}[X] = \int_{-\infty}^\infty g(y)f_Y(y)\mathrm{d}y = \int_{-\infty}^\infty \frac{g(y)f_Y(y)}{f_Z(y)} f_Z(y)\mathrm{d}y = \mathbb{E}\left[\frac{g(Z)f_Y(Z)}{f_Z(Z)}\right].
$$
What we have done is replace the random variable $Y$ by another random variable $Z$ and the function $g(x)$ by $g(x) f_Y(x)/f_Z(x)$ without changing the expectation value: $\mathbb{E}[X] = \mathbb{E}[X']$ where $X' = \frac{g(Z)f_Y(Z)}{f_Z(Z)}$. The variance, however, has changed, because now
$$
\sigma_{X'}^2 = \int_{-\infty}^\infty \left(\frac{g(z)f_Y(z)}{f_Z(z)}-\mathbb{E}[X]\right)^2 f_Z(z)\mathrm{d}z.
$$
The strategy is then to choose $Z$ such that $f_Z(z)$ is as close to being proportional to $g(z)f_Y(z)$ as possible.
As an example let us compute the probability $\mathbb{P}(Y > 3)$ for a normal random variable $Y$ by setting $g(y) = \mathbf{1}_{\{y>3\}}$, i.e.
$$
I = \int_{-\infty}^\infty \mathbf{1}_{\{y>3\}} f_Y(y)\mathrm{d}y = \int_3^{\infty} \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\mathrm{d}y.
$$
```{code-cell} ipython3
def sample_x_naive():
    # Indicator of the event {Y > 3} for a standard normal Y.
    return 1 if rng.normal() > 3 else 0
estimate_expectation_one_pass(sample_x_naive,10000)
```
The standard error is still quite large despite many samples. Instead, let us take the proposal random variable $Z$ with PDF $f_Z(x) = e^{-(x-3)} \mathbf{1}_{\{x>3\}}$, a unit exponential shifted to start at $3$, and compute $\mathbb{E}[X]=\mathbb{E}[e^{Z-3}f_Y(Z)]$.
```{code-cell} ipython3
def random_exponential():
    # Sample an Exp(1) random variable by inversion.
    return -np.log(rng.random())

def sample_x_faster():
    # Importance sampling: Z = 3 + Exp(1), X' = e^{Z-3} f_Y(Z).
    z = random_exponential() + 3
    return np.exp(z - 3 - z*z/2) / np.sqrt(2*np.pi)
estimate_expectation_one_pass(sample_x_faster,10000)
```
This has reduced our error by a factor of about 30.
Note that it can happen that $\mathbb{E}[X]$ is finite but that one of $\sigma_X^2$ and $\sigma_{X'}^2$ is infinite. In last week's exercises we saw that infinite variance can slow the convergence of $\bar{X}_n$ to $\mathbb{E}[X]$ in an undesirable way. Importance sampling often provides a useful method to cure such infinite variances. For example, let us try to compute the integral
$$
I = \int_0^1 x^{-\alpha} e^{-x} \mathrm{d}x
$$
for $1/2<\alpha<1$. If we take $I = \mathbb{E}[X]$ with $X=U^{-\alpha}e^{-U}$ for a uniform random variable $U$ then the variance $\sigma_X^2 = \infty$, because $\mathbb{E}[X^2] = \int_0^1 x^{-2\alpha}e^{-2x}\mathrm{d}x$ diverges when $\alpha > 1/2$. Using importance sampling with $f_Z(x) = (1-\alpha)\,x^{-\alpha}\, \mathbf{1}_{\{0<x<1\}}$ we instead obtain $X' = g(Z)f_Y(Z)/f_Z(Z) = e^{-Z}/(1-\alpha)$, which is bounded and therefore has finite variance. Moreover, $Z$ is easily sampled by inversion, since $F_Z(x) = x^{1-\alpha}$ gives $Z = U^{1/(1-\alpha)}$.
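As a numerical check (with the arbitrary choice $\alpha = 3/4$; this cell is our own illustration), we can sample $Z$ by inversion of $F_Z(x) = x^{1-\alpha}$ and average $X' = g(Z)/((1-\alpha)Z^{-\alpha}) = e^{-Z}/(1-\alpha)$, which now has a well-behaved standard error:

```{code-cell} ipython3
import numpy as np
rng = np.random.default_rng()

alpha, n = 0.75, 100000
# Sample Z with density (1-alpha) x^(-alpha) on (0,1) by inversion:
# F_Z(x) = x^(1-alpha), hence Z = U^(1/(1-alpha)).
z = rng.random(n) ** (1.0/(1.0 - alpha))
x_prime = np.exp(-z) / (1.0 - alpha)     # bounded, so finite variance
print(x_prime.mean(), "±", x_prime.std(ddof=1)/np.sqrt(n))
```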