# 3. Exercise sheet#

Some general remarks about the exercises:

• For your convenience functions from the lecture are included below. Feel free to reuse them without copying to the exercise solution box.

• For each part of the exercise a solution box has been added, but you may insert additional boxes. Do not hesitate to add Markdown boxes for textual or LaTeX answers (via Cell > Cell Type > Markdown). But make sure to replace any part that says YOUR CODE HERE or YOUR ANSWER HERE and remove the raise NotImplementedError().

• Please make your code readable by humans (and not just by the Python interpreter): choose informative function and variable names and use consistent formatting. Feel free to check the PEP 8 Style Guide for Python for the widely adopted coding conventions or this guide for explanation.

• Make sure that the full notebook runs without errors before submitting your work. This you can do by selecting Kernel > Restart & Run All in the jupyter menu.

• For some exercises test cases have been provided in a separate cell in the form of assert statements. When run, a successful test will give no output, whereas a failed test will display an error message.

• Each sheet has 100 points worth of exercises. Note that only the grades of sheets number 2, 4, 6, 8 count towards the course examination. Submitting sheets 1, 3, 5, 7 & 9 is voluntary and their grades are just for feedback.

NAME = ""
NAMES_OF_COLLABORATORS = ""


Exercise sheet 3

Code from the lectures:

import numpy as np
import matplotlib.pylab as plt

rng = np.random.default_rng()
%matplotlib inline

def sample_acceptance_rejection(sample_z,accept_probability):
while True:
x = sample_z()
if rng.random() < accept_probability(x):
return x

def estimate_expectation(sampler,n):
'''Compute beste estimate of mean and 1-sigma error with n samples.'''
samples = [sampler() for _ in range(n)]
return np.mean(samples), np.std(samples)/np.sqrt(n-1)

def estimate_expectation_one_pass(sampler,n):
sample_mean = sample_square_dev = 0.0
for k in range(1,n+1):
delta = sampler() - sample_mean
sample_mean += delta / k
sample_square_dev += (k-1)*delta*delta/k
return sample_mean, np.sqrt(sample_square_dev / (n*(n-1)))


## 3.1. Acceptance-rejection sampling#

(35 points)

The goal of this exercise is to develop a fast sampling algorithm of the discrete random variable $$X$$ with probability mass function $$$p_X(k) = \frac{6}{\pi^2} k^{-2}, \qquad k=1,2,\ldots$$$

(a) Let $$Z$$ be the discrete random variable with $$p_Z(k) = \frac{1}{k} - \frac{1}{k+1}$$ for $$k=1,2,\ldots$$. Write a function to compute the inverse CDF $$F_Z^{-1}(u)$$, such that you can use the inversion method to sample $$Z$$ efficiently. (15 pts)

def f_inverse_Z(u):
'''Compute the inverse CDF of Z, i.e. F_Z^{-1}(u) for 0 <= u <= 1.'''
raise NotImplementedError()

def random_Z():
return int(f_inverse_Z(rng.random())) # make sure to return an integer

assert f_inverse_Z(0.2)==1
assert f_inverse_Z(0.51)==2
assert f_inverse_Z(0.76)==4
assert f_inverse_Z(0.991)==111


(b) Implement a sampler for $$X$$ using acceptance-rejection based on the sampler of $$Z$$. For this you need to first determine a $$c$$ such that $$p_X(k) \leq c\,p_Z(k)$$ for all $$k=1,2,\ldots$$, and then consider an acceptance probability $$p_X(k) / (c p_Z(k))$$. Verify the validity of your sampler numerically (e.g. for $$k=1,\ldots,10$$). (20 pts)

def accept_probability_X(k):
'''Return the appropriate acceptance probability on the event Z=k.'''
raise NotImplementedError()

def random_X():
return sample_acceptance_rejection(random_Z,accept_probability_X)

# Verify numerically
raise NotImplementedError()

from nose.tools import assert_almost_equal
assert min([random_X() for _ in range(10000)]) >= 1
assert_almost_equal([random_X() for _ in range(10000)].count(1),6079,delta=400)
assert_almost_equal([random_X() for _ in range(10000)].count(3),675,delta=75)


## 3.2. Monte Carlo integration & Importance sampling#

(30 Points)

Consider the integral

$I = \int_0^1 \sin(\pi x(1-x))\mathrm{d}x = \mathbb{E}[X], \quad X=g(U), \quad g(U)=\sin(\pi U(1-U)),$

where $$U$$ is a uniform random variable in $$(0,1)$$.

(a) Use Monte Carlo integration based on sampling $$U$$ to estimate $$I$$ with $$1\sigma$$ error at most $$0.001$$. How many samples do you need? (It is not necessary to automate this: trial and error is sufficient.) (10 pts)

# YOUR CODE HERE
raise NotImplementedError()


(b) Choose a random variable $$Z$$ on $$(0,1)$$ whose density resembles the integrand of $$I$$ and which you know how to sample efficiently (by inversion method, acceptance-rejection, or a built-in Python function). Estimate $$I$$ again using importance sampling, i.e. $$I = \mathbb{E}[X']$$ where $$X' = g(Z) f_U(Z)/f_Z(Z)$$, with an error of at most 0.001. How many samples did you need this time? (20 pts)

def sample_nice_Z():
'''Sample from the nice distribution Z'''
raise NotImplementedError()

def sample_X_prime():
'''Sample from X'.'''
raise NotImplementedError()

raise NotImplementedError()


## 3.3. Direct sampling of Dyck paths#

(35 points)

Direct sampling of random variables in high dimensions requires some luck and/or ingenuity. Here is an example of a probability distribution on $$\mathbb{Z}^{2n+1}$$ that features prominently in the combinatorial literature and can be sampled directly in an efficient manner. A sequence $$\mathbf{x}\equiv(x_0,x_1,\ldots,x_{2n})\in\mathbb{Z}^{2n+1}$$ is said to be a Dyck path if $$x_0=x_{2n}=0$$, $$x_i \geq 0$$ and $$|x_{i}-x_{i-1}|=1$$ for all $$i=1,\ldots,2n$$. Dyck paths are counted by the Catalan numbers $$C(n) = \frac{1}{n+1}\binom{2n}{n}$$. Let $$\mathbf{X}=(X_0,\ldots,X_n)$$ be a uniform Dyck path, i.e. a random variable with probability mass function $$p_{\mathbf{X}}(\mathbf{x}) = 1/C(n)$$ for every Dyck path $$\mathbf{x}$$. Here is one way to sample $$\mathbf{X}$$.

def random_dyck_path(n):
'''Returns a uniform Dyck path of length 2n as an array [x_0, x_1, ..., x_{2n}] of length 2n.'''
# produce a (2n+1)-step random walk from 0 to -1
increments = *n +[-1]*(n+1)
rng.shuffle(increments)
unconstrained_walk = np.cumsum(increments)
# determine the first time it reaches its minimum
argmin = np.argmin(unconstrained_walk)
# cyclically permute the increments to ensure walk stays non-negative until last step
rotated_increments = np.roll(increments,-argmin)
# turn off the superfluous -1 step
rotated_increments = 0
# produce dyck path from increments
walk = np.cumsum(rotated_increments)
return walk

plt.plot(random_dyck_path(50))
plt.show()


(a) Let $$H$$ be the (maximal) height of $$X$$, i.e. $$H=\max_i X_i$$. Estimate the expected height $$\mathbb{E}[H]$$ for $$n = 2^5, 2^6, \ldots, 2^{11}$$ (including error bars). Determine the growth $$\mathbb{E}[H] \approx a\,n^\beta$$ via an appropriate fit. Hint: use the scipy.optimize.curve_fit function with the option sigma = ... to incorporate the standard errors on $$\mathbb{E}[H]$$ in the fit. Note that when you supply the errors appropriately, fitting on linear or logarithmic scale should result in the same answer. (25 pts)

# Collect height estimates
n_values = [2**k for k in range(5,11+1)]
raise NotImplementedError()

from scipy.optimize import curve_fit

# Fitting

# Plotting

(b) Produce a histogram of the height $$H / \sqrt{n}$$ for $$n = 2^5, 2^6, \ldots, 2^{11}$$ and $$3000$$ samples each and demonstrate with a plot that it appears to converge in distribution as $$n\to\infty$$. Hint: you could call plt.hist(...,density=True,histtype='step') for each $$n$$ to plot the densities on top of each other. (10 pts)
# YOUR CODE HERE