# 2. Exercise sheet

Some general remarks about the exercises:

• For your convenience, functions from the lecture are included below. Feel free to reuse them without copying them into the exercise solution boxes.

• For each part of the exercise a solution box has been added, but you may insert additional boxes. Do not hesitate to add Markdown boxes for textual or LaTeX answers (via Cell > Cell Type > Markdown). But make sure to replace any part that says YOUR CODE HERE or YOUR ANSWER HERE and to remove the raise NotImplementedError() statements.

• Please make your code readable by humans (and not just by the Python interpreter): choose informative function and variable names and use consistent formatting. Feel free to consult the PEP 8 Style Guide for Python for the widely adopted coding conventions.

• Make sure that the full notebook runs without errors before submitting your work. You can check this by selecting Kernel > Restart & Run All in the Jupyter menu.

• For some exercises test cases have been provided in a separate cell in the form of assert statements. When run, a successful test will give no output, whereas a failed test will display an error message.

• Each sheet has 100 points worth of exercises. Note that only the grades of sheets 2, 4, 6 & 8 count towards the course examination. Submitting sheets 1, 3, 5, 7 & 9 is voluntary; their grades are only for feedback.

Please fill in your name here:

NAME = ""
NAMES_OF_COLLABORATORS = ""



Code from the lecture:

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

rng = np.random.default_rng()
%matplotlib inline

def inversion_sample(f_inverse):
    '''Obtain an inversion sample based on the inverse-CDF f_inverse.'''
    return f_inverse(rng.random())

def compare_plot(samples, pdf, xmin, xmax, bins):
    '''Draw a plot comparing the histogram of the samples to the expectation coming from the pdf.'''
    xval = np.linspace(xmin, xmax, bins + 1)
    binsize = (xmax - xmin) / bins
    # Calculate the expected numbers by numerical integration of the pdf over the bins
    expected = np.array([quad(pdf, xval[i], xval[i+1])[0] for i in range(bins)]) / binsize
    measured = np.histogram(samples, bins, (xmin, xmax))[0] / (len(samples) * binsize)
    plt.plot(xval, np.append(expected, expected[-1]), "-k", drawstyle="steps-post")
    plt.bar((xval[:-1] + xval[1:]) / 2, measured, width=binsize)
    plt.xlim(xmin, xmax)
    plt.legend(["expected", "histogram"])
    plt.show()

def gaussian(x):
    return np.exp(-x * x / 2) / np.sqrt(2 * np.pi)


## 2.1. Sampling random variables via the inversion method

(35 Points)

Recall from the lecture that for any real random variable $$X$$ the inversion method constructs an explicit random variable with the same distribution as $$X$$. This random variable is given by $$F_X^{-1}(U)$$, where $$F_X$$ is the CDF of $$X$$, $$U$$ is a uniform random variable on $$(0,1)$$, and

$F_X^{-1}(p) := \inf\{ x\in\mathbb{R} : F_X(x) \geq p\}.$

This gives a very general way of sampling $$X$$ in a computer program, as you will find out in this exercise.
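To illustrate the idea before the graded parts, here is a minimal sketch using the toy density $$f(x) = 2x$$ on $$(0,1)$$ (which is not one of the distributions below): its CDF is $$F(x) = x^2$$, so $$F^{-1}(p) = \sqrt{p}$$.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded here only for reproducibility

def f_inv_linear(p):
    '''Inverse CDF of the toy density f(x) = 2x on (0,1): F(x) = x^2, so F^{-1}(p) = sqrt(p).'''
    return np.sqrt(p)

# Inversion sampling: plug uniform variates on (0,1) into the inverse CDF
samples = f_inv_linear(rng.random(100_000))

# Sanity check: the mean of this density is ∫ 2x·x dx = 2/3
print(samples.mean())  # close to 2/3
```

The exercises below follow exactly the same pattern, only with different inverse CDFs.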

(a) Let $$X$$ be an exponential random variable with rate $$\lambda$$, i.e. a continuous random variable with probability density function $$f_X(x) = \lambda e^{-\lambda x}$$ for $$x > 0$$. Write a function f_inv_exponential that computes $$F_X^{-1}(p)$$. Illustrate the corresponding sampler with the help of the function compare_plot above. (10 pts)

YOUR ANSWER HERE

def f_inv_exponential(lam, p):
    # YOUR CODE HERE
    raise NotImplementedError()

# plotting
# YOUR CODE HERE
raise NotImplementedError()

from nose.tools import assert_almost_equal
assert_almost_equal(f_inv_exponential(1.0,0.6),0.916,delta=0.001)
assert_almost_equal(f_inv_exponential(0.3,0.2),0.743,delta=0.001)


(b) Let now $$X$$ have the Pareto distribution of shape $$\alpha > 0$$ on $$(b,\infty)$$, which has probability density function $$f_X(x) = \alpha b^{\alpha} x^{-\alpha-1}$$ for $$x > b$$. Write a function f_inv_pareto that computes $$F_X^{-1}(p)$$. Compare a histogram with a plot of $$f_X(x)$$ to verify your function numerically. (10 pts)

YOUR ANSWER HERE

def f_inv_pareto(alpha, b, p):
    # YOUR CODE HERE
    raise NotImplementedError()

# plotting
# YOUR CODE HERE
raise NotImplementedError()

from nose.tools import assert_almost_equal
assert_almost_equal(f_inv_pareto(1.0,1.5,0.6),3.75,delta=0.0001)
assert_almost_equal(f_inv_pareto(2.0,2.25,0.3),2.689,delta=0.001)


(c) Let $$X$$ be a discrete random variable taking values in $$\{1,2,\ldots,n\}$$. Write a Python function f_inv_discrete that takes the probability mass function $$p_X$$ as a list prob_list given by $$[p_X(1),\ldots,p_X(n)]$$ together with a probability p, and computes $$F_X^{-1}(p)$$, so that applying it to a uniform sample yields the distribution of $$X$$ via the inversion method. Verify numerically that your function works on an example. (15 pts)

def f_inv_discrete(prob_list, p):
    # YOUR CODE HERE
    raise NotImplementedError()

# plotting
# YOUR CODE HERE
raise NotImplementedError()

assert f_inv_discrete([0.5,0.5],0.4)==1
assert f_inv_discrete([0.5,0.5],0.8)==2
assert f_inv_discrete([0,0,1],0.1)==3


## 2.2. Central limit theorem?

(35 Points)

In this exercise we take a closer look at the central limit behaviour of the Pareto distribution, for which you implemented a random sampler in the previous exercise. By performing the appropriate integrals it is straightforward to show that

$\begin{split} \mathbb{E}[X] = \begin{cases} \infty & \text{for }\alpha \leq 1 \\ \frac{\alpha b}{\alpha - 1} & \text{for }\alpha > 1 \end{cases}, \qquad \operatorname{Var}(X) = \begin{cases} \infty & \text{for }\alpha \leq 2 \\ \frac{\alpha b^2}{(\alpha - 1)^2(\alpha-2)} & \text{for }\alpha > 2 \end{cases}. \end{split}$

This shows in particular that the distribution is heavy tailed, in the sense that some moments $$\mathbb{E}[X^k]$$ diverge.
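These formulas can be sanity-checked against SciPy's built-in Pareto distribution. A minimal sketch, assuming that scipy.stats.pareto with shape parameter $$\alpha$$ and scale $$b$$ matches the density above (worth double-checking against its documentation):

```python
from scipy.stats import pareto

alpha, b = 4.0, 1.0  # alpha > 2, so both moments are finite

# Closed-form expressions quoted in the text
mean_formula = alpha * b / (alpha - 1)
var_formula = alpha * b**2 / ((alpha - 1)**2 * (alpha - 2))

# scipy.stats.pareto: first argument is the shape alpha, `scale` is b
print(mean_formula, pareto.mean(alpha, scale=b))  # both 4/3
print(var_formula, pareto.var(alpha, scale=b))    # both 2/9
```

For $$\alpha \leq 2$$ the variance formula no longer applies, which is exactly what makes part (b) below interesting.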

(a) Write a function sample_Zn that produces a random sample for $$Z_n= \frac{\sqrt{n}}{\sigma_X}(\bar{X}_n - \mathbb{E}[X])$$ given $$\alpha>2$$, $$b>0$$ and $$n\geq 1$$. Visually verify the central limit theorem for $$\alpha = 4$$, $$b=1$$ and $$n=1000$$ by comparing a histogram of $$Z_n$$ to the standard normal distribution (you may use compare_plot). (10 pts)

def sample_Zn(alpha, b, n):
    # YOUR CODE HERE
    raise NotImplementedError()

# Plotting
# YOUR CODE HERE
raise NotImplementedError()

assert_almost_equal(np.mean([sample_Zn(3.5,2.1,100) for _ in range(100)]),0,delta=0.3)
assert_almost_equal(np.std([sample_Zn(3.5,2.1,100) for _ in range(100)]),1,delta=0.3)


(b) Now take $$\alpha = 3/2$$ and $$b=1$$. With some work (which you do not have to do) one can show that the characteristic function of $$X$$ admits the following expansion around $$t=0$$,

$\varphi_X(t) = 1 + 3 i t - (|t|+i t)\,\sqrt{2\pi|t|} + O(t^{2}).$

Based on this, prove the generalized CLT for this particular distribution $$X$$ which states that $$Z_n = c\, n^{1/3} (\bar{X}_n - \mathbb{E}[X])$$ in the limit $$n\rightarrow\infty$$ converges in distribution, with a to-be-determined choice of overall constant $$c$$, to a limiting random variable $$\mathcal{S}$$ with characteristic function

$\varphi_{\mathcal{S}}(t) = \exp\big(-(|t|+it)\sqrt{|t|}\big).$

(15 pts)

YOUR ANSWER HERE

(c) The random variable $$\mathcal{S}$$ has a stable Lévy distribution with index $$\alpha = 3/2$$ and skewness $$\beta = 1$$. Its probability density function $$f_{\mathcal{S}}(x)$$ does not admit a simple expression, but can be accessed numerically using SciPy’s scipy.stats.levy_stable.pdf(x,1.5,1.0). Verify numerically that the generalized CLT of part (b) holds by comparing an appropriate histogram to this PDF. (10 pts)
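For orientation, evaluating this pdf amounts to the following sketch (note that levy_stable computes the density numerically under the hood, so evaluation can be slow):

```python
import numpy as np
from scipy.stats import levy_stable

# Evaluate the stable density with index alpha = 1.5 and skewness beta = 1
# on a small grid of points
xs = np.linspace(-2.0, 4.0, 7)
pdf_values = levy_stable.pdf(xs, 1.5, 1.0)
print(pdf_values)
```

You can then overlay such pdf values on a histogram of appropriately rescaled sample means.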

from scipy.stats import levy_stable

# YOUR CODE HERE
raise NotImplementedError()


## 2.3. Joint probability density functions and sampling the normal distribution

(30 Points)

Let $$\Phi$$ be a uniform random variable on $$(0,2\pi)$$ and $$R$$ an independent continuous random variable with probability density function $$f_R(r) = r\,e^{-r^2/2}$$ for $$r>0$$. Set $$X = R \cos \Phi$$ and $$Y = R \sin \Phi$$. This is called the Box-Muller transform.

(a) Since $$\Phi$$ and $$R$$ are independent, the joint probability density of $$\Phi$$ and $$R$$ is $$f_{\Phi,R}(\phi,r) = f_\Phi(\phi)f_R(r) = \frac{1}{2\pi}\, r\,e^{-r^2/2}$$. Show by change of variables that $$X$$ and $$Y$$ are also independent and both distributed as a standard normal distribution $$\mathcal{N}$$. (15 pts)

YOUR ANSWER HERE

(b) Write a function to sample a pair of independent normal random variables using the Box-Muller transform. Hint: to sample $$R$$ you can use the inversion method of the first exercise. Produce a histogram to check the distribution of your normal variables. (15 pts)

def random_normal_pair():
    '''Return two independent normal random variables.'''
    # YOUR CODE HERE
    raise NotImplementedError()
    return x, y

# Plotting
# YOUR CODE HERE
raise NotImplementedError()