Quantiles of the standard Normal

Quantiles are a great way to summarize a random variable with a few numbers. Let’s start with the standard Normal. Take:

\[ Z\sim N(0,1). \]

The definition of a quantile is this:

The \(q\) quantile of \(Z\) is the value \(z_q\) such that the probability of \(Z\) being less than \(z_q\) is \(q\).

Mathematically, you want to find a value \(z_q\) such that:

\[ \Phi(z_q) = q. \]
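To make the definition concrete, here is a minimal sketch that estimates a quantile from samples instead of from \(\Phi\). The level \(q = 0.3\), the sample size, and the seed are arbitrary choices made only for illustration. The empirical quantile returned by np.quantile should leave a fraction of about \(q\) of the samples to its left, which is what the definition asks for:

```python
import numpy as np

# Draw many samples from the standard Normal (seed fixed for reproducibility)
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

# Probability level, chosen arbitrarily for this illustration
q = 0.3

# Empirical estimate of the q quantile
z_q_hat = np.quantile(samples, q)

# Fraction of samples strictly below the estimated quantile
frac_below = np.mean(samples < z_q_hat)

print(f"estimated z_q = {z_q_hat:.3f}")
print(f"fraction of samples below it = {frac_below:.3f}")
```

The printed fraction should be very close to 0.3, and the estimate itself is a sample-based approximation of \(z_{0.3}\).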

The median of the standard Normal

For example, the \(0.5\) quantile \(z_{0.5}\) satisfies the property:

\[ \Phi(z_{0.5}) = 0.5. \]

This is known as the median of \(Z\). In words, 50% of the probability of \(Z\) lies to the left of the median. For the standard Normal, the symmetry of the PDF about zero gives:

\[ z_{0.5} = 0. \]

Of course, scipy.stats knows about the median:

# Plotting setup
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
# IPython.display.set_matplotlib_formats is deprecated; use matplotlib_inline instead
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st

# Standard Normal random variable
Z = st.norm()
Z.median()

0.0

Other quantiles of the standard Normal

Another interesting quantile is \(z_{0.025}\), the point below which \(Z\) lies with probability \(2.5\)%. It is not trivial to find, though. You really need to solve the nonlinear equation:

\[ \Phi(z_{0.025}) = 0.025. \]

But scipy.stats can do this for you using the function Z.ppf() (the percent point function, which is the inverse of the CDF):

z_025 = Z.ppf(0.025)
print('z_025 = {0:1.2f}'.format(z_025))

z_025 = -1.96

Let’s verify that this is indeed giving me the \(0.025\) quantile. If I plug it in the CDF I should get \(0.025\):

print('Phi(z_025) = {0:1.3f}'.format(Z.cdf(z_025)))

Phi(z_025) = 0.025

Okay, it looks good!
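If you are curious what "solving the nonlinear equation" looks like in practice, here is a minimal sketch that finds the root of \(\Phi(z) - 0.025\) with a generic root finder, scipy.optimize.brentq. The bracket \([-10, 10]\) and the helper name residual are assumptions made for this illustration, and scipy.stats may well use a more specialized method internally:

```python
from scipy.optimize import brentq
from scipy.stats import norm

q = 0.025

def residual(z):
    """Phi(z) - q, which is zero exactly at the q quantile."""
    return norm.cdf(z) - q

# The residual is negative at -10 and positive at +10, so a root is bracketed
z_root = brentq(residual, -10.0, 10.0)

print(f"root finder: {z_root:.4f}")
print(f"Z.ppf:       {norm.ppf(q):.4f}")
```

Both lines should print the same value to the displayed precision.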

Let’s also find \(z_{0.975}\):

z_975 = Z.ppf(0.975)
print('z_975 = {0:1.2f}'.format(z_975))

z_975 = 1.96

Nice! This is just \(-z_{0.025}\). We could have guessed it!
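Here is why. The symmetry of the PDF about zero implies \(\Phi(-z) = 1 - \Phi(z)\) for any \(z\). Applying this at \(z = z_{0.025}\):

\[ \Phi(-z_{0.025}) = 1 - \Phi(z_{0.025}) = 1 - 0.025 = 0.975. \]

So \(-z_{0.025}\) satisfies the equation that defines \(z_{0.975}\). The same argument works for any \(q\):

\[ z_{1-q} = -z_q. \]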

Credible intervals

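Alright, these two quantiles are particularly important. Why? Because the probability that \(Z\) is between them is 95%: 97.5% of the probability lies below \(z_{0.975}\) and 2.5% lies below \(z_{0.025}\), which leaves 95% in between. We could write: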

\(Z\) is between -1.96 and +1.96 with probability 95%.

This is a very nice summary of the uncertainty in \(Z\). This is called the 95% (central) credible interval of \(Z\).

Now if you are like me, you would simplify this a bit more by writing:

\(Z\) is between -2 and +2 with probability (approximately) 95%.

Who wants to remember that 1.96…
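How much do we lose by rounding? A quick check, sketched below, computes the probability of each interval as a difference of CDF values. The variable names are chosen only for this illustration:

```python
from scipy.stats import norm

# Exact endpoints of the 95% central credible interval
lo, hi = norm.ppf(0.025), norm.ppf(0.975)

# Probability that Z falls between the exact endpoints
p_exact = norm.cdf(hi) - norm.cdf(lo)

# Probability that Z falls between the rounded endpoints -2 and +2
p_rounded = norm.cdf(2.0) - norm.cdf(-2.0)

print(f"P({lo:.2f} <= Z <= {hi:.2f}) = {p_exact:.4f}")
print(f"P(-2 <= Z <= 2)       = {p_rounded:.4f}")
```

The rounded interval covers slightly more than 95% of the probability, so the "approximately" in the statement above is well deserved.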

Let’s visualize the 95% (central) credible interval by shading the area under the PDF between -2 and +2:

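fig, ax = plt.subplots()
zs = np.linspace(-6.0, 6.0, 200)
# Evaluate the PDF phi(z) on the grid
phis = Z.pdf(zs)
ax.plot(zs, phis)
# Shade the region between -2 and +2 (the rounded 95% interval)
idx = (zs >= -2) & (zs <= 2)
ax.fill_between(zs[idx], 0.0, phis[idx], color='r', alpha=0.5, label='95% credible interval')
ax.set_xlabel('$z$')
# Raw string so that the backslash in \phi is not treated as an escape
ax.set_ylabel(r'$\phi(z)$')
plt.legend(loc='best');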

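[Figure: PDF \(\phi(z)\) of the standard Normal plotted against \(z\), with the region between -2 and +2 shaded and labeled "95% credible interval".]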

Let’s end by finding the 99.8% credible interval of \(Z\). We need the following quantiles:

\(z_{0.001}\):

z_001 = Z.ppf(0.001)
print('z_001 = {0:1.2f}'.format(z_001))

z_001 = -3.09

\(z_{0.999}\):

z_999 = Z.ppf(0.999)
print('z_999 = {0:1.2f}'.format(z_999))

z_999 = 3.09

So, we can now write:

\(Z\) is between -3.09 and 3.09 with probability 99.8%.

Or the more practical:

\(Z\) is between -3 and 3 with probability (about) 99.7%.

How can I think about this intuitively? Well, if you sample from \(Z\) many, many times, approximately 2 out of every 1,000 samples will fall outside the interval \([-3.09, 3.09]\), and slightly more, about 2.7 out of every 1,000, will fall outside the rounded interval \([-3, 3]\). Let’s test this computationally:

# Take 1,000,000 samples
zs = Z.rvs(size=1_000_000)
# Count the number of zs that are outside the range
idx = (zs < -3) | (zs > 3)
# How many samples out of 1,000?
zs[idx].size / 1_000_000 * 1_000

2.716
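The simulated count can be compared with the exact value, which follows from the CDF. By symmetry, the probability of landing outside \([-a, a]\) is \(2\Phi(-a)\). The sketch below evaluates it for the rounded endpoint 3 and for the exact quantile \(z_{0.001}\). The variable names are chosen only for this illustration:

```python
from scipy.stats import norm

# Two equal tails by symmetry: P(|Z| > a) = 2 * Phi(-a)
p_outside_rounded = 2.0 * norm.cdf(-3.0)

# Same computation with the exact endpoint z_0.001 (about -3.09)
p_outside_exact = 2.0 * norm.cdf(norm.ppf(0.001))

print(f"expected per 1,000 outside [-3, 3]:       {1000 * p_outside_rounded:.3f}")
print(f"expected per 1,000 outside [-3.09, 3.09]: {1000 * p_outside_exact:.3f}")
```

Rounding 3.09 down to 3 makes the tails a little heavier, which is why the simulation lands closer to 2.7 than to 2 samples per 1,000.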

Questions

Modify the code above to find the 99.99% central credible interval of \(Z\).

Introduction to Data Science for Mechanical Engineers (Lecture Book)
