# Quantiles of the standard Normal

Quantiles are a great way to summarize a random variable with a few numbers. Let’s start with the standard Normal. Take:

$Z\sim N(0,1).$

The definition of a quantile is this:

The $$q$$ quantile of $$Z$$ is the value $$z_q$$ such that the probability of $$Z$$ being less than $$z_q$$ is $$q$$.

Mathematically, with $$\Phi$$ denoting the CDF of $$Z$$, you want to find the value $$z_q$$ such that:

$\Phi(z_q) = q.$

## The median of the standard Normal

For example, the $$0.5$$ quantile $$z_{0.5}$$ satisfies the property:

$\Phi(z_{0.5}) = 0.5.$

This is known as the median of $$Z$$. In words, 50% of the probability of $$Z$$ lies to the left of the median. For the standard Normal, the symmetry of the PDF about zero gives:

$z_{0.5} = 0.$

Of course, scipy.stats knows about the median:

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
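# IPython.display.set_matplotlib_formats is deprecated in recent IPython versions;
# the matplotlib_inline package provides the same function
from matplotlib_inline import backend_inline
backend_inline.set_matplotlib_formats('retina', 'svg')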
import numpy as np
import scipy.stats as st

Z = st.norm()
Z.median()

0.0


## Other quantiles of the standard Normal

Another interesting quantile is $$z_{0.025}$$, the point below which $$Z$$ lies with probability $$2.5$$%. It is not trivial to find, though. You need to solve the nonlinear equation:

$\Phi(z_{0.025}) = 0.025.$

But scipy.stats can do this for you with the method Z.ppf() (the percent point function, which is the inverse of the CDF):

z_025 = Z.ppf(0.025)
print('z_025 = {0:1.2f}'.format(z_025))

z_025 = -1.96


Let’s verify that this is indeed the $$0.025$$ quantile. If I plug it into the CDF, I should get $$0.025$$:

print('Phi(z_025) = {0:1.3f}'.format(Z.cdf(z_025)))

Phi(z_025) = 0.025


Okay, it looks good!
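To get a feel for what solving the nonlinear equation involves, here is a minimal sketch that hands $$\Phi(z) - q = 0$$ to a generic root finder, scipy.optimize.brentq. This is only an illustration. I am not claiming this is how Z.ppf() works internally, and the bracket $$[-10, 10]$$ is an assumption that is wide enough for this value of $$q$$.

```python
import scipy.stats as st
from scipy.optimize import brentq

Z = st.norm()
q = 0.025

# Root of Phi(z) - q on a bracket assumed wide enough to contain the quantile
z_q = brentq(lambda z: Z.cdf(z) - q, -10.0, 10.0)

print('root finder: {0:1.4f}'.format(z_q))
print('Z.ppf:       {0:1.4f}'.format(Z.ppf(q)))
```

The two numbers should agree to many decimal places.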

Let’s also find $$z_{0.975}$$:

z_975 = Z.ppf(0.975)
print('z_975 = {0:1.2f}'.format(z_975))

z_975 = 1.96


Nice! This is just $$-z_{0.025}$$. We could have guessed it!
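The same symmetry holds for any $$q$$: since the PDF is symmetric about zero, $$z_{1-q} = -z_q$$. Here is a quick sketch that checks this numerically for a few values of $$q$$ (the particular values are arbitrary choices for illustration):

```python
import numpy as np
import scipy.stats as st

Z = st.norm()

# A few arbitrary lower-tail probabilities
qs = np.array([0.001, 0.025, 0.1, 0.25])
lower = Z.ppf(qs)        # z_q
upper = Z.ppf(1.0 - qs)  # z_{1-q}

# Symmetry of the standard Normal: z_{1-q} should equal -z_q
print(np.allclose(upper, -lower))
```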

## Credible intervals

Alright, these two quantiles are particularly important. Why? Because the probability that $$Z$$ is between them is 95%! So we could write:

$$Z$$ is between -1.96 and +1.96 with probability 95%.

This is a very nice summary of the uncertainty in $$Z$$. This is called the 95% (central) credible interval of $$Z$$.

Now if you are like me, you would simplify this a bit more by writing:

$$Z$$ is between -2 and +2 with probability (approximately) 95%.

Who wants to remember that 1.96…
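If you want to check both statements numerically, here is a minimal sketch. It computes the probability of an interval as a difference of CDF values. It also calls the interval() method of the frozen distribution, which, as far as I know, returns the equal-tailed interval for a given probability. The probability is passed positionally because the keyword name has changed across scipy versions.

```python
import scipy.stats as st

Z = st.norm()

# P(a < Z < b) = Phi(b) - Phi(a)
p_196 = Z.cdf(1.96) - Z.cdf(-1.96)
p_2 = Z.cdf(2.0) - Z.cdf(-2.0)
print('P(-1.96 < Z < 1.96) = {0:1.4f}'.format(p_196))
print('P(-2 < Z < 2)       = {0:1.4f}'.format(p_2))

# Central interval containing 95% of the probability
lo, hi = Z.interval(0.95)
print('95% interval: [{0:1.2f}, {1:1.2f}]'.format(lo, hi))
```

Rounding 1.96 up to 2 widens the interval a little, so the second probability comes out slightly above 95%.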

Let’s visualize the 95% (central) credible interval by shading the area under the PDF:

fig, ax = plt.subplots()
zs = np.linspace(-6.0, 6.0, 200)
Phis = Z.pdf(zs)
ax.plot(zs, Phis)
idx = (zs >= -2) & (zs <= 2)
ax.fill_between(zs[idx], 0.0, Phis[idx], color='r', alpha=0.5, label='95% credible interval')
ax.set_xlabel('$z$')
ax.set_ylabel(r'$\phi(z)$')
plt.legend(loc='best');

Let’s end by finding the 99.8% central credible interval of $$Z$$. We need the following quantiles:

• $$z_{0.001}$$:

z_001 = Z.ppf(0.001)
print('z_001 = {0:1.2f}'.format(z_001))

z_001 = -3.09

• $$z_{0.999}$$:

z_999 = Z.ppf(0.999)
print('z_999 = {0:1.2f}'.format(z_999))

z_999 = 3.09


So, we can now write:

$$Z$$ is between -3.09 and 3.09 with probability 99.8%.

Or the more practical:

$$Z$$ is between -3 and 3 with probability (about) 99.8%.

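How can I think about this intuitively? Well, if you sample from $$Z$$ many, many times, approximately 2 out of every 1,000 samples will fall outside the interval $$[-3.09, 3.09]$$. For the rounded interval $$[-3, 3]$$ the count is a bit higher, closer to 3 out of 1,000, because the interval is slightly narrower. Let’s test this computationally: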

# Take 1,000,000 samples
zs = Z.rvs(size=1_000_000)
# Count the number of zs that are outside the range
idx = (zs < -3) | (zs > 3)
# How many samples out of 1,000?
zs[idx].size / 1_000_000 * 1_000

2.716
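The sampled count can be compared against the exact tail probability, which by symmetry is $$2\Phi(-3)$$. Here is a minimal sketch of that comparison. The seed and the use of NumPy's default_rng are my own choices to make the run reproducible. They are not part of the code above.

```python
import numpy as np
import scipy.stats as st

Z = st.norm()

# Exact probability of landing outside [-3, 3], expressed per 1,000 samples
exact_per_1000 = 2.0 * Z.cdf(-3.0) * 1_000

# Monte Carlo estimate with a fixed seed (arbitrary choice) for reproducibility
rng = np.random.default_rng(0)
zs = rng.standard_normal(1_000_000)
mc_per_1000 = np.mean(np.abs(zs) > 3.0) * 1_000

print('exact:       {0:1.3f}'.format(exact_per_1000))
print('Monte Carlo: {0:1.3f}'.format(mc_per_1000))
```

The exact value is about 2.7 per 1,000, which explains why the sampled count is noticeably above 2: the interval $$[-3, 3]$$ is slightly narrower than $$[-3.09, 3.09]$$.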


## Questions

• Modify the code above to find the 99.99% central credible interval of $$Z$$.