Quantiles of the standard Normal#
Quantiles are a great way to summarize a random variable with a few numbers. Let’s start with the standard Normal. Take:
\[Z \sim N(0, 1).\]
The definition of the \(q\) quantile is this:
The \(q\) quantile of \(Z\) is the value \(z_q\) such that the probability of \(Z\) being less than \(z_q\) is \(q\).
Mathematically, you want to find a value \(z_q\) such that:
\[\Phi(z_q) = q,\]
where \(\Phi\) is the CDF of \(Z\).
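To make the definition concrete, here is a minimal sketch (not part of the original notebook) that estimates a quantile from random samples and then checks the defining property with the CDF. The level `q = 0.3`, the sample size, the seed, and the variable names are all illustrative assumptions.

```python
import numpy as np
import scipy.stats as st

Z = st.norm()  # standard Normal
q = 0.3        # any probability level strictly between 0 and 1

# Draw samples from Z (the seed is an arbitrary choice for reproducibility)
samples = Z.rvs(size=200_000, random_state=0)

# Empirical q quantile: roughly a fraction q of the samples lie below it
z_q_hat = np.quantile(samples, q)

# Defining property: the CDF evaluated at the quantile should be close to q
print(f"empirical z_q = {z_q_hat:1.3f}")
print(f"CDF at z_q    = {Z.cdf(z_q_hat):1.3f}")
```

The second printed number should land close to `q`, up to sampling error.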
The median of the standard Normal#
For example, the \(0.5\) quantile \(z_{0.5}\) satisfies the property:
\[\Phi(z_{0.5}) = 0.5.\]
This is known as the median of \(Z\). In words, 50% of the probability of \(Z\) lies to the left of the median. For the standard Normal, because of the symmetry of the PDF about zero, we have:
\[z_{0.5} = 0.\]
Of course, scipy.stats knows about the median:
import scipy.stats as st
Z = st.norm()
Z.median()
0.0
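The symmetry argument can also be checked numerically. The following is a small sketch, assuming only `numpy` and `scipy.stats`; the grid of test points is an arbitrary choice.

```python
import numpy as np
import scipy.stats as st

Z = st.norm()

# The PDF is symmetric about zero: it takes the same value at -x and at x
xs = np.linspace(0.0, 4.0, 9)
pdf_is_symmetric = np.allclose(Z.pdf(-xs), Z.pdf(xs))
print(f"PDF symmetric on the grid: {pdf_is_symmetric}")

# Hence half of the probability lies to the left of zero
cdf_at_zero = Z.cdf(0.0)
print(f"CDF at zero = {cdf_at_zero:1.3f}")
```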
Other quantiles of the standard Normal#
Another interesting quantile is \(z_{0.025}\). It marks the point below which \(Z\) lies with probability \(2.5\)%. This is not trivial to find, though. You really need to solve the nonlinear equation:
\[\Phi(z_{0.025}) = 0.025.\]
But scipy.stats can do this for you using the function Z.ppf():
z_025 = Z.ppf(0.025)
print(f'z_025 = {z_025:1.2f}')
z_025 = -1.96
Let’s verify that this is indeed giving me the \(0.025\) quantile. If I plug it into the CDF, I should get \(0.025\):
print(f'Phi(z_025) = {Z.cdf(z_025):1.3f}')
Phi(z_025) = 0.025
Okay, it looks good!
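If you are curious what solving the nonlinear equation looks like by hand, here is a minimal sketch that uses a generic root finder, `scipy.optimize.brentq`, on the function "CDF minus \(q\)". This is only an illustration of the idea and not a claim about how `Z.ppf()` is implemented internally. The bracket \([-10, 10]\) is an assumption that works for any \(q\) that is not extremely close to 0 or 1.

```python
import scipy.stats as st
from scipy.optimize import brentq

Z = st.norm()
q = 0.025

# The q quantile is the root of f(z) = CDF(z) - q.
# f is negative at -10 and positive at +10, so the bracket contains the root.
z_root = brentq(lambda z: Z.cdf(z) - q, -10.0, 10.0)

print(f"root finder: {z_root:1.4f}")
print(f"Z.ppf:       {Z.ppf(q):1.4f}")
```

The two printed values should agree to the displayed precision.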
Let’s also find \(z_{0.975}\):
z_975 = Z.ppf(0.975)
print(f'z_975 = {z_975:1.2f}')
z_975 = 1.96
Nice! This is just \(-z_{0.025}\). We could have guessed it!
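The reason we could have guessed it is the symmetry of the PDF about zero: the probability to the left of \(-z\) equals the probability to the right of \(z\), so \(z_{1-q} = -z_q\) for any \(q\). Here is a quick sketch that checks this for a handful of levels (the particular levels are arbitrary choices):

```python
import numpy as np
import scipy.stats as st

Z = st.norm()

# A few probability levels below 0.5 (arbitrary choices)
qs = np.array([0.001, 0.025, 0.1, 0.25, 0.4])

lower = Z.ppf(qs)        # z_q
upper = Z.ppf(1.0 - qs)  # z_{1-q}

# Symmetry of the standard Normal implies z_{1-q} = -z_q
quantiles_are_mirrored = np.allclose(upper, -lower)
print(f"z_(1-q) equals -z_q: {quantiles_are_mirrored}")
```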
Credible intervals#
Alright, these two quantiles are particularly important. Why? Because the probability that \(Z\) is between them is 95%! So we could write:
\(Z\) is between -1.96 and +1.96 with probability 95%.
This is a very nice summary of the uncertainty in \(Z\). This is called the 95% (central) credible interval of \(Z\).
Now if you are like me, you would simplify this a bit more by writing:
\(Z\) is between -2 and +2 with probability (approximately) 95%.
Who wants to remember that 1.96…
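Both statements can be checked with the CDF. The sketch below also uses the `interval()` method of the frozen distribution, which returns the endpoints of a central interval with the requested probability. The variable names are my own.

```python
import scipy.stats as st

Z = st.norm()

# Endpoints of the central interval that contains 95% of the probability
lower, upper = Z.interval(0.95)
print(f"95% interval: [{lower:1.2f}, {upper:1.2f}]")

# Probability of falling between the endpoints: CDF(upper) - CDF(lower)
prob_exact = Z.cdf(upper) - Z.cdf(lower)
print(f"probability between the quantiles = {prob_exact:1.4f}")

# Probability captured by the rounded interval [-2, 2]
prob_rounded = Z.cdf(2.0) - Z.cdf(-2.0)
print(f"probability between -2 and 2     = {prob_rounded:1.4f}")
```

The rounded interval captures slightly more than 95% of the probability, which is why the "approximately" is harmless in practice.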
Let’s visualize the 95% (central) credible interval by shading the PDF:
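One possible way to produce such a plot is sketched below. It assumes `matplotlib` is available; the plotting range, the number of grid points, the labels, and the transparency are illustrative choices and not necessarily those of the original figure.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

Z = st.norm()
z_025, z_975 = Z.ppf(0.025), Z.ppf(0.975)

# Grid on which to evaluate the PDF (the range [-4, 4] is an arbitrary choice)
zs = np.linspace(-4.0, 4.0, 400)
pdf = Z.pdf(zs)

# Mask of the grid points that fall inside the central interval
inside = (zs >= z_025) & (zs <= z_975)

fig, ax = plt.subplots()
ax.plot(zs, pdf, label="PDF of $Z$")
# Shade the area under the PDF between the two quantiles
ax.fill_between(zs, pdf, where=inside, alpha=0.3,
                label="95% central credible interval")
ax.set_xlabel("$z$")
ax.set_ylabel("PDF")
ax.legend(loc="best")
# A notebook renders the figure at the end of the cell;
# in a plain script, call plt.show() to display it.
```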
Let’s end by finding the 99.8% credible interval of \(Z\). We need the following quantiles:
\(z_{0.001}\):
z_001 = Z.ppf(0.001)
print(f'z_001 = {z_001:1.2f}')
z_001 = -3.09
\(z_{0.999}\):
z_999 = Z.ppf(0.999)
print(f'z_999 = {z_999:1.2f}')
z_999 = 3.09
So, we can now write:
\(Z\) is between -3.09 and 3.09 with probability 99.8%.
Or the more practical:
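\(Z\) is between -3 and 3 with probability (about) 99.7%.
How can I think about this intuitively? Well, if you sample many times from \(Z\), approximately 2 out of every 1,000 samples will fall outside the interval \([-3.09, 3.09]\), and slightly more (about 2.7 out of every 1,000) will fall outside the rounded interval \([-3, 3]\). Let’s test this computationally using the rounded interval: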
# Take 1,000,000 samples
zs = Z.rvs(size=1_000_000)
# Count the number of zs that are outside the range
idx = (zs < -3) | (zs > 3)
# How many samples out of 1,000?
zs[idx].size / 1_000_000 * 1_000
2.626
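The sampled count can be compared against the exact tail probability computed from the CDF. Here is a minimal sketch; the seed is an arbitrary choice that I add for reproducibility, so the sampled number will differ from the run above.

```python
import scipy.stats as st

Z = st.norm()

# Exact expected number of samples per 1,000 that fall outside [-3, 3]:
# left tail CDF(-3) plus right tail 1 - CDF(3) (the survival function)
exact_per_1000 = (Z.cdf(-3.0) + Z.sf(3.0)) * 1_000

# Same quantity for the unrounded interval between the 0.001 and 0.999 quantiles
z_001, z_999 = Z.ppf(0.001), Z.ppf(0.999)
exact_unrounded_per_1000 = (Z.cdf(z_001) + Z.sf(z_999)) * 1_000

# Monte Carlo estimate for [-3, 3] with a fixed (arbitrary) seed
zs = Z.rvs(size=1_000_000, random_state=0)
mc_per_1000 = ((zs < -3.0) | (zs > 3.0)).mean() * 1_000

print(f"exact, outside [-3, 3]:        {exact_per_1000:1.3f}")
print(f"exact, outside [z_001, z_999]: {exact_unrounded_per_1000:1.3f}")
print(f"sampled, outside [-3, 3]:      {mc_per_1000:1.3f}")
```

The sampled value fluctuates around the exact value for \([-3, 3]\), which is a bit above 2 per 1,000 because rounding 3.09 down to 3 makes the interval slightly narrower.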
Questions#
Modify the code above to find the 99.99% central credible interval of \(Z\).