Examples of variances of random variables#
Let’s revisit some of the distributions we encountered earlier and calculate their variances. We will do it both analytically and using scipy.stats.
Example: Variance of a Bernoulli random variable#
Take a Bernoulli random variable:

$$
X \sim \text{Bernoulli}(\theta).
$$

We have already found that its expectation is:

$$
\mathbf{E}[X] = \theta.
$$

To find the variance we are going to use Variance Property 3. For this we need to find the expectation of the square:

$$
\mathbf{E}[X^2] = \sum_x x^2 p(x) = 0^2\cdot(1-\theta) + 1^2\cdot\theta = \theta.
$$

So, we have:

$$
\mathbf{V}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 = \theta - \theta^2 = \theta(1-\theta).
$$

And here is how we can do it using scipy.stats:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
theta = 0.7
X = st.bernoulli(theta)
Now that we have made the random variable, we can get its variance with X.var():
print('V[X] = {0:1.2f}'.format(X.var()))
print('Compare to theta * (1 - theta) = {0:1.2f}'.format(theta * (1 - theta)))
V[X] = 0.21
Compare to theta * (1 - theta) = 0.21
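As an extra sanity check (a small sketch added here, not part of the original derivation), we can also estimate the variance by sampling: the empirical variance of many Bernoulli draws should be close to theta * (1 - theta). The sample size and the random seed are arbitrary choices.

# Hypothetical Monte Carlo check of the Bernoulli variance
samples = X.rvs(size=100_000, random_state=0)
print('Monte Carlo estimate of V[X] = {0:1.2f}'.format(samples.var()))
print('Analytical theta * (1 - theta) = {0:1.2f}'.format(theta * (1 - theta)))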
The standard deviation is just the square root of the variance:

$$
\sigma = \sqrt{\mathbf{V}[X]} = \sqrt{\theta(1-\theta)}.
$$

In scipy.stats you can get it with X.std():
print('std of X = {0:1.2f}'.format(X.std()))
std of X = 0.46
Example: Variance of a uniform random variable#
Take a uniform random variable:

$$
X \sim U([a, b]).
$$

Remember that the PDF is:

$$
p(x) = \frac{1}{b-a},
$$

when $a \le x \le b$, and zero otherwise.

We have already found the expectation, and it was given by the mid-point between $a$ and $b$:

$$
\mathbf{E}[X] = \frac{a+b}{2}.
$$

To find the variance, we first need to find the expectation of the square:

$$
\mathbf{E}[X^2] = \int_a^b x^2 \frac{1}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)}.
$$

You can simplify this even more, but we won’t bother. Now, put everything together:

$$
\mathbf{V}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 = \frac{b^3 - a^3}{3(b-a)} - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}.
$$
If you remember your basic mechanics course, this is the second area moment of inertia of a beam about its center of mass. This is not an accident: mathematically, the variance and the second area moment are exactly the same integral.
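If you want to double-check the integrals without doing them by hand, here is a quick numerical verification (a sketch added here, assuming scipy.integrate.quad, for specific values of $a$ and $b$):

from scipy.integrate import quad

# Hypothetical numerical check of the integrals above for a = 0, b = 5
a, b = 0.0, 5.0
pdf = lambda x: 1.0 / (b - a)                      # uniform PDF on [a, b]
E_X, _ = quad(lambda x: x * pdf(x), a, b)          # expectation
E_X2, _ = quad(lambda x: x ** 2 * pdf(x), a, b)    # expectation of the square
print('E[X^2] - E[X]^2 = {0:1.2f}'.format(E_X2 - E_X ** 2))
print('(b - a)^2 / 12  = {0:1.2f}'.format((b - a) ** 2 / 12))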
Let’s also do it with scipy.stats:
a = 0
b = 5
# Note: scipy's uniform takes a location and a scale, i.e., U([loc, loc + scale])
X = st.uniform(loc=a, scale=b - a)
print('V[X] = {0:1.2f}'.format(X.var()))
print('Compare to theoretical answer = {0:1.2f}'.format((b - a) ** 2 / 12))
V[X] = 2.08
Compare to theoretical answer = 2.08
Example: Variance of a Categorical random variable#
Take a Categorical random variable taking the values $0, 1, 2, 3$:

$$
X \sim \text{Categorical}(0.1, 0.3, 0.4, 0.2).
$$

The expectation is:

$$
\mathbf{E}[X] = 0\cdot 0.1 + 1\cdot 0.3 + 2\cdot 0.4 + 3\cdot 0.2 = 1.7.
$$

Again, we are going to invoke Variance Property 3. We need the expectation of the square:

$$
\mathbf{E}[X^2] = 0^2\cdot 0.1 + 1^2\cdot 0.3 + 2^2\cdot 0.4 + 3^2\cdot 0.2 = 3.7.
$$

So, we have:

$$
\mathbf{V}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 = 3.7 - 1.7^2 = 0.81.
$$
Here is how we can find it with Python:
import numpy as np
# The values X can take
xs = np.arange(4)
print('X values: ', xs)
# The probability for each value
ps = np.array([0.1, 0.3, 0.4, 0.2])
print('X probabilities: ', ps)
# And the expectation in a single line
E_X = np.sum(xs * ps)
# The expectation of the square
E_X2 = np.sum(xs ** 2 * ps)
# The variance
V_X = E_X2 - E_X ** 2
print('V[X] = {0:1.2f}'.format(V_X))
X values: [0 1 2 3]
X probabilities: [0.1 0.3 0.4 0.2]
V[X] = 0.81
Alternatively, we could use scipy.stats:
X = st.rv_discrete(name='X', values=(xs, ps))
print('V[X] = {0:1.2f}'.format(X.var()))
V[X] = 0.81
The standard deviation is:
print('std of X = {0:1.2f}'.format(X.std()))
std of X = 0.90
Let’s now make a plot. I am going to plot the PMF of $X$, the expected value, the expected value minus two standard deviations, and the expected value plus two standard deviations. Let’s see what we get:
fig, ax = plt.subplots()
ax.vlines(xs, 0, X.pmf(xs), label='PMF of $X$')
mu = X.expect()
std = X.std()
low = mu - 2 * std
up = mu + 2 * std
ax.plot(mu, 0, 'ro', label=r'$\mu = \mathbf{E}[X]$')
ax.plot(low, 0, 'gx', label=r'$\mu - 2\sigma$')
ax.plot(up, 0, 'md', label=r'$\mu + 2\sigma$')
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
ax.set_title('Categorical$(0.1, 0.3, 0.4, 0.2)$')
plt.legend(loc='upper left');
We see that, in this case, going two standard deviations below the mean and two standard deviations above the mean captures pretty much all the values.
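We can make this statement quantitative by summing the probability mass that falls within two standard deviations of the mean (a small check added here, reusing the variables from the plot above):

# Hypothetical check: how much probability mass lies within mu +/- 2 sigma?
in_band = (xs >= low) & (xs <= up)
print('P(mu - 2 sigma <= X <= mu + 2 sigma) = {0:1.2f}'.format(np.sum(ps[in_band])))

For this particular categorical, all four values fall inside the band, so the sum is one.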
Example: Variance of a Binomial random variable#
Take a Binomial random variable:

$$
X \sim B(n, \theta).
$$

It is not very easy to find the variance of this one. But it is given by the following formula:

$$
\mathbf{V}[X] = n\theta(1-\theta).
$$

If you notice, it is basically $n$ times the variance of a Bernoulli random variable with parameter $\theta$. This is because a Binomial is a sum of $n$ independent Bernoulli random variables, and the variance of a sum of independent random variables is the sum of their variances.
Here is how we can get it with scipy.stats:
n = 5
theta = 0.6
X = st.binom(n, theta)
print('V[X] = {0:1.2f}'.format(X.var()))
print('Compare to n * theta * (1 - theta) = {0:1.2f}'.format(n * theta * (1 - theta)))
V[X] = 1.20
Compare to n * theta * (1 - theta) = 1.20
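To see the connection with the Bernoulli more concretely, here is a small simulation (a sketch added here, with an arbitrary sample size and seed): draw $n$ independent Bernoulli variables, add them up, and check that the sample variance of the sum is close to $n\theta(1-\theta)$.

# Hypothetical check: a Binomial is a sum of n independent Bernoullis,
# so its variance is n times the Bernoulli variance.
bernoulli_draws = st.bernoulli(theta).rvs(size=(100_000, n), random_state=0)
sums = bernoulli_draws.sum(axis=1)
print('Sample variance of the sum = {0:1.2f}'.format(sums.var()))
print('n * theta * (1 - theta)    = {0:1.2f}'.format(n * theta * (1 - theta)))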
Let’s plot the same things we plotted for the categorical:
fig, ax = plt.subplots()
xs = np.arange(n+1)
ax.vlines(xs, 0, X.pmf(xs), label='PMF of $X$')
mu = X.expect()
std = X.std()
low = mu - 2 * std
up = mu + 2 * std
ax.plot(mu, 0, 'ro', label=r'$\mu = \mathbf{E}[X]$')
ax.plot(low, 0, 'gx', label=r'$\mu - 2\sigma$')
ax.plot(up, 0, 'md', label=r'$\mu + 2\sigma$')
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
ax.set_title(r'Binomial$(n={0:d}, \theta={1:1.2f})$'.format(n, theta))
plt.legend(loc='upper left');
Questions#
Rerun the case of the Binomial with a larger number of trials $n$. Does the shape of the PMF you get look familiar?
Example: Variance of a Poisson random variable#
Take a Poisson random variable:

$$
X \sim \text{Poisson}(\lambda).
$$

Finding this variance is also non-trivial. But it is:

$$
\mathbf{V}[X] = \lambda.
$$

Wait a second! Didn’t we say that the variance has the square units of $X$? How can it equal the mean? There is no contradiction: a Poisson random variable counts events, so it is dimensionless, and so is its square.
Let’s also do it in scipy.stats:
lam = 2.0
X = st.poisson(lam)
print('V[X] = {0:1.2f}'.format(X.var()))
V[X] = 2.00
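A distinctive feature of the Poisson is that its mean and its variance are both equal to $\lambda$. Here is a quick sampling check (a sketch added here, with an arbitrary sample size and seed):

# Hypothetical check: for a Poisson, both the sample mean and the sample
# variance should be close to the rate parameter lambda.
samples = X.rvs(size=100_000, random_state=0)
print('Sample mean     = {0:1.2f}'.format(samples.mean()))
print('Sample variance = {0:1.2f}'.format(samples.var()))
print('lambda          = {0:1.2f}'.format(lam))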
And let’s visualize everything together like before:
fig, ax = plt.subplots()
xs = np.arange(X.ppf(0.9999)) # I will explain this later
ax.vlines(xs, 0, X.pmf(xs), label='PMF of $X$')
mu = X.expect()
std = X.std()
low = mu - 2 * std
up = mu + 2 * std
ax.plot(mu, 0, 'ro', label=r'$\mu = \mathbf{E}[X]$')
ax.plot(low, 0, 'gx', label=r'$\mu - 2\sigma$')
ax.plot(up, 0, 'md', label=r'$\mu + 2\sigma$')
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
ax.set_title(r'Poisson$(\lambda={0:1.2f})$'.format(lam))
plt.legend(loc='upper right');
Question#
Rerun the case for the Poisson with a larger rate parameter $\lambda$. Does the shape look familiar?