Examples of variances of random variables#

Let’s revisit some of the distributions we encountered earlier and calculate their variances. We will do it both analytically and using scipy.stats.

Example: Variance of a Bernoulli random variable#

Take a Bernoulli random variable:

$$X \sim \text{Bernoulli}(\theta).$$

We have already found that its expectation is:

$$\mu = \mathbf{E}[X] = \theta.$$

To find the variance we are going to use Variance Property 3. For this we need to find the expectation of the square:

$$\mathbf{E}[X^2] = \sum_x x^2 p(x) = 0^2\cdot p(X=0) + 1^2\cdot p(X=1) = \theta.$$

So, we have:

$$\mathbf{V}[X] = \mathbf{E}[X^2] - \mu^2 = \theta - \theta^2 = \theta(1-\theta).$$

And here is how we can do it using scipy.stats:

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
theta = 0.7
X = st.bernoulli(theta)

Now that we have made the random variable, we can get its variance by X.var():

print('V[X] = {0:1.2f}'.format(X.var()))
print('Compare to theta * (1 - theta) = {0:1.2f}'.format(theta * (1 - theta)))
V[X] = 0.21
Compare to theta * (1 - theta) = 0.21
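We can also verify Variance Property 3 directly. The frozen random variable has an expect() method (the same one we use later to get the mean) that computes the expectation of a function of X, so here is a quick sketch evaluating E[X²] − μ² ourselves:

E_X2 = X.expect(lambda x: x ** 2)
mu = X.mean()
print('E[X^2] - mu^2 = {0:1.2f}'.format(E_X2 - mu ** 2))

This should again print 0.21.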

The standard deviation is just the square root of the variance:

$$\sigma = \sqrt{\theta(1-\theta)}.$$

In scipy.stats you can get it by X.std():

print('std of X = {0:1.2f}'.format(X.std()))
std of X = 0.46

Example: Variance of a uniform random variable#

Take

$$X \sim U([a,b]).$$

Remember that the PDF is:

$$p(x) = \frac{1}{b-a},$$

when $x$ is in $[a,b]$ and zero otherwise.

We have already found the expectation and it was given by the mid-point between a and b:

$$\mu = \mathbf{E}[X] = \frac{a+b}{2}.$$

To find the variance, we first need to find the expectation of the square:

$$\mathbf{E}[X^2] = \int x^2 p(x)\,dx = \int_a^b x^2\frac{1}{b-a}\,dx = \frac{1}{b-a}\left.\frac{x^3}{3}\right|_a^b = \frac{1}{b-a}\cdot\frac{b^3 - a^3}{3} = \frac{b^3 - a^3}{3(b-a)}.$$

You can simplify this even more, but we won’t bother. Now, put everything together:

$$\begin{split}
\mathbf{V}[X] &= \mathbf{E}[X^2] - \mu^2\\
&= \frac{b^3 - a^3}{3(b-a)} - \frac{(a+b)^2}{4}\\
&= \frac{4(b^3 - a^3) - 3(b-a)(a+b)^2}{12(b-a)}\\
&= \frac{4b^3 - 4a^3 - 3(b-a)(a^2 + 2ab + b^2)}{12(b-a)}\\
&= \frac{4b^3 - 4a^3 - 3a^2b - 6ab^2 - 3b^3 + 3a^3 + 6a^2b + 3ab^2}{12(b-a)}\\
&= \frac{b^3 - a^3 + 3a^2b - 3ab^2}{12(b-a)}\\
&= \frac{(b-a)^3}{12(b-a)}\\
&= \frac{(b-a)^2}{12}.
\end{split}$$

If you remember your basic mechanics course, this is the second area moment of inertia of a beam about its center of mass. This is not an accident. Mathematically, the variance and the second area moment are exactly the same integral.
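If you do not trust the algebra above, you can verify it symbolically. Here is a sketch using sympy, which is not used anywhere else in these notes, so this is purely optional:

import sympy as sp
x, a, b = sp.symbols('x a b', real=True)
# E[X^2] for the uniform PDF 1 / (b - a) on [a, b]
E_X2 = sp.integrate(x ** 2 / (b - a), (x, a, b))
# V[X] = E[X^2] - mu^2, with mu = (a + b) / 2
V_X = sp.simplify(E_X2 - ((a + b) / 2) ** 2)
print(V_X)

The printed expression should be equivalent to (b - a)**2 / 12, though sympy may write it in a different but equal form.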

Let’s also do it with scipy.stats:

a = 0
b = 5
# scipy.stats.uniform takes loc and scale: X ~ U([loc, loc + scale])
X = st.uniform(loc=a, scale=b - a)
print('V[X] = {0:1.2f}'.format(X.var()))
print('Compare to theoretical answer = {0:1.2f}'.format((b - a) ** 2 / 12))
V[X] = 2.08
Compare to theoretical answer = 2.08
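As one more sanity check, we could estimate the variance from random samples drawn from the distribution. The sample size of 100,000 and the random_state below are arbitrary choices:

samples = X.rvs(size=100000, random_state=0)  # fixed random_state for reproducibility
print('Sample variance = {0:1.2f}'.format(samples.var()))

The estimate should be close to the exact value of 2.08.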

Example: Variance of a Categorical random variable#

Take a Categorical random variable:

$$X \sim \text{Categorical}(0.1, 0.3, 0.4, 0.2).$$

The expectation is:

$$\mu = \mathbf{E}[X] = 1.7.$$

Again, we are going to invoke Variance Property 3. We need the expectation of the square:

$$\begin{split}
\mathbf{E}[X^2] &= \sum_x x^2 p(x)\\
&= 0^2\cdot p(X=0) + 1^2\cdot p(X=1) + 2^2\cdot p(X=2) + 3^2\cdot p(X=3)\\
&= 0\cdot 0.1 + 1\cdot 0.3 + 4\cdot 0.4 + 9\cdot 0.2\\
&= 3.7.
\end{split}$$

So, we have:

$$\mathbf{V}[X] = \mathbf{E}[X^2] - \mu^2 = 3.7 - 1.7^2 = 0.81.$$

Here is how we can find it with Python:

import numpy as np
# The values X can take
xs = np.arange(4)
print('X values: ', xs)
# The probability for each value
ps = np.array([0.1, 0.3, 0.4, 0.2])
print('X probabilities: ', ps)
# And the expectation in a single line
E_X = np.sum(xs * ps)
# The expectation of the square
E_X2 = np.sum(xs ** 2 * ps)
# The variance
V_X = E_X2 - E_X ** 2
print('V[X] = {0:1.2f}'.format(V_X))
X values:  [0 1 2 3]
X probabilities:  [0.1 0.3 0.4 0.2]
V[X] = 0.81

Alternatively, we could use scipy.stats:

X = st.rv_discrete(name='X', values=(xs, ps))
print('V[X] = {0:1.2f}'.format(X.var()))
V[X] = 0.81

The standard deviation is:

print('std of X = {0:1.2f}'.format(X.std()))
std of X = 0.90

Let’s now make a plot. I am going to plot the PMF of $X$ and mark the position of the expected value along with:

  • the expected value minus two standard deviations,

  • the expected value plus two standard deviations.

Let’s see what we get:

fig, ax = plt.subplots()
ax.vlines(xs, 0, X.pmf(xs), label='PMF of $X$')
mu = X.expect()
std = X.std()
low = mu - 2 * std
up = mu + 2 * std
ax.plot(mu, 0, 'ro', label=r'$\mu = \mathbf{E}[X]$')
ax.plot(low, 0, 'gx', label=r'$\mu - 2\sigma$')
ax.plot(up, 0, 'md', label=r'$\mu + 2\sigma$')
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
ax.set_title('Categorical$(0.1, 0.3, 0.4, 0.2)$')
plt.legend(loc='upper left');
[Figure: PMF of the Categorical$(0.1, 0.3, 0.4, 0.2)$ with $\mu$, $\mu - 2\sigma$, and $\mu + 2\sigma$ marked on the $x$-axis.]

We see that, in this case, going two standard deviations below the mean and two standard deviations above the mean captures pretty much all the values.
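We can quantify this by summing the PMF over the values of $X$ that fall within two standard deviations of the mean, reusing the variables defined in the plotting cell above (a minimal check):

within = (xs >= low) & (xs <= up)
print('P(mu - 2 sigma <= X <= mu + 2 sigma) = {0:1.2f}'.format(np.sum(X.pmf(xs[within]))))

Here the interval $[\mu - 2\sigma, \mu + 2\sigma] = [-0.1, 3.5]$ contains all four values, so this probability is exactly one.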

Example: Variance of a Binomial random variable#

Take a Binomial random variable:

$$X \sim \text{Binomial}(n, \theta).$$

It is not very easy to find the variance of this one. But it is given by the following formula:

$$\mathbf{V}[X] = n\theta(1-\theta).$$

If you notice, it is basically $n$ times the variance of the Bernoulli. This is not an accident: the Binomial is actually the sum of $n$ independent Bernoulli random variables. But we do not yet have the mathematics to deal with this.
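Even without the theory, we can check the claim empirically: if we sum $n$ independent Bernoulli draws, the sample variance of the sums should be close to $n\theta(1-\theta)$. Here is a quick sketch (the sample size and random_state are arbitrary choices):

n = 5
theta = 0.6
# Each row contains n independent Bernoulli(theta) draws; sum each row
bernoullis = st.bernoulli(theta).rvs(size=(100000, n), random_state=0)
sums = bernoullis.sum(axis=1)
print('Sample variance of the sums = {0:1.2f}'.format(sums.var()))
print('n * theta * (1 - theta) = {0:1.2f}'.format(n * theta * (1 - theta)))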

Here is how we can get it with scipy.stats:

n = 5
theta = 0.6
X = st.binom(n, theta)
print('V[X] = {0:1.2f}'.format(X.var()))
print('Compare to n * theta * (1 - theta) = {0:1.2f}'.format(n * theta * (1 - theta)))
V[X] = 1.20
Compare to n * theta * (1 - theta) = 1.20

Let’s plot the same things we plotted for the categorical:

fig, ax = plt.subplots()
xs = np.arange(n+1)
ax.vlines(xs, 0, X.pmf(xs), label='PMF of $X$')
mu = X.expect()
std = X.std()
low = mu - 2 * std
up = mu + 2 * std
ax.plot(mu, 0, 'ro', label=r'$\mu = \mathbf{E}[X]$')
ax.plot(low, 0, 'gx', label=r'$\mu - 2\sigma$')
ax.plot(up, 0, 'md', label=r'$\mu + 2\sigma$')
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
ax.set_title(r'Binomial$(n={0:d}, \theta={1:1.2f})$'.format(n, theta))
plt.legend(loc='upper left');
[Figure: PMF of the Binomial$(n=5, \theta=0.60)$ with $\mu$, $\mu - 2\sigma$, and $\mu + 2\sigma$ marked on the $x$-axis.]

Questions#

  • Rerun the case of the Binomial with $n=50$. Does the shape of the PMF you get look familiar?

Example: Variance of a Poisson random variable#

Take a Poisson random variable:

$$X \sim \text{Poisson}(\lambda).$$

Finding this variance is also non-trivial. But it is:

$$\mathbf{V}[X] = \lambda.$$

Wait a second! Didn’t we say that the variance has the square of the units of $X$? If you paid attention, the expectation of $X$ was also $\lambda$. How is this even possible? Well, it is because $X$ has no units… It’s just a number counting events.

Let’s also do it in scipy.stats:

lam = 2.0
X = st.poisson(lam)
print('V[X] = {0:1.2f}'.format(X.var()))
V[X] = 2.00
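Indeed, the expectation is the same number:

print('E[X] = {0:1.2f}'.format(X.mean()))

This should also print 2.00, since both the mean and the variance equal $\lambda$.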

And let’s visualize everything together like before:

fig, ax = plt.subplots()
xs = np.arange(X.ppf(0.9999)) # I will explain this later
ax.vlines(xs, 0, X.pmf(xs), label='PMF of $X$')
mu = X.expect()
std = X.std()
low = mu - 2 * std
up = mu + 2 * std
ax.plot(mu, 0, 'ro', label=r'$\mu = \mathbf{E}[X]$')
ax.plot(low, 0, 'gx', label=r'$\mu - 2\sigma$')
ax.plot(up, 0, 'md', label=r'$\mu + 2\sigma$')
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
ax.set_title(r'Poisson$(\lambda={0:1.2f})$'.format(lam))
plt.legend(loc='upper right');
[Figure: PMF of the Poisson$(\lambda=2.00)$ with $\mu$, $\mu - 2\sigma$, and $\mu + 2\sigma$ marked on the $x$-axis.]

Question#

  • Rerun the case for the Poisson with a rate parameter $\lambda=50$. Does the shape look familiar?