Two uncorrelated random variables are not necessarily independent

We have seen that if two random variables \(X\) and \(Y\) are independent, then their covariance is zero,

\[ \mathbf{C}[X,Y] = 0, \]

and therefore their correlation coefficient is also zero:

\[ \rho(X,Y) = 0. \]

Does the reverse hold? Namely, if you find that the correlation between two random variables is zero, does this imply that they are independent? The answer to this question is a loud NO. We will show this with a counterexample.

Take these two independent random variables:

\[ X \sim N(0, 1), \]

and

\[ Z \sim N(0, 1). \]

Then define this new random variable \(Y\) by:

\[ Y = X^2 + 0.2 Z. \]

Since \(Y\) is a function of \(X\) plus a small amount of noise, the two are obviously not independent. But let’s generate some data from them and estimate the correlation:

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
# Draw 10,000 independent standard normal samples of X and Z
xdata = np.random.randn(10000)
zdata = np.random.randn(10000)
# Build Y from X and the noise Z
ydata = xdata ** 2 + 0.2 * zdata

It’s instructive to look at the scatter plot:

fig, ax = plt.subplots()
ax.scatter(xdata, ydata)
ax.set_xlabel('$x$')
ax.set_ylabel('$y$');
[Figure: scatter plot of \(y\) against \(x\).]
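The dependence visible in the scatter plot can also be expressed in numbers. If \(X\) and \(Y\) were independent, the average of \(Y\) would not change with the size of \(|X|\). The following is a minimal sketch of that check, not part of the original analysis: the seed, the sample size, the threshold of 1, and all variable names are illustrative assumptions.

```python
import numpy as np

# Seeded generator so the numbers are reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
n = 100_000  # illustrative sample size

x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x ** 2 + 0.2 * z  # same construction as in the simulation code above

# Split the sample by the size of |X| (the threshold 1.0 is an arbitrary choice).
inner = np.abs(x) < 1.0

# Under independence these two conditional means would agree up to sampling noise.
mean_y_inner = y[inner].mean()
mean_y_outer = y[~inner].mean()

print(f"mean of Y when |X| <  1: {mean_y_inner:.2f}")
print(f"mean of Y when |X| >= 1: {mean_y_outer:.2f}")
```

A large gap between the two averages means that knowing \(X\) changes what we expect of \(Y\), which is exactly what independence rules out.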

Well, it’s obvious that they are not independent. Let’s see what the correlation coefficient is:

rho = np.corrcoef(xdata, ydata)
print('rho(X, Y) = {0:1.2f}'.format(rho[0, 1]))
rho(X, Y) = 0.03

Very close to zero. So, \(X\) and \(Y\) are uncorrelated. Remember this, please! Make the scatter plots and use your common sense. Do not rely on a single number to make decisions.
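The near-zero estimate is no accident: the true covariance is exactly zero. A short derivation follows. It assumes only that \(X\) and \(Z\) are independent standard normals as defined above. It writes \(c\) for the noise coefficient in the definition of \(Y\) and \(\mathbf{E}[\cdot]\) for expectation (both symbols are notation introduced here for illustration):

\[ \mathbf{C}[X,Y] = \mathbf{C}[X,X^2] + c\,\mathbf{C}[X,Z] = \mathbf{E}[X^3] - \mathbf{E}[X]\,\mathbf{E}[X^2] + 0 = 0. \]

Both \(\mathbf{E}[X^3]\) and \(\mathbf{E}[X]\) vanish because the standard normal density is symmetric about zero, and \(\mathbf{C}[X,Z] = 0\) because \(X\) and \(Z\) are independent. The sample value differs from zero only through sampling noise.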

After you see a scatter plot like this, you get suspicious. You start thinking that there may be a correlation between the square of \(X\) and \(Y\). Let’s estimate the correlation of \(X^2\) and \(Y\) to see what it turns out to be:

rho = np.corrcoef(xdata ** 2, ydata)
print('rho(X^2, Y) = {0:1.2f}'.format(rho[0, 1]))
rho(X^2, Y) = 0.99

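Almost one! It is not exactly one, though: the noise term involving \(Z\) keeps the true correlation between \(X^2\) and \(Y\) slightly below one. It would be exactly one only if the noise were removed.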
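The estimate can be compared with the theoretical value. For \(Y = X^2 + cZ\) with independent standard normal \(X\) and \(Z\), the covariance of \(X^2\) and \(Y\) equals the variance of \(X^2\), which is \(3 - 1 = 2\), and the variance of \(Y\) is \(2 + c^2\), so \(\rho(X^2, Y) = \sqrt{2 / (2 + c^2)}\). The sketch below checks this on a fresh seeded sample. The symbol \(c\), the seed, the sample size, and the variable names are assumptions made for illustration.

```python
import numpy as np

# Seeded generator so the numbers are reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
n = 100_000  # illustrative sample size
c = 0.2      # noise coefficient, matching the simulation code above

x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x ** 2 + c * z

# Sample correlation between X^2 and Y.
rho_empirical = np.corrcoef(x ** 2, y)[0, 1]

# Theoretical value: Cov(X^2, Y) = Var(X^2) = 2 and Var(Y) = 2 + c^2.
rho_theory = np.sqrt(2.0 / (2.0 + c ** 2))

# With the noise removed, Y equals X^2 and the correlation is one
# up to floating-point rounding.
rho_noiseless = np.corrcoef(x ** 2, x ** 2)[0, 1]

print(f"empirical rho(X^2, Y)   = {rho_empirical:.4f}")
print(f"theoretical rho(X^2, Y) = {rho_theory:.4f}")
print(f"rho without noise       = {rho_noiseless:.4f}")
```

Shrinking `c` toward zero pushes the theoretical value toward one, which is why a small noise term produces a correlation that rounds to 0.99 rather than to exactly one.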