The Categorical distribution
We are now going to generalize the six-sided die experiment. A Categorical random variable models an experiment with \(K\) possible outcomes, coded, for example, as \(1, 2,\dots,K\), each with its own probability. The probabilities \(p_1,\dots,p_K\) are non-negative and sum to one. We can write:
\[p(X=k) = p_k,\quad k=1,\dots,K.\]
Of course, we can also write:
\[p(x) = p_k\quad\text{if } x = k.\]
Another way to write this is:
\[X \sim \operatorname{Categorical}(p_1, p_2, \dots, p_K),\]
which we read as:
the random variable \(X\) follows a Categorical distribution with \(K\) possibilities with probabilities \(p_1, p_2, \dots, p_K\).
The fair six-sided die is a particular example of a Categorical. In this case:
\[p(x) = \frac{1}{6},\]
if \(x=1,2,\dots,6\).
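As a quick illustration, the fair die can be built as a Categorical with `scipy.stats.rv_discrete`, the same constructor used in the specific example that follows. This is only a sketch: the names `faces`, `probs`, `die`, and the seed passed to `random_state` are choices made here for illustration.

```python
import numpy as np
import scipy.stats as st

# Faces of the die and their (equal) probabilities.
faces = np.arange(1, 7)
probs = np.full(6, 1.0 / 6.0)

# Build the Categorical random variable for the fair die.
die = st.rv_discrete(name="fair_die", values=(faces, probs))

# The PMF is 1/6 at every face.
die_pmf = die.pmf(faces)
print(die_pmf)

# Roll the die ten times (seeded only so the rolls are reproducible).
rolls = die.rvs(size=10, random_state=0)
print(rolls)
```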
A specific example¶
Let’s now make a specific choice for the probabilities, make a Categorical, and sample from it. We are going to play with this one, which takes the four values \(1, 2, 3, 4\):
\[X \sim \operatorname{Categorical}(0.1, 0.3, 0.4, 0.2).\]
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
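# set_matplotlib_formats is deprecated in IPython.display; it now lives in matplotlib_inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('retina', 'svg')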
import numpy as np
import scipy.stats as st
# The probabilities:
ps = [0.1, 0.3, 0.4, 0.2] # this has to sum to 1
# And here are the corresponding values:
xs = np.array([1, 2, 3, 4])
# Here is how you can define a categorical rv:
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))
You can evaluate the PMF anywhere you want:
X.pmf(2)
0.3
X.pmf(3)
0.4
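"Anywhere" includes points outside the set of possible values. The sketch below rebuilds the same random variable so that it runs on its own, then checks two things: the PMF is zero at a value that \(X\) cannot take, and the PMF sums to one over the possible values. The names `pmf_outside` and `pmf_total` are introduced here only for illustration.

```python
import numpy as np
import scipy.stats as st

# Same Categorical as above, rebuilt so this cell is self-contained.
ps = [0.1, 0.3, 0.4, 0.2]
xs = np.array([1, 2, 3, 4])
X = st.rv_discrete(name="Custom Categorical", values=(xs, ps))

# X never takes the value 5, so the PMF there is zero.
pmf_outside = X.pmf(5)
print(pmf_outside)

# Summing the PMF over all possible values gives one (up to rounding).
pmf_total = X.pmf(xs).sum()
print(pmf_total)
```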
And you can sample from it like this (the samples are random, so your output will differ):
X.rvs(size=10)
array([3, 3, 3, 4, 3, 4, 4, 4, 3, 2])
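With only ten samples it is hard to see the probabilities at work. One way to check the sampler, sketched below, is to draw many samples and compare the fraction of times each value appears with the corresponding probability. The sample size and the seed are arbitrary choices made here, and the random variable is rebuilt so the cell runs on its own.

```python
import numpy as np
import scipy.stats as st

# Same Categorical as above, rebuilt so this cell is self-contained.
ps = [0.1, 0.3, 0.4, 0.2]
xs = np.array([1, 2, 3, 4])
X = st.rv_discrete(name="Custom Categorical", values=(xs, ps))

# Draw a large number of samples (seeded only for reproducibility).
samples = X.rvs(size=100_000, random_state=12345)

# Fraction of samples equal to each possible value.
counts = np.array([(samples == x).sum() for x in xs])
freqs = counts / samples.size

# Print each value with its probability and its observed frequency.
for x, p, f in zip(xs, ps, freqs):
    print(f"x = {x}: p = {p:.2f}, observed frequency = {f:.3f}")
```

The observed frequencies should land close to `ps`, and they get closer as the number of samples grows.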
Let’s plot the PMF:
fig, ax = plt.subplots(dpi=150)
ax.bar(xs, X.pmf(xs))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$');
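Okay. Now let’s find the probability that \(X\) takes the value \(2\) or \(4\). The two outcomes cannot happen at the same time, so their probabilities add:
\[p(X=2\ \text{or}\ X=4) = p(X=2) + p(X=4) = 0.3 + 0.2 = 0.5.\]
In code: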
X.pmf(2) + X.pmf(4)
0.5
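The same calculation works for any set of values: sum the PMF over the set. The sketch below does this for the set \(\{2, 4\}\) and then cross-checks the answer by simulation, counting how often a sample falls in the set. The names `event`, `exact`, and `estimate`, as well as the sample size and seed, are assumptions made for this illustration.

```python
import numpy as np
import scipy.stats as st

# Same Categorical as above, rebuilt so this cell is self-contained.
ps = [0.1, 0.3, 0.4, 0.2]
xs = np.array([1, 2, 3, 4])
X = st.rv_discrete(name="Custom Categorical", values=(xs, ps))

# The event of interest: X takes the value 2 or 4.
event = [2, 4]

# Exact probability: add the PMF over the values in the event.
exact = X.pmf(event).sum()
print(exact)

# Simulation check: fraction of samples that land in the event.
samples = X.rvs(size=100_000, random_state=2024)
estimate = np.isin(samples, event).mean()
print(estimate)
```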
Questions¶
Rerun all code segments above for the Categorical \(X\sim \operatorname{Categorical}(0.1, 0.1, 0.4, 0.2, 0.2)\) taking values \(1, 2, 3, 4\), and \(5\).