# The Categorical distribution

We are now going to generalize the six-sided die experiment. A Categorical random variable models an experiment with $$K$$ possible outcomes, coded, for example, as $$1, 2,\dots,K$$, each with its own probability. We can write:

$\begin{split} X = \begin{cases} 1,&\;\text{with probability}\;p_1,\\ 2,&\;\text{with probability}\;p_2,\\ \vdots&\\ K,&\;\text{with probability}\;p_K. \end{cases} \end{split}$

More compactly, for $$x=1,2,\dots,K$$, we can write:

$p(X=x) = p_x.$

Another way we can write this is:

$X\sim \text{Categorical}(p_1,\dots,p_K),$

which reads: the random variable $$X$$ follows a Categorical distribution with $$K$$ possibilities, with probabilities $$p_1, p_2,\dots,p_K$$. The probabilities must be non-negative and sum to one.
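
To make the definition concrete, here is a minimal sketch in plain NumPy. The array `p`, its values, and the helper `pmf` are illustrative assumptions, not part of any library. It checks that the probabilities form a valid Categorical and evaluates $$p(X=x)$$ as a simple lookup.

```python
import numpy as np

# Hypothetical probabilities p_1, ..., p_K for K = 3 (illustrative values)
p = np.array([0.2, 0.5, 0.3])
K = len(p)

# A valid Categorical needs non-negative probabilities that sum to one
assert np.all(p >= 0)
assert np.isclose(p.sum(), 1.0)

def pmf(x):
    """Return p(X = x) for x in 1, ..., K, and 0 otherwise."""
    if 1 <= x <= K:
        return p[x - 1]  # shift by one because Python indexing starts at 0
    return 0.0

print([pmf(x) for x in range(1, K + 1)])
```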

The fair six-sided die is a particular example of a Categorical, namely the one with:

$p(X=x) = \frac{1}{6},$

for $$x=1,2,\dots,6$$.

## A specific example

Let’s now make a specific choice for the probabilities, build the corresponding Categorical, and sample from it. We are going to work with this one, which has four possibilities:

$X \sim \text{Categorical}(0.1, 0.3, 0.4, 0.2).$

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
# Render inline figures as retina/SVG; matplotlib_inline provides this helper
# (the copy in IPython.display is deprecated)
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st

# The probabilities:
ps = [0.1, 0.3, 0.4, 0.2] # these have to sum to 1
# And here are the corresponding values:
xs = np.array([1, 2, 3, 4])
# Here is how you can define a categorical rv:
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))


You can evaluate the PMF anywhere you want:

X.pmf(2)

0.3

X.pmf(3)

0.4
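
A few more properties of the PMF can be checked directly. The sketch below rebuilds the same `X` so that it runs on its own. It shows that `pmf` also accepts an array, that values outside $$1,\dots,4$$ get probability zero, and that the PMF sums to one over the possible values.

```python
import numpy as np
import scipy.stats as st

# Same Categorical as above, rebuilt so this snippet runs on its own
xs = np.array([1, 2, 3, 4])
ps = [0.1, 0.3, 0.4, 0.2]
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))

# pmf accepts an array and evaluates every entry at once
print(X.pmf(xs))

# Values the variable cannot take have probability zero
print(X.pmf(0), X.pmf(5))

# Summing the PMF over all possible values gives one
print(np.isclose(X.pmf(xs).sum(), 1.0))
```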


And you can sample from it like this:

X.rvs(size=10)

array([3, 3, 3, 4, 3, 4, 4, 4, 3, 2])
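
With only ten samples it is hard to see the underlying probabilities. As a rough sketch, we can draw many more samples and compare the fraction of times each value appears with the PMF. The sample size and the seed passed to `random_state` are arbitrary choices made here for reproducibility.

```python
import numpy as np
import scipy.stats as st

# Same Categorical as above, rebuilt so this snippet runs on its own
xs = np.array([1, 2, 3, 4])
ps = np.array([0.1, 0.3, 0.4, 0.2])
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))

# Fixing the seed (an arbitrary choice) makes the run reproducible
samples = X.rvs(size=100_000, random_state=0)

# Fraction of samples equal to each possible value
freqs = np.array([(samples == x).mean() for x in xs])
print(freqs)                     # close to ps, but not exactly equal
print(np.abs(freqs - ps).max())  # largest deviation from the true PMF
```

The empirical frequencies should approach the probabilities as the number of samples grows.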


Let’s plot the PMF:

fig, ax = plt.subplots(dpi=150)
ax.bar(xs, X.pmf(xs))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$');


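Okay. Now let’s find the probability that $$X$$ takes the value $$2$$ or $$4$$. Since $$X$$ cannot take both values at once, the two events are mutually exclusive and their probabilities add:

$p(X=2\;\text{or}\;X=4) = p(X=2) + p(X=4).$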

So:

X.pmf(2) + X.pmf(4)

0.5
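
The same addition rule works for any set of distinct values. Here is a small sketch, where `prob_in` is a hypothetical helper name introduced only for illustration. It also uses the fact that the probability of the complementary event is one minus the probability of the event.

```python
import numpy as np
import scipy.stats as st

# Same Categorical as above, rebuilt so this snippet runs on its own
xs = np.array([1, 2, 3, 4])
ps = [0.1, 0.3, 0.4, 0.2]
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))

def prob_in(rv, values):
    """Probability that rv takes one of the given values (hypothetical helper)."""
    distinct = np.array(sorted(set(values)))  # drop repeats so nothing is counted twice
    return rv.pmf(distinct).sum()

print(prob_in(X, [2, 4]))      # p(X=2) + p(X=4)
print(1 - prob_in(X, [2, 4]))  # complement: p(X=1 or X=3)
```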


### Questions

• Rerun all code segments above for the Categorical $$X\sim \operatorname{Categorical}(0.1, 0.1, 0.4, 0.2, 0.2)$$ taking values $$1, 2, 3, 4$$, and $$5$$.