# The Categorical distribution


We are now going to generalize the six-sided die experiment. A Categorical random variable models an experiment with \(K\) possible outcomes, coded, for example, as \(1, 2,\dots,K\), each with its own probability. We can write:

\[
p(X=k) = p_k, \quad k = 1, 2, \dots, K.
\]

Of course, the probabilities \(p_1, p_2, \dots, p_K\) must sum to one:

\[
\sum_{k=1}^K p_k = 1.
\]

Another way we can write this is:

\[
X \sim \operatorname{Categorical}(p_1, p_2, \dots, p_K),
\]

which we read as:

the random variable \(X\) follows a Categorical distribution with \(K\) possibilities, with probabilities \(p_1, p_2, \dots, p_K\).

The fair six-sided die is a particular example of a Categorical. This one in particular:

\[
p(X=x) = \frac{1}{6},
\]

for \(x=1,2,\dots,6\).
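As a quick illustration, the fair die can be written down as a Categorical with \(K=6\) and every probability equal to \(1/6\). The sketch below uses `scipy.stats.rv_discrete`; the names `faces`, `probs`, and `die` are illustrative choices, not part of the text above.

```python
import numpy as np
import scipy.stats as st

# Faces of the die and their (equal) probabilities.
faces = np.arange(1, 7)
probs = np.full(6, 1.0 / 6.0)

# A Categorical random variable over the six faces.
die = st.rv_discrete(name="fair_die", values=(faces, probs))

# Every face has probability 1/6, and the probabilities sum to one.
print(die.pmf(faces))
print(np.isclose(die.pmf(faces).sum(), 1.0))
```

Any other Categorical is obtained the same way by swapping in a different list of values and probabilities.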

## A specific example

Let’s now make a specific choice for the probabilities, make a Categorical, and sample from it. We are going to play with this one, which has four possibilities:

\[
X \sim \operatorname{Categorical}(0.1, 0.3, 0.4, 0.2).
\]

```
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
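# set_matplotlib_formats is deprecated in IPython.display;
# matplotlib_inline provides the same function.
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')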
import numpy as np
import scipy.stats as st
```

```
# The probabilities:
ps = [0.1, 0.3, 0.4, 0.2] # these have to sum to 1
# And here are the corresponding values:
xs = np.array([1, 2, 3, 4])
# Here is how you can define a categorical rv:
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))
```

You can evaluate the PMF anywhere you want:

```
X.pmf(2)
```

```
0.3
```

```
X.pmf(3)
```

```
0.4
```
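The PMF can also be evaluated at several points in one call, and it is zero at values the variable cannot take. Here is a minimal, self-contained sketch that rebuilds the same `X`; the names `pmf_on_support` and `pmf_outside` are introduced only for this illustration.

```python
import numpy as np
import scipy.stats as st

xs = np.array([1, 2, 3, 4])
ps = np.array([0.1, 0.3, 0.4, 0.2])
X = st.rv_discrete(name="Custom Categorical", values=(xs, ps))

# Evaluate the PMF at all possible values in one call.
pmf_on_support = X.pmf(xs)

# Values that X cannot take have probability zero.
pmf_outside = X.pmf([0, 5])

print(pmf_on_support)
print(pmf_outside)
```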

And you can sample from it like this:

```
X.rvs(size=10)
```

```
array([3, 3, 3, 4, 3, 4, 4, 4, 3, 2])
```
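With only ten samples it is hard to see the underlying probabilities. One way to check the sampler, sketched here, is to draw many samples and compare the fraction of times each value appears with its exact probability. The sample size and the seed passed to `random_state` are arbitrary choices made for this illustration.

```python
import numpy as np
import scipy.stats as st

xs = np.array([1, 2, 3, 4])
ps = np.array([0.1, 0.3, 0.4, 0.2])
X = st.rv_discrete(name="Custom Categorical", values=(xs, ps))

# Draw many samples; the seed (an arbitrary choice) makes the run repeatable.
samples = X.rvs(size=100_000, random_state=0)

# Fraction of samples equal to each value in xs.
freqs = np.array([np.mean(samples == x) for x in xs])

for x, f, p in zip(xs, freqs, ps):
    print(f"x = {x}: empirical {f:.3f}, exact {p:.1f}")
```

The empirical fractions should get closer to the exact probabilities as the number of samples grows.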

Let’s plot the PMF:

```
fig, ax = plt.subplots(dpi=150)
ax.bar(xs, X.pmf(xs))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$');
```

Okay. Now let’s find the probability that \(X\) takes the value \(2\) or \(4\). Since these two outcomes are mutually exclusive, it is:

\[
p(X\in\{2, 4\}) = p(X=2) + p(X=4).
\]

So:

```
X.pmf(2) + X.pmf(4)
```

```
0.5
```
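The same idea extends to any set of values: because the outcomes are mutually exclusive, the probability of the set is the sum of the PMF over its elements. The sketch below computes this for the set \(\{2, 4\}\) and cross-checks it with a simple Monte Carlo estimate. The names `A`, `p_exact`, and `p_estimate`, the sample size, and the seed are assumptions made for this illustration.

```python
import numpy as np
import scipy.stats as st

xs = np.array([1, 2, 3, 4])
ps = np.array([0.1, 0.3, 0.4, 0.2])
X = st.rv_discrete(name="Custom Categorical", values=(xs, ps))

# The event A = {2, 4}; its probability is the sum of the PMF over A.
A = [2, 4]
p_exact = X.pmf(A).sum()

# Monte Carlo check: fraction of samples that land in A.
samples = X.rvs(size=100_000, random_state=0)
p_estimate = np.isin(samples, A).mean()

print(p_exact, p_estimate)
```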

### Questions

Rerun all code segments above for the Categorical \(X\sim \operatorname{Categorical}(0.1, 0.1, 0.4, 0.2, 0.2)\) taking values \(1, 2, 3, 4\), and \(5\).