The probability mass function
Say that \(X\) is the result of a coin toss experiment. As we have already discussed, we can code the result of the experiment with natural numbers. Here we use 0 for “heads” and 1 for “tails.” If we have a fair coin, we know that \(X\) takes the value \(0\) with probability \(0.5\) and the value \(1\) also with probability \(0.5\). We can write this succinctly as:
\[
X = \begin{cases} 0, & \text{with probability } 0.5,\\ 1, & \text{with probability } 0.5. \end{cases}
\]
But there is another way to write this that is often more convenient. Here it is:
\[
p(X=0) = 0.5.
\]
Now, since the probabilities of the two possible outcomes must sum to one, we get:
\[
p(X=1) = 1 - p(X=0) = 0.5.
\]
You can now define the following function:
\[
f_X(x) = p(X=x).
\]
This is a function associated with the discrete random variable \(X\) (that is why \(X\) is used as a subscript in \(f_X\)). It takes any possible value \(x\) for \(X\) and responds with the probability of \(X\) taking that value. It has a special name. It is called the probability mass function or PMF of the discrete random variable \(X\).
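To make the idea concrete, here is a minimal sketch of the PMF of the fair coin written as a plain Python function. The name `coin_pmf` is hypothetical and chosen only for illustration. The coding of 0 for heads and 1 for tails follows the text above.

```python
def coin_pmf(x):
    """PMF of a fair coin toss X, with 0 for heads and 1 for tails."""
    # Each of the two possible values has probability 0.5.
    if x in (0, 1):
        return 0.5
    # Any other value cannot occur, so it gets probability zero.
    return 0.0


# The probabilities of all possible values should sum to one.
total = sum(coin_pmf(x) for x in (0, 1))
print(coin_pmf(0), coin_pmf(1), total)
```

The function takes a possible value and returns the probability of that value, which is all a PMF does.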
Note
We have silently introduced the following standard notational conventions:
Use upper case letters to represent random variables, like \(X, Y, Z\).
Use lower case letters to represent the values that random variables can take, like \(x, y, z\).
The notation \(f_X(x)\) for the PMF is a bit of an overkill in my opinion. It is commonly used in mathematical textbooks where rigor is valued above everything. In practice, no one wants to write or type this many symbols. So, it is a common convention to write \(p(x)\) instead of \(f_X(x)\) when there is no ambiguity. Symbolically, we follow this convention:
\[
p(x) \equiv p(X=x) = f_X(x).
\]
In other words, when you see \(p(x)\) we are talking about the PMF of the random variable \(\text{capitalize}(x) = X\) evaluated at \(x\). Similarly, if you see \(p(y)\) we are referring to the PMF of the random variable \(Y\) evaluated at \(y\). However, it is meaningless to write \(p(0)\). What does it mean? To which random variable does it refer? \(X\) or \(Y\)? So, if you want to plug in a specific value, you should write \(p(X=0)\) or \(p(Y=0)\), etc.
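The ambiguity of \(p(0)\) shows up as soon as there are two random variables in play. The sketch below stores one PMF per variable name. The helper `p`, the dictionary `pmfs`, and the numbers for \(Y\) are all made up for illustration and do not come from the text.

```python
# One PMF per random variable, stored as value -> probability.
pmfs = {
    "X": {0: 0.5, 1: 0.5},  # the fair coin from the text
    "Y": {0: 0.2, 1: 0.8},  # a hypothetical biased coin (made-up numbers)
}


def p(variable, value):
    """Return p(variable = value); values never taken get probability 0."""
    return pmfs[variable].get(value, 0.0)


# We must say which random variable we mean before plugging in a value.
print(p("X", 0))
print(p("Y", 0))
```

The two calls return different numbers for the same value 0, which is why the variable has to be named explicitly, as in \(p(X=0)\) or \(p(Y=0)\).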
Note
In everything I write below, there is supposed to be some background information \(I\). So, I should really be writing \(p(X=0|I), p(x|I)\), etc. However, since \(I\) remains fixed, I decided to drop it from the notation.