# Math Insight

### Introduction to probability distributions

Math 2241, Spring 2022
Name:
ID #:
Due date: April 6, 2022, 11:59 p.m.
Table/group #:
Group members:
Total points: 1
1. When you flip a fair coin, the two possible outcomes, heads ($H$) or tails ($T$), have equal probability.
$P(H) =$

$P(T) =$
1. If an experiment consisted of flipping one coin and recording the number of heads observed, we could let the random variable $N$ be the observed number of heads. In this case, the two possible outcomes are the event $N=0$ of observing no heads and the event $N=1$ of observing one head. To capture the possible values of $N$ and their probabilities, we can use a probability distribution function, or probability mass function, which we'll denote by $f_N(n)$. The function is defined by $$f_N(n) = P(N = n).$$ Note how $N$ and $n$ represent different objects. $N$ is the random variable and $n$ is a number (in this case $0$ or $1$) that could be the value of $N$. The definition of $f_N$ is shorthand for \begin{align*} f_N(0) &= P(N=0)\\ f_N(1) &= P(N=1)\\ f_N(2) &= P(N=2)\\ &\ldots \end{align*} Given that we are flipping just one coin, we can easily fill in the values.
$f_N(0) =$

$f_N(1) =$

$f_N(2) =$

$f_N(3) =$

In fact, since we can get at most one head, if $n>1$, we know that $f_N(n) =$
.

We can graph the probability distribution $f_N(n)$ using either a bar graph or a line graph.


Click the “toggle” button to switch between a bar graph and a line graph. The bar graph is more intuitive, as it is similar to a histogram of actual experiments (see below). When we want to graph a large number of events or plot multiple graphs together, the simpler line graph works better.
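
If you want to see the same two kinds of graph outside the applet, the following is a minimal sketch in R. It assumes R's built-in `dbinom` function (introduced later in this worksheet) with one flip and success probability 0.5; the variable names `n` and `f_N` are chosen here only for illustration.

```r
# Values of n to display (with one coin, only 0 and 1 can actually occur)
n <- 0:3

# dbinom(n, size, prob) gives P(N = n) for `size` flips with heads probability `prob`
f_N <- dbinom(n, size = 1, prob = 0.5)

# Bar graph: one bar per value of n
barplot(f_N, names.arg = n, xlab = "n", ylab = "f_N(n)")

# Line graph: the same heights drawn as points joined by line segments
plot(n, f_N, type = "b", ylim = c(0, 1), xlab = "n", ylab = "f_N(n)")
```

Both plots show the same four heights; only the drawing style differs.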

2. A similar experiment is to flip a coin twice and let $N$ be the observed number of heads. We define the probability distribution function (or probability mass function) $f_N$ in the same way: $$f_N(n) = P(N = n).$$ The values are only slightly trickier to fill in.
$f_N(0) =$

$f_N(1) =$

$f_N(2) =$

$f_N(3) =$

In fact, since we can get at most two heads, if $n>2$, we know that $f_N(n) =$
.

Graph the probability distribution $f_N(n)$.
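
One way to check your values for two flips is to list every equally likely outcome and count heads. The sketch below does this in R, assuming the coding 1 = heads and 0 = tails; the names `outcomes`, `heads`, and `f_N` are illustrative.

```r
# List every equally likely outcome of two flips (1 = heads, 0 = tails)
outcomes <- expand.grid(flip1 = 0:1, flip2 = 0:1)

# Number of heads in each outcome
heads <- rowSums(outcomes)

# Count the outcomes with n = 0, 1, 2, 3 heads, then divide by the number of outcomes
counts <- table(factor(heads, levels = 0:3))
f_N <- counts / nrow(outcomes)
print(f_N)
```

Because each of the four outcomes is equally likely, the fraction of outcomes with $n$ heads is $P(N=n)$.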

3. Another experiment is to flip a coin four times and let $N$ be the observed number of heads. We define the probability distribution function (or probability mass function) $f_N$ in the same way: $$f_N(n) = P(N = n).$$ This time, it's a bit trickier to determine the probabilities.
$f_N(0) =$

$f_N(1) =$

$f_N(2) =$

$f_N(3) =$

$f_N(4) =$

Since we can get at most four heads, if $n>4$, we know that $f_N(n) =$
.

Graph the probability distribution $f_N(n)$.
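
With four flips there are $2^4$ equally likely sequences, so listing them by hand gets tedious. As a sketch of a counting shortcut (an approach not spelled out in this worksheet), R's `choose(4, n)` counts the sequences that contain exactly `n` heads; the variable names are illustrative.

```r
flips <- 4
n <- 0:5

# choose(flips, n) counts the sequences with exactly n heads (it is 0 when n > flips);
# each of the 2^flips sequences is equally likely
f_N <- choose(flips, n) / 2^flips
print(f_N)
```

You can compare the printed values with the ones you entered above.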

4. If you flip a coin ten times and let $N$ be the number of heads, the probability distribution function $f_N(n)$ looks like the following bar graph.

What is the probability of getting seven heads out of the ten coin flips? $f_N(7) \approx$
What is the probability of getting just one head? $f_N(1) \approx$

This probability distribution of the number of heads is a common probability distribution, so it is given a special name: the binomial distribution.

Obtaining the binomial distribution in R
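
The original collapsed section is not reproduced here. As a minimal sketch, assuming R's built-in `dbinom` function, the ten-flip distribution could be obtained and plotted as follows; the names `n` and `f_N` are illustrative.

```r
n <- 0:10

# P(N = n) for ten flips of a fair coin
f_N <- dbinom(n, size = 10, prob = 0.5)

# Show the probabilities rounded to four decimals, labelled by n
print(round(setNames(f_N, n), 4))

# Bar graph of the distribution
barplot(f_N, names.arg = n, xlab = "n", ylab = "f_N(n)")
```

Since R vectors are indexed from 1, the probability of `n` heads is stored in `f_N[n + 1]`.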

2. A probability distribution of a random variable tells us the probability of getting any particular value. We can also repeatedly generate a random number and determine the frequency with which we obtain each value.
1. We can use R to randomly (actually pseudorandomly) flip coins for us so that we can simulate coin flip experiments. You can enter rbinom(1,1,0.5) to randomly obtain a 0 or 1, each with probability 0.5. If we let 1 correspond to $H$ and 0 correspond to $T$, we can view this as the result of a coin flip. Enter the command rbinom(1,1,0.5) over and over again to see that it randomly gives a 0 or a 1.

We can use the same function to flip a bunch of coins at once. To do our 10 coin-flip experiment, enter the command rbinom(10,1,0.5) and see that you will get a vector of 10 numbers, each of which is 0 or 1. To count the number of heads, you can just add up all those 1's. In R, type sum(rbinom(10,1,0.5)) to perform 10 coin flips and add up the number of H's. You should get a number between 0 and 10. However, as shown above, a 0 or 10, or even a 1 or 9, would be unlikely, so you'll usually get a number between 2 and 8.

Repeat this experiment 7 times by repeatedly running the command. Enter your results, separated by commas.
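
If you would rather not rerun the command by hand, R's `replicate` function can repeat it for you. This is only a sketch: the seed value is arbitrary and is there just to make the run repeatable, so leave out `set.seed` if you want fresh flips each time.

```r
set.seed(2022)  # arbitrary seed so this sketch is repeatable; omit it to get fresh flips

# replicate(7, expr) evaluates expr seven times and collects the seven results
heads <- replicate(7, sum(rbinom(10, 1, 0.5)))
print(heads)
```

Each entry of `heads` is the number of heads in one run of the ten coin-flip experiment.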

In fact, the binomial distribution, which is the distribution that rbinom is sampling from, already includes the process of adding up the number of heads. The second argument is the number of coin flips. Instead of typing the command sum(rbinom(10,1,0.5)), you can type the simpler command rbinom(1,10,0.5) to sample from the same distribution.

Now we can repeat the experiment of flipping 10 coins and counting the heads multiple times with a single command. To do this experiment 100 times, enter the command rbinom(100,10,0.5).
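
Because the output is random, the two forms of the command will not print identical numbers, but their relative frequencies should agree over many repetitions. The sketch below compares them against the exact values from `dbinom`; the helper `rel_freq`, the seed, and the choice of 10,000 repetitions are assumptions made for this illustration.

```r
set.seed(1)    # arbitrary seed for repeatability
reps <- 10000  # number of ten-flip experiments per approach

summed <- replicate(reps, sum(rbinom(10, 1, 0.5)))  # add up ten single flips
direct <- rbinom(reps, 10, 0.5)                     # let rbinom count the heads

# Hypothetical helper: relative frequency of each count 0..10, zeros included
rel_freq <- function(x) as.numeric(table(factor(x, levels = 0:10))) / length(x)

comparison <- rbind(summed = rel_freq(summed),
                    direct = rel_freq(direct),
                    exact  = dbinom(0:10, 10, 0.5))
colnames(comparison) <- 0:10
print(round(comparison, 3))
```

The three rows should be close to one another, with small differences due to sampling.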

2. To use R to perform the four coin-flip experiment 10 times, what R command should you type?

Assign the results of these experiments to the vector results by entering
results=_＿

Then, you can display a histogram of the results by typing the command:
hist(results, breaks=-0.5:4.5)
(The argument -0.5:4.5 is shorthand for the vector c(-0.5,0.5,1.5,2.5,3.5,4.5). It tells R that the breakpoints between intervals for the histogram are the half-integers. In this way, each bar will be centered at the integer -- a better visualization of the fact that each bar represents integer values.)
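
To convince yourself of what the shorthand produces, you can inspect the breakpoints directly. This is a small sketch; the names `breaks` and `midpoints` are illustrative.

```r
breaks <- -0.5:4.5
print(breaks)

# Compare with the vector written out by hand
print(isTRUE(all.equal(breaks, c(-0.5, 0.5, 1.5, 2.5, 3.5, 4.5))))

# Midpoint of each interval, which is where each bar is centered
midpoints <- (head(breaks, -1) + tail(breaks, -1)) / 2
print(midpoints)
```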

Use the below applet to draw the histogram you obtained.


If we let $N$ be the number of heads observed in each experiment, the $x$-axis of the above histogram represents $N$. The height of each bar represents the number of times you observed the given number of heads, i.e., the frequency with which $N$ took the given value. To better compare with the probability distributions of the previous question, we can divide the bar height by 10 (the number of experiments) so that the bar heights represent relative frequency. You can produce such a histogram in R by setting the probability flag to true,
hist(results, breaks=-0.5:4.5, probability=TRUE)
though it isn't too hard to divide by 10 in your head. (With this command, R labels the $y$-axis as “Density,” but for our case, “Relative frequency” is a better description of the bar heights.)
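
The division by 10 can also be done in R. In the sketch below, the `results` vector is made up purely for illustration and stands in for your own ten results; the names `freq`, `rel_freq`, and `h` are likewise illustrative.

```r
# Made-up example values standing in for the results of ten experiments
results <- c(2, 1, 3, 2, 2, 0, 4, 2, 1, 3)

# Frequency of each value 0..4 (values that never occur are counted as zero)
freq <- table(factor(results, levels = 0:4))

# Relative frequency: divide by the number of experiments
rel_freq <- freq / length(results)
print(freq)
print(rel_freq)

# With bars of width 1, the "density" computed by hist equals the relative frequency
h <- hist(results, breaks = -0.5:4.5, plot = FALSE)
print(h$density)
```

The relative frequencies always add up to 1, just as the values of a probability distribution do.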

Redraw the histogram, this time with relative frequencies.


(The bar heights should be exactly 1/10th of the heights from the previous histogram, though we don't check for that here.)

Is your relative frequency histogram close to the probability distribution you calculated for the four coin-flip experiment in the previous problem?

Rerun the set of experiments by typing the two commands multiple times.
results=_＿
hist(results, breaks=-0.5:4.5, probability=TRUE)
Do the relative frequency histograms change much?

3. Increase the number of repetitions of the four coin-flip experiment. To run a set of 100 experiments, you can enter the command results=rbinom(100, 4, 0.5). Then, you can use the same histogram commands to plot histograms in terms of frequency or relative frequency.

The R function dbinom gives the probability distribution for the binomial random variable, the one we determined in the previous question. As a comparison, you can plot a bar graph of the probability distribution with these commands.
n=0:4
barplot(dbinom(n,4,0.5), names.arg=n)

The comparison will be easier to see if you can plot the probability distribution directly on top of the histogram from the experiments. Since it is hard to visualize two bar plots on top of each other, we'll change the plot of the probability distribution to a line graph with points. Assuming you have already defined n=0:4, the following set of commands will run a set of 100 experiments, plot a histogram, and then plot the probability distribution on top of it with red points connected by lines.

results=rbinom(100, 4, 0.5)
hist(results, breaks=-0.5:4.5, probability=TRUE)
lines(n,dbinom(n,4,0.5),col='red', lwd=5, type='b')

Run these commands multiple times to see how the results change as you repeat the set of 100 four-coin-flip experiments. Keep in mind that the red points and lines do not change. They may appear to move up or down, but that illusion is simply due to the scale of the $y$-axis changing based on the histogram.

What do you observe about the relationship between the height of the red points (representing the probability distribution) and the height of the bars (representing the relative frequency of each value of $N$ from the experiments)?

Increase the number of experiments in each set to 1000, 10,000, or even 100,000. What happens when you create the relative frequency histograms from a large set of experiments?

This result illustrates how the relative frequency histograms approach the probability distribution as you increase the number of samples (or number of experiments in each set). You could think of the probability distribution as being the limit of the relative frequencies as the number of samples approaches infinity.
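
To put a number on how close the histogram is to the distribution, you could measure the largest gap between a relative frequency and the corresponding probability. The following is a sketch under that assumption; the seed, the sample sizes, and the names `sizes`, `exact`, and `gaps` are chosen here for illustration.

```r
set.seed(4)  # arbitrary seed for repeatability
n <- 0:4
exact <- dbinom(n, 4, 0.5)  # the probability distribution
sizes <- c(100, 1000, 10000, 100000)

# For each sample size, find the largest gap between relative frequency and probability
gaps <- sapply(sizes, function(samples) {
  results <- rbinom(samples, 4, 0.5)
  rel_freq <- as.numeric(table(factor(results, levels = n))) / samples
  max(abs(rel_freq - exact))
})
print(data.frame(experiments = sizes, largest_gap = gaps))
```

The gap tends to shrink as the number of experiments grows, although it need not decrease at every single step because each run is random.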

3. Another experiment with coins is to count how many heads you can obtain in a row. Continue flipping a coin until you obtain your first tail and let $N$ be the number of heads before that tail. What is the probability distribution $f_N(n)$ of the random variable $N$?
1. First of all, what are the valid options for $N$? The smallest value that $N$ could be is
. Is there a largest possible value of $N$?
The probability that the first $13,462,305$ flips all result in heads before you see your first tail is
. Therefore, $N$ could be any non-negative integer. It would just be extremely unlikely to be a large number.
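
To get a feel for this experiment before working out the probabilities, you can simulate it. The sketch below follows the description literally by flipping until the first tail; the function name `count_heads_before_tail`, the seed, and the choice of 1000 repetitions are assumptions made for this illustration.

```r
# Hypothetical helper: flip until the first tail and return the number of heads seen
count_heads_before_tail <- function() {
  heads <- 0
  while (rbinom(1, 1, 0.5) == 1) {  # 1 = heads, so keep flipping
    heads <- heads + 1
  }
  heads
}

set.seed(5)  # arbitrary seed for repeatability
results <- replicate(1000, count_heads_before_tail())

# How often each value of N occurred, and the longest run of heads observed
print(table(results))
print(max(results))
```

Small values of $N$ dominate the table, and long runs of heads show up only rarely.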

2. What is the probability that the first coin flip is a tail, $T$?
Therefore, the probability that $N$ is zero is $f_N(0) = P(N=0)=$
.
3. What is the probability that the first coin flip is a head, $H$?
In terms of the random variable $N$, this probability is
$=＿$.
4. So far, we've introduced two events, which we labeled in terms of the random variable $N$.

• $N=0$: the event that the first coin flip was $T$, so that we obtained zero heads.
• $N \ge 1$: the event that the first coin flip was $H$, so that we know we obtained at least one head.

Since we keep flipping coins only if we obtain an $H$, we continue with more events only conditioned on the event $N \ge 1$. In that case, we consider two more events.

• : the event that the second coin flip was $T$, so we obtained exactly

• : the event that the second coin flip was $H$, so we know we obtained at least

(Online, enter $\ge$ as >= or as the symbol ≥.)

The probabilities of these two events, conditioned on the event $N \ge 1$, are simple since they involve just one more coin flip.

• $P(N=1\,|\,N \ge 1) =$
• $P(N \ge 2 \,|\,N \ge 1) =$

From the definition of conditional probability, recall that $P(A,B) = P(A\,|\,B)P(B)$. Substituting the event $N \ge 1$ for $B$ and either event $N=1$ or $N \ge 2$ for $A$, we can calculate that

• $P(N=1, N \ge 1) = P(N=1\,|\,N \ge 1) P(N \ge 1) =$
$\times$
$=$
• $P(N \ge 2, N \ge 1) = P(N \ge 2\,|\,N \ge 1) P(N \ge 1) =$
$\times$
$=$

But now, the notation is a bit silly. If we've obtained exactly one head ($N=1$), then obviously we've obtained at least one head ($N \ge 1$). Similarly, if we've obtained at least two heads ($N \ge 2$), then obviously we've obtained at least one head ($N \ge 1$). The events $N=1$ and $N \ge 2$ cannot occur without the event $N \ge 1$. Hence, $P(N=1)=P(N=1, N \ge 1)$ and $P(N \ge 2) = P(N \ge 2, N \ge 1)$. We summarize by giving the probabilities of obtaining exactly one head and of obtaining at least two heads.

• $f_N(1) = P(N=1) =$
• $P(N \ge 2) =$
5. We can repeat this procedure to include one more coin flip and calculate the probability that $N=2$ (if the third coin flip is $T$) and that $N \ge 3$ (if the third coin flip is $H$).

Calculate the probabilities of these events conditioned on the fact that we already flipped heads twice in a row.

• $P(N=2 \,|\, N \ge 2)=$
• $P(N \ge 3 \,|\, N \ge 2)=$

Multiply by $P(N \ge 2)$ to determine the probabilities of obtaining exactly two heads or obtaining at least three heads.

• $f_N(2) = P(N=2)=$
• $P(N \ge 3)=$
6. Notice that to get from $P(N \ge 1)$ to $P(N \ge 2)$ we multiply by $\frac{1}{2}$, the probability of getting one more $H$. Similarly, we multiply by $\frac{1}{2}$ to get from $P(N \ge 2)$ to $P(N \ge 3)$. In general, the event $N \ge n$ corresponds to flipping $H$ $n$ times in a row, so the probability $P(N \ge n)$ is the product of $n$ factors of $\frac{1}{2}$. $P(N \ge n)$ is the exponential function
$P(N \ge n) =$
.
(Online, enter $a^b$ as a^b.)

To know that we obtained exactly $n$ heads, we need to flip a coin one more time and this time obtain a $T$. We need to multiply by one more $\frac{1}{2}$ (the probability of $T$) to determine the probability distribution function $f_N(n)$.
$f_N(n) = P(N=n) =$

(Online, you'll need to use parentheses after the ^ to put $n+1$ in the exponent.)
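
You can check your formulas numerically in R. The sketch below assumes R's built-in geometric distribution functions `dgeom` and `pgeom`, which this worksheet does not otherwise use: they count the number of "failures" before the first "success," so here each head plays the role of a failure and the first tail is the success. The names `f_N` and `at_least` are illustrative.

```r
n <- 0:5

# dgeom(n, prob) is the probability of exactly n failures (heads) before the
# first success (tail), where prob is the probability of a tail
f_N <- dgeom(n, prob = 0.5)

# P(N >= n) is one minus P(N <= n - 1); pgeom returns 0 for negative arguments
at_least <- 1 - pgeom(n - 1, prob = 0.5)

print(rbind(n = n, f_N = f_N, at_least = at_least))
```

Compare each row with the values your own formulas give for $n = 0, 1, \ldots, 5$.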

7. Use the below applet to sketch the probability distribution $f_N(n)$ of the number $N$ of consecutive heads. The distribution continues for all non-negative integers $n$, but we just plot $0 \le n \le 5$.
