-
We can use R to randomly (actually pseudorandomly) flip coins for us so that we can simulate coin-flip experiments. You can enter rbinom(1,1,0.5)
to randomly obtain a 0 or 1, each with probability 0.5. If we let 1 correspond to $H$ and 0 correspond to $T$, we can view this as the result of a coin flip. Enter the command rbinom(1,1,0.5)
over and over again to see that it randomly gives a 0 or a 1.
We can use the same function to flip a bunch of coins at once. To do our 10 coin-flip experiment, enter the command rbinom(10,1,0.5)
and see that you get a vector of 10 numbers, each of which is 0 or 1. To count the number of heads, you can just add up all those 1's. In R, type sum(rbinom(10,1,0.5))
to perform 10 coin flips and add up the number of H's. You should get a number between 0 and 10. However, as shown above, a 0 or 10, or even a 1 or 9, would be unlikely, so you'll usually get a number between 2 and 8.
Repeat this experiment 7 times by repeatedly running the command. Enter your results, separated by commas.
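Instead of rerunning the command by hand, the seven repetitions can also be sketched with the base R function replicate. This is only one possible approach; the variable name heads and the seed are assumptions made for illustration.

```r
# Sketch: repeat the 10-coin-flip experiment 7 times in one call.
set.seed(1)  # assumed seed, only so that a rerun gives the same numbers

# replicate() evaluates the expression 7 times and collects the results
heads <- replicate(7, sum(rbinom(10, 1, 0.5)))

print(heads)            # 7 head counts, each between 0 and 10
cat(heads, sep = ", ")  # the same counts, separated by commas
cat("\n")
```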
In fact, the binomial distribution, which is the distribution that rbinom
samples from, already includes the process of adding up the number of heads. The first argument is the number of experiments, and the second argument is the number of coin flips in each experiment. Instead of typing the command sum(rbinom(10,1,0.5)), you can type the simpler command rbinom(1,10,0.5)
to get a number of heads drawn from the same distribution.
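The two commands will not return identical numbers on any single run, but they sample from the same distribution. A rough check, assuming 10,000 repetitions of each (the names reps, summed, and direct are hypothetical), is to compare their averages and relative frequencies:

```r
# Sketch: compare summing 10 single flips with one 10-flip binomial draw.
set.seed(42)   # assumed seed, only for reproducibility
reps <- 10000  # assumed number of repetitions

summed <- replicate(reps, sum(rbinom(10, 1, 0.5)))  # add up 10 separate flips
direct <- rbinom(reps, 10, 0.5)                     # let rbinom count the heads

# Both averages should be close to 10 * 0.5 = 5
print(c(mean_summed = mean(summed), mean_direct = mean(direct)))

# Relative frequency of 0, 1, ..., 10 heads for each method
print(round(rbind(summed = tabulate(summed + 1, nbins = 11) / reps,
                  direct = tabulate(direct + 1, nbins = 11) / reps), 3))
```

The two rows of relative frequencies should be similar, though not identical, since each is a random sample.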
Now we can repeat the experiment of flipping 10 coins and counting the heads multiple times with a single command. To do this experiment 100 times, enter the command rbinom(100,10,0.5).
-
To use R to perform the four coin-flip experiment 10 times, what R command should you type?
Assign the results of these experiments to the vector results
by entering
results=__
Then, you can display a histogram of the results by typing the command:
hist(results, breaks=-0.5:4.5)
(The argument -0.5:4.5
is shorthand for the vector c(-0.5,0.5,1.5,2.5,3.5,4.5).
It tells R that the breakpoints between the histogram's intervals are the half-integers. In this way, each bar is centered at an integer, which better shows that each bar represents a single integer value.)
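To see what the half-integer breakpoints do, here is a small sketch that uses a made-up vector of ten head counts (example_results is hypothetical data, not the outcome of any experiment) and asks hist for its numbers instead of a plot:

```r
# Sketch: inspect the bins created by breaks = -0.5:4.5.
# example_results is made-up data, used only for illustration.
example_results <- c(2, 1, 3, 2, 2, 0, 4, 2, 1, 3)

breaks <- -0.5:4.5
print(breaks)    # -0.5 0.5 1.5 2.5 3.5 4.5

# plot = FALSE returns the histogram data without drawing anything
h <- hist(example_results, breaks = breaks, plot = FALSE)
print(h$mids)    # bar centers: 0 1 2 3 4
print(h$counts)  # bar heights: 1 2 4 2 1
```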
Use the applet below to draw the histogram you obtained.
If we let $N$ be the number of heads observed in each experiment, the $x$-axis of the above histogram represents $N$. The height of each bar represents the number of times you observed the given number of heads, i.e., the frequency with which $N$ took the given value. To better compare with the probability distributions of the previous question, we can divide each bar height by 10 (the number of experiments) so that the bar heights represent relative frequency. You can produce such a histogram in R by setting the probability
flag to TRUE,
hist(results, breaks=-0.5:4.5, probability=TRUE)
though it isn't too hard to divide by 10 in your head. (With this command, R labels the $y$-axis as “Density,” but because each bar has width 1, the density equals the relative frequency, so “Relative frequency” is a better description of the bar heights.)
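As a quick check that R's “Density” matches relative frequency when the bars have width 1, the following sketch again uses a made-up vector of ten head counts (example_results and rel_freq are hypothetical names and data):

```r
# Sketch: density from hist() versus counts divided by the number of experiments.
# example_results is made-up data, used only for illustration.
example_results <- c(2, 1, 3, 2, 2, 0, 4, 2, 1, 3)

h <- hist(example_results, breaks = -0.5:4.5, plot = FALSE)

rel_freq <- h$counts / length(example_results)  # divide each bar height by 10
print(rel_freq)       # 0.1 0.2 0.4 0.2 0.1
print(h$density)      # the same values, because every bin has width 1
print(sum(rel_freq))  # relative frequencies add up to 1
```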
Redraw the histogram, this time with relative frequencies.
(The bar heights should be exactly 1/10th of the heights from the previous histogram, though we don't check for that here.)
Is your relative frequency histogram close to the probability distribution you calculated for the four coin-flip experiment in the previous problem?
Rerun the set of experiments by typing the two commands multiple times.
results=__
hist(results, breaks=-0.5:4.5, probability=TRUE)
Do the relative frequency histograms change much?
-
Increase the number of repetitions of the four coin-flip experiment. To run a set of 100 experiments, you can enter the command results=rbinom(100, 4, 0.5).
Then, you can use the same histogram commands to plot histograms in terms of frequency or relative frequency.
The R function dbinom
gives the probability distribution for the binomial random variable, the same distribution we determined in the previous question. As a comparison, you can plot a bar graph of the probability distribution with these commands:
n=0:4
barplot(dbinom(n,4,0.5), names.arg=n)
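To connect dbinom with a hand calculation, the sketch below compares its output with the counting formula $\binom{4}{n}/2^4$. It assumes the distribution in the previous question was obtained by counting equally likely outcomes; the names probs and by_hand are hypothetical.

```r
# Sketch: dbinom versus counting outcomes for four fair coin flips.
n <- 0:4

probs <- dbinom(n, 4, 0.5)     # probabilities of 0, 1, 2, 3, 4 heads
by_hand <- choose(4, n) / 2^4  # (number of ways to get n heads) / 16

print(rbind(n = n, dbinom = probs, by_hand = by_hand))
print(sum(probs))              # the probabilities add up to 1
```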
The comparison will be easier to see if you can plot the probability distribution directly on top of the histogram from the experiments. Since it is hard to visualize two bar plots on top of each other, we'll change the plot of the probability distribution to a line graph with points. Assuming you have already defined n=0:4, the following set of commands will run a set of 100 experiments, plot a histogram, and then plot the probability distribution on top of it with red points connected by lines.
results=rbinom(100, 4, 0.5)
hist(results, breaks=-0.5:4.5, probability=TRUE)
lines(n,dbinom(n,4,0.5),col='red', lwd=5, type='b')
Run these commands multiple times to see how the results change as you repeat the set of 100 four-coin-flip experiments. Keep in mind that the red points and lines do not change. They may appear to move up or down, but that illusion is simply due to the scale of the $y$-axis changing based on the histogram.
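One possible way to remove that illusion is to fix the range of the $y$-axis with the ylim argument of hist, so the red points stay at the same height on screen from run to run. The range c(0, 0.6) is an assumption chosen to fit typical bar heights for 100 experiments; a bar taller than 0.6 would be cut off.

```r
# Sketch: the same overlay, but with a fixed y-axis range.
n <- 0:4
results <- rbinom(100, 4, 0.5)  # no seed, so each run gives new results

# ylim = c(0, 0.6) is an assumed range, not a value from the text
hist(results, breaks = -0.5:4.5, probability = TRUE, ylim = c(0, 0.6))
lines(n, dbinom(n, 4, 0.5), col = "red", lwd = 5, type = "b")
```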
What do you observe about the relationship between the height of the red points (representing the probability distribution) and the height of the bars (representing the relative frequency of each value of $N$ from the experiments)?
Increase the number of experiments in each set to 1000, 10,000, or even 100,000. What happens when you create the relative frequency histograms from a large set of experiments?
This result illustrates how the relative frequency histograms approach the probability distribution as you increase the number of samples (or number of experiments in each set). You could think of the probability distribution as being the limit of the relative frequencies as the number of samples approaches infinity.
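The convergence can also be measured numerically. The sketch below is one way to do it, with an assumed seed and hypothetical names (sizes, exact, gaps): for each set size it records the largest gap between a relative frequency and the corresponding probability.

```r
# Sketch: largest gap between relative frequencies and the binomial
# probabilities as the number of experiments grows.
set.seed(2024)  # assumed seed, only for reproducibility
n <- 0:4
exact <- dbinom(n, 4, 0.5)

sizes <- c(100, 1000, 10000, 100000)
gaps <- numeric(length(sizes))

for (i in seq_along(sizes)) {
  results <- rbinom(sizes[i], 4, 0.5)
  # tabulate counts how often each value 0..4 occurs (shifted to 1..5)
  rel_freq <- tabulate(results + 1, nbins = 5) / sizes[i]
  gaps[i] <- max(abs(rel_freq - exact))
  cat(format(sizes[i], big.mark = ",", scientific = FALSE),
      "experiments: largest gap =", signif(gaps[i], 3), "\n")
}
```

The gaps are random, so they need not shrink at every step, but they should be much smaller for 100,000 experiments than for 100.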