An introduction to probability

Math 2241, Spring 2023
Name:
ID #:

Due date: March 29, 2023, 11:59 p.m.
Table/group #:
Group members:

Total points: 1

Let's explore a basic genetics question in the terminology of probability theory.
1. Imagine that we are cross-pollinating two plants, and we are interested in a particular gene that has two variants, or alleles, in these plants. Let's denote the two gene variants by A and a. As is true for most genes, each individual plant has two copies of this gene. If both copies of the gene in a particular plant are allele A, we say the genotype of the plant is AA. If, on the other hand, the plant has two copies of allele a, we say its genotype is aa. The last possibility is that a plant is genotype Aa, which means the plant has one copy of allele A and one copy of allele a.
  
  Suppose we take two plants of genotype Aa. We'll pollinate one of these plants with the other, creating offspring that have one allele from each parent plant. In this context, an “experiment” is a pollination resulting in a single offspring having one allele from each parent.
  
  In probability terminology, the sample space is the set of all possible outcomes of an experiment. In this case, the outcome will be the genotype of the offspring. List the genotypes that make up the sample space for our experiment:
  . (Separate the genotypes by commas.)
  
  An event is an outcome or a set of outcomes. For example, the event that the offspring also has genotype Aa contains a single outcome (and hence can be called a simple event). On the other hand, the event that the offspring has at least one copy of the A allele includes two possible outcomes: genotype AA and genotype Aa.
  
  To form a probability model, we assign to each event a probability that indicates how likely that event is to occur. There are some intuitive rules that a probability model must obey. For instance, consider the event that the outcome is one of the outcomes in the sample space. Is it possible for this event not to happen? (Recall the definition of sample space.)
  
  In order to write some of these rules, we need some terminology from set theory.
2. Let's consider several different events in our cross-pollination example. Say event A is the offspring having at least one copy of allele A, event B is the offspring having at least one copy of allele a, event C is the offspring having genotype aa, and event D is the offspring having genotype AA. What are the specific genotypes (or outcomes) in each of these events?
  
  event A:
  
  event B:
  
  event C:
  
  event D:
3. Observe that A and B have a common genotype,
  . This reflects the set theoretic operation of intersection: the intersection of A and B is everything that is in both A and B. In other words, the intersection of A and B is the set of outcomes where both events A and B occurred. Symbolically, this is written as A $\cap$ B and read "A intersect B".
  
  Are there any outcomes in A $\cap$ C?
  In this case, the sets A and C are called disjoint, and the intersection is called the empty set, or the null set, which is written with the symbol $\emptyset$. In probability language, we say two events are mutually exclusive if their intersection is the null set, because the occurrence of one event excludes the possibility of the other event.
4. Suppose we want to describe the event of having two copies of the same allele. There are two ways we can think of this. We can think of it as the combination of events C and D, where any outcome in C or D works, or we can think of it as the opposite of having one of each allele. Let's first look at the combination of events C and D. The union of C and D is the set of everything that is in C or in D. In the language of probability, the union of C and D is the event containing all outcomes in C or in D. Symbolically, we write the union of C and D as C $\cup$ D and read it "C union D". What are the genotypes in C $\cup$ D?
  
  What is A $\cup$ B?
5. The idea of taking the "opposite" of an event is called the complement. The complement of a set is everything outside of the set. In the case of probability, the complement of an event is the event containing every other outcome in the sample space. We write the complement of A as A$^c$, which is read "A complement". What is A$^c$?
  
  What is B$^c$?
  What is C$^c$?
6. Let's look briefly at how these operations interact. In the following, write everything in terms of the events X, Y, the null set, and the sample space. These identities apply to all events, and you may find it helpful to think it through with the specific events given above. Write null for the null set and S for the sample space.
  
  (X$^c$)$^c =$
  
  X $\cap$ X$^c =$
  
  X $\cup$ X$^c =$
7. A probability model assigns a probability $P(E)$ to each event $E$. Let's look at the requirements for a probability model to make sense.
  First, as we considered before, the event S, where the outcome is in the sample space, has to happen. Therefore
  
  $1$. $P(S)=1$, where S is the sample space.
  
  Second, probabilities have to be between $0$ and $1$, regardless of the event.
  
  $2$. $0\leq P(A) \leq 1$ for any event A.
  
  Third, if two events are mutually exclusive, the probability of their union is the sum of their probabilities.
  
  $3$. If $A \cap B = \emptyset$, then $P(A \cup B) =$
  $+$
  .
  
  From the first and third conditions, we can come up with a useful relationship between the probability of A and the probability of A$^c$. Replace $B$ with $A^c$ in the third condition to find $P(A^c)$ in terms of $P(A)$. (Remember, from the previous part, that $A \cap A^c = \emptyset$ and $A \cup A^c = S$.)
  
  $P(A^c) =$
8. If there are only a finite number of possible outcomes, assigning probabilities to events is relatively straightforward. We first assign a probability to each simple event, with only the requirement that the sum of all these probabilities must be $1$ so that the sample space has probability $1$. Once we know the probabilities of the simple events, the probability of any other (non-simple) event can be written as the
  of the probabilities of the simple events it contains.
  
  Let's use this to come up with probabilities for the events in our cross-pollination model. Suppose that each parent is equally likely to pass on either allele to the offspring. Then there are four equally likely situations: both parents pass on the A allele, the ovule has the A allele and the pollen has the a allele, the ovule has the a allele and the pollen has the A allele, and both parents pass on the a allele. Two of these are indistinguishable in the offspring, resulting in only three possible genotypes. What are the probabilities of each of these genotypes?
  
  $P(AA) = $
  
  $P(Aa) = $
  
  $P(aa) = $
  
  Notice how the probabilities of these three simple events sum to one and that these three simple events make up the entire sample space.
  
  We can just add up the probabilities of these simple events to determine the probabilities of the events A, B, C, and D from part 2.
  
  $P(A) =$
  
  $P(B) =$
  
  $P(C) =$
  
  $P(D) =$
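If you would like to check your answers to this problem by computer, the following is one possible sketch in Python. It assumes, as in part 8, that each parent passes on either allele with equal chance. The names `genotype`, `prob`, `P`, `complement`, and `event_A` through `event_D` are made up for this illustration and are not part of the worksheet.

```python
from fractions import Fraction
from itertools import product

# Assumption from part 8: each Aa parent passes on "A" or "a" with equal chance.
parent_alleles = ["A", "a"]

# The four equally likely (ovule allele, pollen allele) combinations.
crosses = list(product(parent_alleles, repeat=2))

def genotype(ovule, pollen):
    """Genotype of the offspring; allele order is ignored, so 'aA' becomes 'Aa'."""
    return "".join(sorted([ovule, pollen]))

# Probability of each simple event (each genotype), built by adding up
# the equally likely crosses that produce it.
prob = {}
for ovule, pollen in crosses:
    g = genotype(ovule, pollen)
    prob[g] = prob.get(g, Fraction(0)) + Fraction(1, len(crosses))

# The sample space is the set of all possible genotypes.
sample_space = set(prob)

# The events from part 2, written as sets of outcomes.
event_A = {g for g in sample_space if "A" in g}  # at least one copy of allele A
event_B = {g for g in sample_space if "a" in g}  # at least one copy of allele a
event_C = {"aa"}
event_D = {"AA"}

def P(event):
    """Probability of an event: the sum over the simple events it contains."""
    return sum((prob[g] for g in event), Fraction(0))

def complement(event):
    """Every outcome in the sample space that is not in the event."""
    return sample_space - event

# Python's set operators mirror the notation used above:
#   event_A & event_B   is the intersection of A and B
#   event_C | event_D   is the union of C and D
for name, event in [("A", event_A), ("B", event_B), ("C", event_C), ("D", event_D)]:
    print(name, sorted(event), P(event))
```

Using `Fraction` rather than decimals keeps the probabilities exact, so you can confirm rules such as `P(complement(event_A)) == 1 - P(event_A)` without worrying about round-off.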
We can model the location of molecules using a probabilistic model. Suppose we have $3$ molecules of a toxin in a cell, and each minute, any molecule in the cell will leave with probability $0.1$. Once a molecule has left the cell, it does not reenter. We wish to know how many molecules are likely to be in the cell after $5$ minutes.
To answer this question, we will determine the probabilities of there being 0, 1, 2, or 3 molecules in the cell after 5 minutes.
1. First, we need to determine the sample space. Although we want to know how many molecules are in the cell after $5$ minutes, an outcome of the “experiment” is more specific than the number of molecules left in the cell. Instead, each outcome (or simple event) is the location (in or out) of each of the $3$ molecules after $5$ minutes. For example, if the second molecule was inside the cell and the rest were outside the cell, we would have the outcome: out, in, out.
  
  The sample space is the set of possible locations of each of these molecules after $5$ minutes. Since there are three molecules, each of which could be in two different positions, there are a total of $2^3=$
  possible outcomes in the sample space.
  
  Write down all possible outcomes as a list of the locations of the three molecules after 5 minutes:
  location of molecule 1, location of molecule 2, location of molecule 3.
  (Three are provided for you.)
  
  $E_1$: in, in, in
  $E_2$: in, in, out
  $E_3$: in, out, in
  $E_4$:
  
  $E_5$:
  
  $E_6$:
  
  $E_7$:
  
  $E_8$:
2. Although the individual outcomes specify the location of each molecule, we aren't interested in knowing which molecule is where. We are only interested in the number of molecules that are still in the cell. Let's define events that reflect the question of interest. Let $n$ be the number of molecules in the cell after $5$ minutes. The events we are interested in are $n=0$, $n=1$, $n=2$, and $n=3$. Are these simple events, or do they contain more than one outcome?
  
  The event $n=0$ corresponds to which of the above outcomes ($E_1$ through $E_8$):
  
  (If the event corresponds to more than one outcome, separate by commas.)
  If we know $n=0$, can we tell where each individual molecule is?
  . (When $n=0$, every molecule is outside the cell. Since we know where every molecule is, $n=0$ is a simple event.)
  
  The event $n=1$ corresponds to which of the above outcomes ($E_1$ through $E_8$):
  
  (If the event corresponds to more than one outcome, separate by commas.)
  If we know $n=1$, can we tell where each individual molecule is?
  . (When $n=1$, all we know is that one molecule is in the cell, but we don't know which one.)
  
  The event $n=2$ corresponds to which of the above outcomes ($E_1$ through $E_8$):
  
  (If the event corresponds to more than one outcome, separate by commas.)
  If we know $n=2$, can we tell where each individual molecule is?
  .
  
  The event $n=3$ corresponds to which of the above outcomes ($E_1$ through $E_8$):
  
  (If the event corresponds to more than one outcome, separate by commas.)
  If we know $n=3$, can we tell where each individual molecule is?
  .
3. We need to determine the probability of each of the simple events. But let's start with a simpler problem: determining the probability that any single molecule is in the cell after $5$ minutes. Let's analyze the process in terms of individual minutes.
  We are given that the probability that the molecule leaves the cell in one minute is $0.1$. What is the probability the molecule stays in the cell during the first minute?
  .
  
  Suppose the molecule stays in the cell the first minute. In this case, we are given that the second minute is just like the first minute. What is the probability it stays in the cell another minute?
  . The probability that a molecule stays in the cell for two minutes is the product of these two numbers, which is
  .
  
  To determine the probability that the molecule is in the cell after five minutes, we have to multiply the probabilities that it stayed in the cell during each of those five minutes. What is the probability that a given molecule is in the cell after five minutes? You can round your answer to four decimal places.
  .
  
  It might seem a little more challenging to figure out the probability that the molecule leaves the cell in the first five minutes. (We'd have to add the probabilities of the different outcomes of it leaving each minute.) However, since we know that the molecule either left the cell or stayed in the cell during those five minutes, we can calculate the probability that the molecule left directly from the previous calculation. What is the probability that a given molecule is not in the cell after five minutes?
4. All the simple events ($E_1$ through $E_8$) involve the fate of all three molecules. In the previous part, we calculated the probability that each individual molecule is in the cell and the probability that it is out of the cell. We need to combine these results together to get the probability of each simple event.
  
  To combine the probabilities, we assume that the molecules don't affect each other. In this case, the probability of each combination is the product of the probabilities of the individual molecules being in their specified locations. (This is due to an idea called independence, which we will discuss in detail later.) Calculate the probability of each simple event by taking the product of the probabilities that each molecule is in the corresponding location (you can round to four decimal places).
  
  $P(E_1) = $
  
  $P(E_2) = $
  
  $P(E_3) = $
  
  $P(E_4) = $
  
  $P(E_5) = $
  
  $P(E_6) = $
  
  $P(E_7) = $
  
  $P(E_8) = $
  
  Notice how, since each molecule has the same probability of being in the cell, all of the simple events with only one molecule in the cell have the same probability. The same is true for all simple events with two molecules in the cell. There are really only four different probabilities for the simple events.
  
  A more compact summary of the probabilities of the different simple events is the following. Probability of a simple event with all molecules in the cell:
  
  Probability of a simple event with two molecules in the cell and one out:
  
  Probability of a simple event with one molecule in the cell and two out:
  
  Probability of a simple event with no molecules in the cell:
5. The last step is to find the probabilities for the events $n=0$, $n=1$, $n=2$, and $n=3$. For these, we just need to
  together the probabilities of the simple events that make up these events. What are the probabilities? (Note: you may round your answer to $4$ decimal places, but keep more decimal places when calculating them or you may get accumulated round-off error.)
  
  $P(n=0) =$
  
  $P(n=1) =$
  
  $P(n=2) =$
  
  $P(n=3) =$
  
  The most probable outcome is that there are
  toxin molecules in the cell after 5 minutes.
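The steps of this problem can also be carried out by computer as a check on your hand calculations. The following is a minimal sketch in Python, assuming (as in part 4) that the molecules don't affect each other. The variable and function names (`p_in`, `p_out`, `outcomes`, `outcome_probability`, `p_n`) are made up for this illustration.

```python
from itertools import product

p_leave = 0.1    # given: chance a molecule in the cell leaves during one minute
minutes = 5
n_molecules = 3

# Part 3: a molecule is still in the cell only if it stayed every minute.
p_in = (1 - p_leave) ** minutes
# A molecule is either in or out, so "out" is the complement of "in".
p_out = 1 - p_in

# Part 1: every outcome lists the location of molecules 1, 2, and 3.
outcomes = list(product(["in", "out"], repeat=n_molecules))

def outcome_probability(outcome):
    """Part 4: multiply the per-molecule probabilities (assumes independence)."""
    prob = 1.0
    for location in outcome:
        prob *= p_in if location == "in" else p_out
    return prob

# Part 5: P(n = k) is the sum over all outcomes with exactly k molecules in.
p_n = {n: 0.0 for n in range(n_molecules + 1)}
for outcome in outcomes:
    p_n[outcome.count("in")] += outcome_probability(outcome)

for outcome in outcomes:
    print(", ".join(outcome), round(outcome_probability(outcome), 4))
for n, value in p_n.items():
    print(f"P(n={n}) = {value:.4f}")
```

The first three outcomes this sketch lists are in the same order as $E_1$ through $E_3$ above; your own numbering of $E_4$ through $E_8$ may differ from the order the code produces. Rounding is applied only when printing, which avoids the accumulated round-off error mentioned in part 5.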
