Math Insight

An introduction to probability

Math 2241, Spring 2016
Name:
ID #:
Due date: March 30, 2016, 11:59 p.m.
Table/group #:
Group members:
Total points: 3
  1. Let's explore a basic genetics question in the terminology of probability theory.
    1. Imagine that we are cross-pollinating two plants, and we are interested in a particular gene that has two variants, or alleles, in these plants, which we'll denote by A and a. Suppose both plants have the genotype Aa, meaning they each have one copy of allele A and one copy of allele a. In this context, an "experiment" is a pollination resulting in a single offspring having one allele from each parent.

      In probability terminology, the sample space is the set of all possible outcomes of an experiment. In this case, the outcome will be the genotype of the offspring. List the genotypes that make up the sample space: ____. (Separate the genotypes by commas. If a plant has the same allele twice, we repeat that letter, writing the genotype with two A alleles as AA.)
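      As an optional check (a Python sketch, not part of the original worksheet), the sample space can be enumerated by pairing each allele the ovule parent could contribute with each allele the pollen parent could contribute. The helper `offspring_genotype` is a hypothetical name. Writing the uppercase letter first (Aa rather than aA) is simply the convention used above.

```python
from itertools import product


def offspring_genotype(from_ovule: str, from_pollen: str) -> str:
    """Write a genotype with the uppercase allele first, so "aA" and "Aa" match."""
    # In ASCII, "A" sorts before "a", so sorting the two letters does this.
    return "".join(sorted(from_ovule + from_pollen))


parent1 = "Aa"  # alleles the ovule parent can pass on
parent2 = "Aa"  # alleles the pollen parent can pass on

# Every (ovule allele, pollen allele) pair; duplicate genotypes collapse in the set.
sample_space = sorted({offspring_genotype(x, y) for x, y in product(parent1, parent2)})
print(sample_space)
```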

      An event is an outcome or a set of outcomes, and a central task of probability theory is assigning probabilities to events. For instance, we may wish to know the probability of the event that the offspring also has genotype Aa. Since this event includes just a single outcome, we call it a simple event. An event can also include more than one outcome: the event in which the offspring has at least one copy of the A allele includes two individual outcomes, genotype AA and genotype Aa.

      To form a probability model, we assign to each event a probability that indicates how likely that event is to occur. There are some intuitive rules that a probability model must obey. For instance, consider the event in which the outcome is in the sample space. Is it possible for this event not to happen? (Recall the definition of sample space.)

      In order to write some of these rules, we need some terminology from set theory.

    2. Let's consider several different events in our cross-pollination example. Say event A is the offspring having at least one copy of allele A, event B is the offspring having at least one copy of allele a, event C is the offspring having genotype aa, and event D is the offspring having genotype AA. What are the specific genotypes in each of these events?

      event A:

      event B:

      event C:

      event D:

    3. Observe that A and B have a common genotype, ____. This reflects the set-theoretic operation of intersection: the intersection of A and B is everything that is in both A and B. In other words, the intersection of A and B is the event consisting of the outcomes in which both A and B occur. Symbolically, this is written as A $\cap$ B and read "A intersect B".

      Are there any outcomes in A $\cap$ C? ____
      In this case, the sets A and C are called disjoint, and their intersection is the empty set, or null set, written with the symbol $\emptyset$. Two events whose intersection is the null set are called mutually exclusive, because the occurrence of one excludes the possibility of the other.
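      In Python, events over a finite sample space can be represented as sets, and set operations then match the probability vocabulary. In this sketch (an illustrative encoding, not from the original), each event is built from the sample space by a condition, `&` computes the intersection, and `isdisjoint` tests whether two events are mutually exclusive. Note that Python writes the empty set as `set()`, because `{}` is an empty dictionary.

```python
S = {"AA", "Aa", "aa"}  # sample space: offspring genotypes

A = {g for g in S if "A" in g}   # at least one copy of allele A
B = {g for g in S if "a" in g}   # at least one copy of allele a
C = {g for g in S if g == "aa"}  # genotype aa

print(A & B)            # the intersection A ∩ B
print(A & C)            # the intersection A ∩ C
print(A.isdisjoint(C))  # True exactly when A ∩ C is the empty set
```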

    4. Suppose we want to describe the event of having two copies of the same allele. There are two ways we can think of this. We can think of it as the combination of events C and D, where any outcome in C or D works, or we can think of it as the opposite of having one copy of each allele. Let's first look at the combination of events C and D. The union of C and D is the set of everything that is in C or in D. In the language of probability, the union of C and D is the event containing all outcomes in C or in D. Symbolically, we write the union of C and D as C $\cup$ D and read it "C union D". What are the genotypes in C $\cup$ D?

      What is A $\cup$ B?
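      Continuing the same set encoding (an illustrative sketch), `|` computes a union. The last line checks that C $\cup$ D matches a direct description of "two copies of the same allele": the genotypes whose two letters are the same.

```python
S = {"AA", "Aa", "aa"}  # sample space: offspring genotypes

A = {g for g in S if "A" in g}   # at least one copy of allele A
B = {g for g in S if "a" in g}   # at least one copy of allele a
C = {g for g in S if g == "aa"}  # genotype aa
D = {g for g in S if g == "AA"}  # genotype AA

same_allele_twice = C | D  # the union C ∪ D
print(sorted(same_allele_twice))
print(sorted(A | B))  # the union A ∪ B

# Direct description: both letters of the genotype are the same.
print(same_allele_twice == {g for g in S if g[0] == g[1]})
```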

    5. The idea of taking the "opposite" of an event is called the complement. The complement of a set is everything outside of the set. In the case of probability, the complement of an event is the event containing every outcome in the sample space that is not in the original event. We write the complement of A as A$^c$, which is read "A complement". What is A$^c$? ____.

      What is B$^c$? ____

      What is C$^c$? ____
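      A complement can be computed as a set difference from the sample space. The function name `complement` below is hypothetical; the sketch just follows the definition, keeping every outcome of the sample space that is not in the event.

```python
S = {"AA", "Aa", "aa"}  # sample space: offspring genotypes


def complement(event: set, sample_space: set) -> set:
    """Every outcome of the sample space that is not in the event."""
    return sample_space - event


A = {g for g in S if "A" in g}   # at least one copy of allele A
B = {g for g in S if "a" in g}   # at least one copy of allele a
C = {g for g in S if g == "aa"}  # genotype aa

for name, event in [("A", A), ("B", B), ("C", C)]:
    print(name, "complement:", sorted(complement(event, S)))
```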

    6. Let's look briefly at how these operations interact. In the following, write everything in terms of the events X, Y, the null set, and the sample space. These identities apply to all events, and you may find it helpful to work through them with the specific events given above. Write null for the null set and S for the sample space.

      (X$^c$)$^c =$

      X $\cap$ X$^c =$

      X $\cup$ X$^c =$

      X $\cup$ (X $\cap$ Y) =

      Y $\cap$ (X $\cup$ Y) =

      The complement of an intersection can be written as a union, and the complement of a union can be written as an intersection:

      (X $\cap$ Y)$^c =$ ____ $\cup$ ____

      (X $\cup$ Y)$^c =$ ____ $\cap$ ____
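      To test a guess for one of these identities, you can check it against every pair of events in the genotype example. The sketch below does this by brute force with hypothetical helpers `all_events` and `holds_for_all`, and shows it on a true identity (X $\cap$ Y = Y $\cap$ X) and on a false one. Passing on this small sample space is evidence for a guess, not a proof that it holds for all events.

```python
from itertools import combinations

S = frozenset({"AA", "Aa", "aa"})  # sample space: offspring genotypes


def all_events(sample_space):
    """Every subset of the sample space (2**3 = 8 events here)."""
    items = sorted(sample_space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comp(X):
    """Complement of event X within S."""
    return S - X


def holds_for_all(lhs, rhs):
    """True if lhs(X, Y) == rhs(X, Y) for every pair of events X, Y."""
    events = all_events(S)
    return all(lhs(X, Y) == rhs(X, Y) for X in events for Y in events)


# A true identity: intersection does not depend on order.
print(holds_for_all(lambda X, Y: X & Y, lambda X, Y: Y & X))
# A false guess: the complement of an intersection is NOT the intersection of complements.
print(holds_for_all(lambda X, Y: comp(X & Y), lambda X, Y: comp(X) & comp(Y)))
```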

    7. A probability model assigns the probability $P(E)$ to each event $E$. Let's look at the requirements for a probability model to make sense.

      First, as we considered before, the event S, where the outcome is in the sample space, has to happen. Therefore

      $1$. $P(S)=1$, where S is the sample space.

      Second, probabilities have to be between $0$ and $1$, regardless of the event.

      $2$. $0\leq P(A) \leq 1$ for any event A.

      Third, if two events are mutually exclusive, the probability of their union is the sum of their probabilities.

      $3$. If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.

      From the first and third conditions, we can come up with a useful relationship between the probability of A and the probability of A$^c$. Replace $B$ with A$^c$ in the third condition to find $P(A^c)$ in terms of $P(A)$.

      $P(A^c) =$
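      One way to carry out this step, written as a short derivation: A and A$^c$ are mutually exclusive, since $A \cap A^c = \emptyset$, and together they make up the whole sample space, since $A \cup A^c = S$. The first and third conditions then combine as follows.

```latex
\begin{aligned}
1 &= P(S)            && \text{by condition 1} \\
  &= P(A \cup A^c)   && \text{because } A \cup A^c = S \\
  &= P(A) + P(A^c)   && \text{by condition 3, because } A \cap A^c = \emptyset,
\end{aligned}
\qquad\text{so}\qquad P(A^c) = 1 - P(A).
```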

    8. If there are only a finite number of possible outcomes, assigning probabilities to events is relatively straightforward. We first assign a probability between $0$ and $1$ to each simple event, with the requirement that these probabilities sum to $1$, so that the sample space has probability $1$. Once we know the probabilities of the simple events, the probability of any other event can be written as the ____ of the probabilities of the simple events it contains.
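      For a finite sample space, this recipe can be sketched in Python with a dictionary that maps each outcome to its probability. The function names and the three-outcome model below are made up for illustration (they are not the genotype model). Exact `Fraction` arithmetic avoids round-off when checking that the probabilities sum to 1.

```python
from fractions import Fraction


def is_valid_model(simple_probs):
    """Each simple event has probability in [0, 1], and together they sum to 1."""
    values = list(simple_probs.values())
    return all(0 <= p <= 1 for p in values) and sum(values) == 1


def event_probability(event, simple_probs):
    """Probability of an event: add up the probabilities of the outcomes it contains."""
    return sum((simple_probs[outcome] for outcome in event), Fraction(0))


# Hypothetical model with three outcomes, used only to illustrate the functions.
model = {"x": Fraction(1, 2), "y": Fraction(1, 3), "z": Fraction(1, 6)}

print(is_valid_model(model))                 # True
print(event_probability({"x", "z"}, model))  # 2/3
print(event_probability(set(), model))       # 0
```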

      Let's use this to come up with probabilities for the events in our cross-pollination model. Suppose that each parent is equally likely to pass on either of its alleles to the offspring. Then there are four equally likely situations: both parents pass on the A allele; the ovule has the A allele and the pollen has the a allele; the ovule has the a allele and the pollen has the A allele; and both parents pass on the a allele. Two of these are indistinguishable in the offspring, resulting in only three possible genotypes. What is the probability of each of these genotypes?

      $P(AA) = $

      $P(Aa) = $

      $P(aa) = $


      What are the probabilities of the events A, B, C, and D from part 2?

      $P(A) =$

      $P(B) =$

      $P(C) =$

      $P(D) =$
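      The counting argument above can be sketched in Python under the same assumption that the four (ovule allele, pollen allele) situations are equally likely. The code counts how many situations give each genotype, then finds each event's probability by adding up the probabilities of the genotypes it contains. The encoding of events as sets is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four equally likely situations: (allele in ovule, allele in pollen).
situations = list(product("Aa", "Aa"))

# Genotype of each situation, uppercase allele first ("aA" becomes "Aa").
counts = Counter("".join(sorted(ovule + pollen)) for ovule, pollen in situations)
genotype_probs = {g: Fraction(n, len(situations)) for g, n in counts.items()}

S = set(genotype_probs)  # sample space
events = {
    "A": {g for g in S if "A" in g},  # at least one copy of allele A
    "B": {g for g in S if "a" in g},  # at least one copy of allele a
    "C": {"aa"},                      # genotype aa
    "D": {"AA"},                      # genotype AA
}

# Probability of an event = sum of the probabilities of its simple events.
event_probs = {name: sum(genotype_probs[g] for g in ev) for name, ev in events.items()}

for g in sorted(genotype_probs):
    print(f"P({g}) = {genotype_probs[g]}")
for name, p in event_probs.items():
    print(f"P({name}) = {p}")
```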

  2. We can describe the location of molecules with a probability model. Suppose we have $3$ molecules of a toxin in a cell, and each minute, any molecule in the cell will leave with probability $0.05$. Once a molecule has left the cell, it does not reenter. We wish to know how many molecules are likely to be in the cell after $5$ minutes. One way to answer this is to determine the probability of each possible number of molecules in the cell.
    1. First, we need to determine the sample space. We want to know how many molecules are in the cell after $5$ minutes. An outcome of the "experiment" would be the location (in or out) of each of the $3$ molecules after $5$ minutes. For example, if the second molecule was inside the cell and the rest were outside the cell, we would have the outcome: out, in, out. The sample space is the set of all possible combinations of locations of these molecules after $5$ minutes.

      Although the individual outcomes specify the location of each molecule, we aren't interested in knowing which molecule is where. We are only interested in the number of molecules that are still in the cell. Let's define events that reflect the question of interest. Let $n$ be the number of molecules in the cell after $5$ minutes. The events we are interested in are $n=0$, $n=1$, $n=2$, and $n=3$. Are these simple events, or do they contain more than one outcome?

      If we know $n=0$, can we tell where each individual molecule is? ____ What about if $n=1$? ____
      With $n=0$, every molecule is outside the cell, but with $n=1$, all we know is that one molecule is in the cell, not which one.
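      A quick way to see which of these events are simple is to list every outcome and count how many give each value of $n$. In this Python sketch (an illustration, not from the original), an outcome is a tuple giving the locations of the three molecules in order.

```python
from collections import Counter
from itertools import product

# Each outcome records the location of the 1st, 2nd and 3rd molecule.
outcomes = list(product(("in", "out"), repeat=3))

# n = number of molecules still in the cell; count the outcomes for each n.
outcomes_per_n = Counter(outcome.count("in") for outcome in outcomes)

for n in sorted(outcomes_per_n):
    print(f"n={n}: {outcomes_per_n[n]} outcome(s)")
```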

      To determine the probabilities of these events accurately, we first need the probabilities of all the simple events.

    2. What are the simple events? For each molecule, which we'll label $M_1$, $M_2$, and $M_3$, we need to know if it's in the cell or outside of the cell. There are eight possible simple events. We'll identify them by the location of $M_1$, $M_2$, and $M_3$ as either "in" or "out", in order. Write them all down below (three are provided for you):

      $E_1$: in, in, in
      $E_2$: in, in, out
      $E_3$: in, out, in
      $E_4$:

      $E_5$:

      $E_6$:

      $E_7$:

      $E_8$:
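      The eight simple events can also be generated mechanically. This sketch uses Python's `itertools.product`; listing "in" before "out" for each molecule happens to reproduce the order of $E_1$, $E_2$, and $E_3$ above, though any order would work.

```python
from itertools import product

# Locations of M1, M2, M3, in that order; "in" is listed before "out".
simple_events = list(product(("in", "out"), repeat=3))

for i, locations in enumerate(simple_events, start=1):
    print(f"E{i}: " + ", ".join(locations))
```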

    3. Now, we need to determine the probability of each of the simple events. What is the probability that a given molecule is in the cell after $5$ minutes? Since we are given that a molecule in the cell leaves during any one minute with probability $0.05$, let's analyze the process one minute at a time.

      What is the probability that the molecule stays in the cell during the first minute? ____

      Suppose the molecule stays in the cell during the first minute. What is the probability that it stays in the cell for another minute? ____ The probability that a molecule stays in the cell for two minutes is the product of these two numbers, which is ____.

      To determine the probability that the molecule is in the cell after five minutes, we multiply the probabilities that it stays in the cell during each of those five minutes. What is the probability that a given molecule is in the cell after five minutes? (You can round your answer to four decimal places.) ____

      What is the probability that a given molecule is not in the cell after five minutes?
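      The minute-by-minute reasoning can be written as a short Python calculation, assuming (as the text does) that a molecule still in the cell leaves with the same probability $0.05$ each minute, regardless of what happened earlier. The variable names are illustrative.

```python
p_leave_per_minute = 0.05
p_stay_per_minute = 1 - p_leave_per_minute

minutes = 5

# The molecule must stay in during each of the 5 minutes: multiply the factors.
p_in = 1.0
for _ in range(minutes):
    p_in *= p_stay_per_minute

# Being out of the cell after 5 minutes is the complement of being in it.
p_out = 1 - p_in

print(round(p_in, 4), round(p_out, 4))
```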

      These are the probabilities that each individual molecule is in or out of the cell, but we need to know the probability of each combination. We assume that the molecules don't affect each other, so the probability of each combination is the product of the probabilities of the individual molecules being in their specified locations. (This idea is called independence, which we will discuss in detail later.) Write down the probability of each simple event (you can round to four decimal places). Since every molecule has the same probability of being in the cell, all simple events with the same number of molecules in the cell have the same probability.

      Probability of a simple event with all molecules in the cell:

      Probability of a simple event with two molecules in the cell and one out:

      Probability of a simple event with one molecule in the cell and two out:

      Probability of a simple event with no molecules in the cell:
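      Here is a sketch of the independence assumption in code: a simple event's probability is the product of one factor per molecule, $p_{\text{in}}$ for each molecule in the cell and $1 - p_{\text{in}}$ for each molecule outside. The function name is hypothetical. As a sanity check, the eight simple-event probabilities should add up to 1, as the first rule of a probability model requires.

```python
from itertools import product

p_in = 0.95 ** 5   # one molecule still in the cell after 5 minutes
p_out = 1 - p_in   # one molecule outside the cell after 5 minutes


def simple_event_probability(locations):
    """Multiply one factor per molecule (assumes the molecules act independently)."""
    prob = 1.0
    for loc in locations:
        prob *= p_in if loc == "in" else p_out
    return prob


for example in [("in", "in", "in"), ("in", "in", "out"),
                ("in", "out", "out"), ("out", "out", "out")]:
    print(example, round(simple_event_probability(example), 4))

# Sanity check: all eight simple events together have probability 1.
total = sum(simple_event_probability(e) for e in product(("in", "out"), repeat=3))
print(abs(total - 1) < 1e-12)
```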

    4. The last step is to find the probabilities of the events $n=0$, $n=1$, $n=2$, and $n=3$. For these, we just need to ____ together the probabilities of the simple events that make up each event. What are the probabilities? (Note: you may round your answers to $4$ decimal places, but keep more decimal places in intermediate calculations, or round-off error may accumulate.)

      $P(n=0) =$

      $P(n=1) =$

      $P(n=2) =$

      $P(n=3) =$
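      The final step can be sketched the same way: add the probabilities of the simple events with each value of $n$, keeping full precision until the end as the note suggests. The second column is a cross-check that uses a fact not stated in the worksheet: each event $n=k$ contains $\binom{3}{k}$ simple events of equal probability (this is the binomial distribution). The code is an illustration, not the worksheet's required method.

```python
from itertools import product
from math import comb

p_in = 0.95 ** 5   # one molecule still in the cell after 5 minutes
p_out = 1 - p_in   # one molecule outside the cell after 5 minutes

# Add together the probabilities of the simple events that make up each event n = k.
prob_n = {k: 0.0 for k in range(4)}
for locations in product(("in", "out"), repeat=3):
    k = locations.count("in")
    prob_n[k] += p_in ** k * p_out ** (3 - k)

for k in range(4):
    # Cross-check: comb(3, k) simple events, each with probability p_in**k * p_out**(3 - k).
    shortcut = comb(3, k) * p_in ** k * p_out ** (3 - k)
    print(f"P(n={k}) = {prob_n[k]:.4f}   (cross-check: {shortcut:.4f})")
```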