Math Insight

Introduction to exponential waiting times

Math 2241, Spring 2023
Name:
ID #:
Due date: April 12, 2023, 11:59 p.m.
Table/group #:
Group members:
Total points: 1
  1. Imagine that a cell is being exposed to constant radiation that may damage its DNA, leading to a mutation that makes the cell cancerous. Let's assume that the probability of a cancerous mutation is 0.005 per day.
    1. With this mutation rate, what is the average amount of time it will take for a mutation to occur?
      days

      If we let $T$ be the time until the mutation, we can call $T$ the waiting time. The expected (mean) waiting time is therefore $_$ days.

    2. To begin, let's think about modeling this scenario in discrete time, where we let the time unit be a day. We can perform an experiment of this scenario and see what outcome we get.

      Starting with day 1, we flip a coin to determine if there was a mutation. This coin will come up Heads with probability 0.005. If we get a Head, we record what day we get the mutation. If we get a Tail, we continue to the next day. We continue marching through days until we finally get a Head. The outcome of our experiment is the time $T$, which is the day where we finally got a Head (the cell got its cancerous mutation), and we stopped the experiment.

      We repeat this experiment over and over again, recording each outcome. If we repeated long enough and then took an average of all the values of $T$, this average waiting time should be close to what number?
      .

      If, instead of taking the average, we instead recorded how many times the mutation occurred on day 1, how many times it occurred on day 2, how many times it occurred on day 3, etc., on which day do you think you would have recorded the largest number of mutations? The largest number of mutations would occur on day $T=$
      .

    3. We could determine the answer for the previous question if we knew the probability distribution of the random variable $T$. We know some properties of the probability distribution: we know the mean value of $T$, we know $T$ can take on positive integer values, and we know that it would be unlikely for $T$ to take on extremely large values, such as $10^{10}$.

      The probability distribution (or probability mass function) for $T$ is a function $f_T(t)$ of day $t$ that gives the probability that the waiting time will be $t$ days: $$P(T=t) = f_T(t).$$

      Before calculating a formula for the probability distribution, let's explore what it looks like by doing many such experiments with flipping coins and finding the first Head. Since it may be difficult to find a coin that has a probability of 0.005 of coming up Heads, we can use a computer program to approximate such a random toss. In R, we can use the function rbinom to flip the coin. A call to rbinom(1,1,0.005) will return a 1 with probability 0.005 and return a 0 with probability 1-0.005. All we need to do is create an infinite loop (by using while(TRUE)) and keep flipping coins by calling rbinom inside the loop. If we happen to get a 1 from rbinom, we got our Heads and the cell mutated. We record the number of flips (which stands for the number of days), which is our outcome $T$.

      The following R code will perform such an experiment:

      lambda=0.005
      day = 0
      while(TRUE) { # repeat forever until break
        day=day+1
        
        # flip a coin that will be 1 with probability lambda
        if (rbinom(1,1,lambda)==1) {
          
          # a mutation occurred
          # break out of loop
          break
        }
      }
      

      When the loop exits, the variable day will hold the value of the outcome $T$.

    4. Since we want to run a bunch of experiments, we might as well create another loop to run the above code over and over again. If we let the variable n_samples control how many experiments we will run, then we can create a vector of all zeros with rep.int(0,n_samples) and store the value of day in that vector at the end of each experiment.

      If you add this code around the previous code, you'll get the results of 10 experiments in the vector days.

      n_samples=10
      lambda=0.005
      
      # initialize days to be a vector of all zeros
      days=rep.int(0,n_samples)
      
      for(i in 1:n_samples) {
        day = 0
        while(TRUE) {
           day = day + 1
           # flip a coin; if it comes up 1 (probability lambda), the mutation occurred
           if (rbinom(1,1,lambda)==1) {
             break
           }
        }
        # record day of mutation
        days[i] = day
      }
      

      After running the code, you can check the mean waiting time with mean(days). This number should fluctuate around the mean value you calculated if you run the code multiple times.

    5. Lastly, you can use the hist command to plot a histogram of the outcomes of the experiments. We'll pass the argument breaks=0:max(days) so that each bin of the histogram will be only one day wide (making a ragged-looking histogram with thin bars because we don't want it to average over larger time bins). We'll give it the probability=TRUE argument so that it will rescale the height of each bar so that the sum of all the heights is 1. In this case, the height of each bar will be the number of mutations that occurred on that day, divided by the total number of experiments, i.e., an estimate of the probability that $T$ was equal to that day.

      We'll just show the first 1000 days of the histogram with the argument xlim=c(0,1000) so that the occasional really long experiment won't compress the scale and make the early days of the histogram harder to read. You can remove that argument to see the whole histogram. The full histogram command is the following.

      days_show=1000
      hist(days,breaks=0:max(days),xlim=c(0,days_show), 
           probability=TRUE, xlab="days", ylab="Probability per day",
           main="Time to mutation")
      

      When you run 10 experiments, what does the histogram look like?
      If you repeat the set of experiments by running the code again and plotting a new histogram, what do you notice about the different versions of the histogram?
      Do the results reveal which day is most likely for the mutation?

    6. To get a better idea of what the probability distribution $f_T(t)$ looks like, we need to take more samples. You can simply increase the variable n_samples to increase the number of experiments to run before computing the histogram.

      Increase n_samples to 100 and run the code to view the histogram of outcomes from 100 experiments. Increase n_samples even further to 1000 and then 10000. The histogram should start to fill in and reveal which days are most likely for the mutation. Which days have more mutations?
      .

      Even with 10,000 samples, the histogram will be too ragged to see precisely which day is most likely. (You'd need more than a million samples, but that would take too long to simulate given the way in which we are sampling.) But, can you infer from the trend which day should be most likely for the mutation? It appears that the largest number of mutations would occur on day $T=$
      .

    7. Let's now analyze the mutation process to determine the actual probability distribution $f_T(t)$. The first number is easy. What is the probability that there is a mutation on day 1? $P(T=1) = f_T(1) =$
      .

      What's the probability we didn't have a mutation on day 1 (i.e., the probability that $T>1$)? $P(T > 1) =$
      If we didn't have a mutation on day 1, we move on to day 2. Given that there was no mutation on day 1 (i.e., conditioned on $T>1$), what is the probability that there is a mutation on day 2 (i.e., the probability that $T=2$)? The second coin flip has the same probability of Heads as the first, so the probability of $T=2$ conditioned on the fact that $T>1$ is $P(T=2 \,|\, T>1) = $

      We, however, are interested in the overall probability that there was a mutation on day 2, the probability $P(T=2)$. $P(T=2)$ is the probability that (a) there was no mutation on day 1 and (b) there was subsequently a mutation on day 2. This probability is the product of $P(T > 1)$ times $P(T=2 \,|\, T>1)$. The probability that the mutation occurred on day 2 is $P(T=2) = f_T(2)=$
      .

      What is more likely: that the mutation occurred on day 1 or that the mutation occurred on day 2?
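      As a quick check, you can compute these first few probabilities directly in R (a minimal sketch; the names p1, p_gt1, and p2 are just illustrative):

```r
lambda = 0.005

p1 = lambda          # P(T=1): mutation on day 1
p_gt1 = 1 - lambda   # P(T>1): no mutation on day 1
p2 = p_gt1 * lambda  # P(T=2) = P(T>1) * P(T=2 | T>1)

p1
p2
```

      Since p2 is p1 multiplied by the factor 0.995, day 1 comes out slightly more likely than day 2.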

    8. What is the probability that the mutation did not occur on day 1 or on day 2 (i.e., that you flipped Tails twice in a row so that $T > 2$)? $P(T>2) = $
      (Include at least 5 significant digits in all your responses.) Given that $T > 2$, what's the probability that $T=3$? $P(T=3 \,|\, T>2) =$
      Therefore, what is the probability that the mutation occurred on day 3?
      $P(T=3) = P(T=3 \,|\, T>2)P(T > 2)=$

      What is more likely: that the mutation occurred on day 2 or that the mutation occurred on day 3?

    9. In the above answers, you were using the fact that the probability of Heads is $\lambda = 0.005$. As we continue to develop the formula for the probability mass function (the probability distribution) $P(T=t) = f_T(t)$, the pattern will be easier to see, and the formula will be more useful later, if we use the symbol $\lambda$ rather than substituting its numerical value. Rewrite the above formulas using the symbol $\lambda$.

      $P(T=1) =$

      $P(T>1) =$

      $P(T=2) =$

      $P(T>2) =$

      $P(T=3) =$

    10. Can you continue the pattern for any number of days? What's the probability that you flip a Tails $t-1$ times in a row, i.e., what's the probability that $T > t-1$?
      $P(T > t-1 )=$

      Given that $T > t-1$, what's the probability you get a Heads on the next flip so that $T=t$?
      $P(T=t \,|\, T > t-1) = $

      Putting it all together, what is the probability of the mutation occurring on day $t$?
      $P(T=t) = P(T=t \,|\, T > t-1)P(T > t-1) =$

      This last expression is the probability mass function $f_T(t)$. You've determined that $$P(T=t) = f_T(t) = _.$$

      Each day the probability of the mutation gets
      because you multiply by one more factor of
      which is
      than 1.

      What's the probability the mutation occurs on day 200? $P(T=200) = f_T(200) =$
      (Here we want a numerical answer so substitute $\lambda=0.005$; be sure to include at least five significant digits.)

    11. You can add a plot of $f_T(t)$ to your histogram by defining the function
      f_T = function(t) {
        lambda*(1-lambda)^(t-1)
      }
      

      and then adding it to the graph with the command

      curve(f_T, from=0, to=days_show, add=TRUE, col= "red", lwd=3)
      

      When you compare the graph of $f_T(t)$ to the histogram with n_samples=10000, the two should be close, though the histogram should include a lot of “noise” with bar heights rising above and dropping below the curve of $f_T(t)$.

    12. Given the definition of f_T, you can find the probability of the mutation occurring on any day. For example, you can enter f_T(200) to verify your above answer for $f_T(200)$. You can also calculate the probability of a bunch of days by giving a vector of days for the argument of the function. For example, you can enter f_T(1:20), and the output will be a vector of the probability of mutation for the first 20 days.

      In R, the function sum will add up all the components of a vector. If you enter sum(f_T(1:20)), the output will be the probability that $T \le 20$. What is that probability?
      $P(T \le 20) = \sum\limits_{t=1}^{20} f_T(t) = $

      What is the probability that $T$ is 200 or less?
      $P(T \le 200) = \sum\limits_{t=1}^{200} f_T(t) = $

      What is the probability that $T$ is any integer from 1 to infinity? This answer is an infinite sum. You won't be able to calculate that directly in R, but you can calculate enough terms of the sum to get a good idea of what the total should be.
      $\sum\limits_{t=1}^{\infty} f_T(t) = $

      This result should make sense, because $T$ has to be some positive integer.
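      As a sketch of how those partial sums behave, you can compute them for successively larger upper limits (f_T is defined inline here so the snippet stands on its own):

```r
lambda = 0.005
f_T = function(t) {
  lambda*(1-lambda)^(t-1)
}

sum(f_T(1:20))     # P(T <= 20), about 0.0954
sum(f_T(1:200))    # P(T <= 200), about 0.633
sum(f_T(1:5000))   # far into the tail; essentially 1
```

      Each partial sum has the closed form $1-(1-\lambda)^N$, which is why the total converges to 1.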

    13. Now that we have the probability distribution $f_T$, we can also verify the expected value of $T$ (or mean value of $T$). To calculate the expected value, we just need to add up the different values of $T$, multiplied by the probabilities of them occurring, which is $f_T(T)$. That way, values of $T$ that are more likely to occur are multiplied by a larger number. We should get the same result as if we did a really long simulation and then averaged all the different values of $T$ that we got.

      The expected value of $T$ is the sum $\sum\limits_{t=1}^\infty t f_T(t)$. Although we won't be able to calculate all the infinite terms in R, we can calculate enough of them so that we can tell what the answer is.

      Recall that f_T(1:5) is a vector of the values $f_T(1)$ through $f_T(5)$. To multiply $f_T(1)$ by $1$, $f_T(2)$ by $2$, all the way to multiplying $f_T(5)$ by $5$, you can type: f_T(1:5)*(1:5). R multiplies the first component of the first vector by the first component of the second vector, then multiplies their second components, and continues to their last components, calculating the numbers we want. If you type sum(f_T(1:5)*(1:5)), you'll add up those numbers, getting the sum $\sum_{t=1}^5 t f_T(t)$. What is the value of this sum?

      Continue increasing 5 to larger numbers until the value stops changing. The result will be the expected value of $T$. What is the expected value (denoted $E(T)$)?
      $E(T) = \sum\limits_{t=1}^\infty t f_T(t) = $

      This value should match what you determined right at the beginning.
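      Here is one way this computation might look (a sketch; the upper limit of 5000 is an arbitrary choice that is large enough for the remaining tail to be negligible):

```r
lambda = 0.005
f_T = function(t) {
  lambda*(1-lambda)^(t-1)
}

# partial sum of E(T); approaches 1/lambda as the upper limit grows
ts = 1:5000
sum(ts * f_T(ts))
```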

  2. The choice of day as the unit of time for the mutation was completely arbitrary. We could have used hours or minutes instead. A mutation rate of $0.005$ per day is equivalent to a mutation rate of $0.00020833$ per hour or a mutation rate of $3.4722 \cdot 10^{-6}$ per minute. If we repeat the same procedure after changing our discretization of time, will we get comparable results?
    1. Let's look at the case where we discretize time into bins of hours rather than days so that the equivalent mutation rate is $\lambda_h = 0.00020833$ per hour. We added the subscript $h$ to $\lambda$ so we can distinguish it from the original $\lambda = 0.005$ per day. In the same way, let's denote the new waiting times by $T_h$, where $T_h$ is the number of hours until the mutation occurs.

      If you like, you can try redoing the simulation code to find samples of $T_h$. Each sample will take 24 times longer to obtain, though, as the number of coins you'll have to flip before you get a Heads will go up by a factor of 24. Moreover, the histogram will be much messier and harder to see, as the bin width will be 24 times smaller. You'd have to take 24 times as many samples to get a similar number of samples per bin and hence a similar variation among the bins. If you are patient, you could see what you get.

      You'd probably be better off just computing the probability mass function $f_{T_h}(t)$ of the random variable $T_h$. In fact, it's identical to the probability mass function $f_T(t)$ for the waiting time $T$ in days, with the only difference being you need to add the subscript $h$ to the $\lambda$. The probability mass function is
      $f_{T_h}(t) = $
      .

    2. Assuming you still have lambda and days_show defined in your R session, you can plot the function $f_{T_h}(t)$ in R with the commands
      lambda_h = lambda/24
      f_T_h = function(t) {
        lambda_h*(1-lambda_h)^(t-1)
      }
      plot(f_T_h, from=0, to=days_show*24, col="green", lwd=10,
          xlab="hours", ylab="Probability per hour",
          main="Time to mutation")
      

      How does this graph of $f_{T_h}$ compare to your graph of $f_T$? The graph is a
      shape, but the scale in the $x$-axis is much
      and the scale in the $y$-axis is much
      . The scale differences should be a factor of
      , given that's the factor between hours and days.

    3. Is $f_{T_h}(t)$ really just a scaled version of $f_T(t)$? Let's look at the values for one day or 24 hours.
      What is $f_T(1)$?
      This number is the probability that the mutation occurred in the first day.
      What is $f_{T_h}(24)$?
      This number is the probability that the mutation occurred in the 24th hour.

      These numbers seem to represent quite different things. But, if we multiply $f_{T_h}(24)$ by $24$, then we rescale the number to units of probability per day. That way we can compare it to $f_T(1)$.
      What is $24 f_{T_h}(24)$?

      Even though the formula for $24 f_{T_h}(24)$ is quite a bit different from the formula for $f_T(1)$, these numbers are actually pretty close to each other.

      Let's compare the two functions at $T=200$ days, which corresponds to $T_h=200 \cdot 24 = 4800$ hours.
      $f_T(200) = $

      $f_{T_h}(4800) = $

      The rescaled value is $24f_{T_h}(4800) =$

      Again, when you rescale $f_{T_h}$ to probability per day, it's pretty close to $f_T$.
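      If you'd like to reproduce this comparison in one self-contained snippet (restating the definitions used above):

```r
lambda = 0.005
lambda_h = lambda/24

f_T = function(t) {
  lambda*(1-lambda)^(t-1)
}
f_T_h = function(t) {
  lambda_h*(1-lambda_h)^(t-1)
}

f_T(200)        # probability per day at day 200
24*f_T_h(4800)  # hourly value at hour 4800, rescaled to probability per day
```

      The two numbers agree to within a fraction of a percent.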

      Let's plot $f_T$ and the rescaled version of $f_{T_h}$ on top of each other to see how close they look. Assuming you still have f_T, f_T_h, and days_show defined, this code should do the trick.

      f_T_h_rescaled = function(t) {
        24*f_T_h(24*t)
      }
      plot(f_T_h_rescaled, from=0, to=days_show, col="green", lwd=10,
          xlab="days", ylab="Probability per day",
          main="Time to mutation")
      curve(f_T, from=0, to=days_show, add=TRUE, col= "red", lwd=6)
      

      What did you discover? The graphs of the two functions
      .

      You'll forgive us if we don't make you do this all over again in terms of minutes with a mutation probability of $\lambda_m = 3.4722 \cdot 10^{-6}$ per minute. You're probably willing to believe that the probability mass function for the waiting time $T_m$ in minutes will lie right on top of these curves if we rescaled it to probability per day.

  3. Here's what we will make you do, though: make the time interval even smaller than minutes. In fact, go smaller than seconds or even nanoseconds. Take the limit to infinitesimally small intervals. Why? Because in this limit, everything becomes easier!

    As we go to infinitesimally small time intervals, we are moving to a continuous time model. The move from discrete time to continuous time is similar in concept to deterministic dynamical systems, where we got differential equations in the continuous time limit. We won't phrase the dynamical rules in terms of derivatives in the probabilistic (or stochastic) case (though it is possible to go that route). Instead, we'll still work with waiting times and their probability distributions.

    As before, we will model a cell where radiation leads to a constant probability rate of a cancerous mutation. Let's continue to use days for the units of measure, so we can compare with the previous results without any rescaling. The probability per day of a mutation is $\lambda=0.005$. We'll use the notation $T$ for the waiting time, measured in days. In the continuous time version, though, $T$ will no longer be an integer, as the mutation could occur after a fractional number of days.

    1. In the discrete stochastic model, the dynamic rule was that, if we haven't had a mutation yet, then the probability of a mutation in the next day was $\lambda$. In other words, for any day number $t$, if we knew that the mutation had not yet occurred after day $t$ was over (i.e., we know that $T > t$), then the probability that we got a mutation the next day, so that $T=t+1$, was $\lambda$: $$P(T=t+1 \,|\, T > t) = \lambda$$ In the continuous time model, we change the interval of 1 day to an interval of length $\Delta t$ days, where $\Delta t$ represents a small number. Given that the mutation probability per day is $\lambda$, the probability of the mutation occurring in that small interval is $\lambda \Delta t$. In equations, the probability that $T$ is in this small interval $(t, t+\Delta t)$, which we write as $T \in (t,t+\Delta t)$, is $$P(T \in (t, t+\Delta t) \,|\, T > t) = \lambda \Delta t.$$ If we set $\Delta t$ to 1, then this equation looks quite similar to the discrete equation. However, $\Delta t$ is actually infinitesimally small, as we are taking really small steps in time, actually moving continuously in time.

      With probability $1-\lambda \Delta t$, we don't get the mutation in the interval $(t,t+\Delta t)$. Since in this case we know that $T > t+\Delta t$, we can express this probability of no mutation with the equation $$P(T > t+\Delta t \,|\, T > t) = 1-\lambda \Delta t,$$ which is the continuous analog of the discrete equation $P(T > t + 1 \,|\, T > t) = 1-\lambda$. These two equations specify the dynamics of the stochastic process of the cell mutations.

      Given that $\lambda = 0.005$, rewrite these equations in terms of numbers and $\Delta t$.
      The probability of a mutation in a small interval is $P(T \in (t, t+\Delta t) \,|\, T > t) = $

      The probability of no mutation in a small interval is $P(T > t+\Delta t \,|\, T > t) = $

    2. The probability distribution of the waiting times $T$ (actually called a probability density function) specifies the probability that the waiting time is in any interval $(t,t+\Delta t)$: $$P(T \in (t,t+\Delta t)) = f_T(t) \Delta t$$ It shouldn't be surprising that the probability of being in the interval should be proportional to the interval width $\Delta t$. If you set $\Delta t =1$, it looks similar to the discrete probability distribution: $$P(T = t) = f_T(t).$$

      It turns out that $f_T(t)$ is an exponential function, just like in the discrete case. Without worrying about where it came from, we'll just assert that it is the function $$f_T(t) = \lambda e^{-\lambda t},$$ where $e$ is a special number whose value is $e \approx 2.7182818$. We aren't going to worry about how $\lambda (1-\lambda)^{t-1}$ got converted to $\lambda e^{-\lambda t}$ as the time interval got small. It's enough to look at the plot of this function and see how it compares to the discrete probability distributions you've already plotted. Assuming the most recent plotting command was from the previous question, you can add the continuous version of $f_T$ on top with these commands.

      f_T_cont=function(t) {
        lambda*exp(-lambda*t)
      }
      curve(f_T_cont, from=0, to=days_show, add=TRUE, col= "blue", lwd=2)
      

      Since using a time interval of an hour was already quite small for this process, the probability distribution function for the continuous case should be quite close to the graph of the rescaled $f_{T_h}(t)$.
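      To check this closeness numerically rather than just graphically, you can compare the two functions at a few days (a sketch; f_T_h_rescaled restates the rescaled hourly function from earlier so the snippet runs on its own):

```r
lambda = 0.005
lambda_h = lambda/24

# continuous probability density, per day
f_T_cont = function(t) {
  lambda*exp(-lambda*t)
}
# hourly mass function rescaled to probability per day
f_T_h_rescaled = function(t) {
  24*lambda_h*(1-lambda_h)^(24*t-1)
}

f_T_cont(c(1, 100, 200))
f_T_h_rescaled(c(1, 100, 200))
```

      The values agree to roughly four significant digits, so the blue curve should be nearly indistinguishable from the rescaled green one.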

    3. So far, the continuous time stochastic process describing the mutations doesn't seem to be living up to its promise to be simpler. We better show that it simplifies our lives.

      Simulating this process is quite simple. R comes with a built-in function rexp that will sample an “exponential random variable,” which is what our waiting time $T$ is in the continuous case. All you have to do is call rexp(1,0.005) and you have a sample of $T$. No need to keep flipping coins a bunch of times until you get a Heads.

      The code modified to use rexp is the following. We completely removed the infinite loop where we flipped an indefinite number of coins until we got a Heads.

      n_samples=10
      lambda=0.005
      days_show = 1000
      
      # initialize days to be a vector of all zeros
      days=rep.int(0,n_samples)
      
      for(i in 1:n_samples) {
        # day is an exponential random variable with rate lambda
        day = rexp(1, lambda)
        # record day of mutation
        days[i] = day
      }
      
      cat("average day of mutation:", mean(days))
      
      hist(days,breaks=0:ceiling(max(days)),xlim=c(0,days_show), 
           probability=TRUE, xlab="days", ylab="Probability per day",
           main="Time to mutation")
      
      f_T_cont = function(t) {
        lambda*exp(-lambda*t)
      }
      curve(f_T_cont, from=0, to=days_show, add=TRUE, col= "red", lwd=3)
      

      This code runs much faster than the one for the discrete case. Now, you can increase n_samples even as large as a million, and it will run in a reasonable amount of time. Slowly increase n_samples to that size and see how the histogram converges to the graph of the probability distribution $f_T$.
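      As an aside, rexp can draw all the samples in a single call, so even the for loop isn't needed; here is a minimal sketch of the fully vectorized version:

```r
lambda = 0.005
n_samples = 1000000

# draw all the waiting times at once
days = rexp(n_samples, lambda)

mean(days)   # fluctuates around 1/lambda = 200
```

      With a million samples, the sample mean should land very close to the expected waiting time of 200 days.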