Math Insight

Introduction to birth-death processes

Math 2241, Spring 2022
Name:
ID #:
Due date: April 20, 2022, 11:59 p.m.
Table/group #:
Group members:
Total points: 1
  1. A cell exposed to radiation is accumulating DNA mutations at a rate of $\lambda=0.005$ mutations per day. Let $N(t)$ be the number of mutations after $t$ days. A day 0, there are no mutations: $N(0)=0$.
    1. Write a differential equation for the rate of change $\diff{N}{t}$. First write the equation, using the value $\lambda=0.005$.


      Then write the equation more generally, in terms of the parameter $\lambda$.

    2. What is the solution to the differential equation?
      Using the value $\lambda=0.005$, the solution is $N(t) = $

      In terms of the parameter $\lambda$, the solution is $N(t) =$
    3. According to the solution with $\lambda=0.005$, after $50$ days, the number of mutations present in the cell would be $N(50) = $
      . This value
      make biological sense, as a possible number of mutations.

      If you solved the equation multiple times with the same initial condition $N(0)=0$, you would end up with
      in the cell after 50 days. On the other hand, if you ran the experiment multiples times with the exact same conditions, you would probably obtain
      in the cell after 50 days.

    4. To better capture the biological reality, we can modify our mathematical model so that $N(t)$ can take on only integer values. In this new model, the number of cells $N(t)$ will jump up by one at the times of random events where the mutation occurs. Given the random mutation times, the model will have different solutions if you solve it multiple times even with the same initial conditions and parameters. We refer to such a model as a stochastic model, where the term stochastic means that the model incorporates randomness.

      Let's modify the differential equation model to include randomness. The differential equation $\diff{N}{t} = 0.005$, where $t$ is measured in days, means that every day $N$ increases by
      . Every 2 days, $N$ increases by
      . If you wait 0.1 days, $N$ will have increased by
      . If you wait $\Delta t$ days, where $\Delta t$ is any number, then $N$ will have increased by
      .

      Let $\Delta N$ be the increase in mutation count $N$ over a time period $\Delta t$. To repeat last result, according to the differential equation $\Delta N = $
      . (You could think of this equation being $\frac{\Delta N}{\Delta t} = 0.005$, which is similar in form to the differential equation.) We'll use this equation as a starting point for the stochastic model.

    5. Since we are working with continuous time models, we are interested in how $N$ changes over a small interval of time of width $\Delta t$. In the differential equation, the change in mutation count, $\Delta N$, will be a small fraction when $\Delta t$ is small, as $N(t)$ varies smoothly with time. However, for the stochastic model, we will allow $\Delta N$ to be only an integer, even when $\Delta t$ is a small interval of time.

      How do we allow $\Delta N$ to be an integer when $\Delta t$ is small so that $0.005 \Delta t$ is much less than one? We simply interpret the number $0.005 \Delta t$ as the probability that $N$ increases by $1$, i.e., the probability that $\Delta N=1$. Since we are working with small intervals, we ignore any possibility that $N$ could have increased by two or more. Therefore, if $\Delta N$ isn't $1$, then it must be $0$. The probability that $\Delta N=0$ is one minus the probability that it is one, i.e., $1-0.005\Delta t$.

      We summarize our results:
      $P(\Delta N = 1) = $

      $P(\Delta N = 0) = $

    6. Remember what $\Delta N$ is? It is the
      in $N$ over an interval of width
      . If we start the interval at time $t$, then the interval ends at time
      so that $\Delta N = N(t+\Delta t) - N(t)$. If we assume $N(t)$ is known, then we can rewrite the probabilities for $\Delta N$ as probabilities for $N(t+\Delta t)$. The event that $N(t+\Delta t)=N(t)+1$ is the same thing as the event that $\Delta N=$
      , which occurs with probability
      .

      We could express this conclusion in terms of probabilities of the number $N(t+\Delta t)$ conditioned on different values of $N(t)$. For example, if $N(t)$ happened to be equal to 17, then the probability that $N(t+\Delta t)$ is equal to 18 (i.e., that $\Delta N=1$) is
      $P(N(t+\Delta t)=18 \,|\, N(t)=17) = $
      .
      On the other hand the probability that $N$ stayed at 17 (i.e., that $\Delta N = 0$) is
      $P(N(t+\Delta t)=17 \,|\, N(t)=17) = $
      .

      There was nothing special about the number 17. We'd get the same number for the probability that $N(t+\Delta t)=59$ conditioned on $N(t)=58$. To save writing the equation for all different values of $N(t)$, we could summarize all cases with one set of equations by examining that case $N(t)=n$, where $n$ is any non-negative integer. If $N$ starts at the number $n$ and then increases by $1$ during the interval of width $\Delta t$, then $N(t+\Delta t) =$
      , which occurs with probability
      . On the other hand, if $N$ didn't increase, then $N(t+\Delta t) = $
      , which occurs with probability
      .

      We summarize the general results as
      $P(N(t+\Delta t) = n+1 \,|\, N(t)=n) =$

      $P(N(t+\Delta t) = n \,|\, N(t)=n) =$

      Since we view $\Delta t$ as a small interval, we don't allow $\Delta N$ to be anything other than $0$ or $1$. This means if $m > n+1$ (or any number other than $n$ or $n+1$, for that matter), then
      $P(N(t+\Delta t) = m \,|\, N(t)=n) =$
      .

      If we want to write the result as compactly as possible, we could write the stochastic model as $$P(N(t+\Delta t) = m \,|\, N(t)=n) = \begin{cases} 0.005\Delta t & \text{if $m=n+1$}\\ 1-0.005\Delta t &\text{if $m=n$}\\ 0 & \text{otherwise} \end{cases}$$ or, in terms of the parameter $\lambda$ as $$P(N(t+\Delta t) = m \,|\, N(t)=n) = \begin{cases} \lambda\Delta t & \text{if $m=n+1$}\\ 1-\lambda\Delta t &\text{if $m=n$}\\ 0 & \text{otherwise} \end{cases}$$

    7. We can simulate this stochastic model using R. We need to keep track of the number of mutations with a variable N that we set equal to zero at the beginning of the simulation. To determine when the first mutation occurs, we could take really small time steps of width $\Delta t$. For each time step, we could flip a coin that comes out Heads with probability
      . If we get a Heads, we increment $N$ by $1$ to record the mutation; otherwise, we leave $N$ the same as no mutation occurred. As we march forward in time, the cell will slowly accumulate mutations. Since $\Delta t$ is small so that the probability of Heads, $_$, is tiny, we'll waste a lot of effort flipping many coins before we finally get a Heads and increment the mutation count $N$.

      To simulate the stochastic process more efficiently, we'll take advantage of the fact that, if an event occurs randomly at a rate of $0.005$, then the waiting time until the next event $T$ will be an exponential random variable with mean $1/0.005$. In R, one can generate such a random variable with a call to rexp(1,0.005) -- no need to continue flipping coins many times.

      To start the R code, you should initialize the variable lambda with the command
      lambda =

      and initialize the variables $t$ and $N$ with the commands
      t = 0
      N = 0
      Also, create two vectors to store the values of the $t$ and $N$, initializing them to the starting values with the commands
      t_vector = t
      N_vector = N

      Then, if you set tmax to specify how the simulation length, e.g., with
      tmax=1000
      and run the following loop, it will record the times the jumps and the resulting values of $N$ in the vectors t_vector and N_vector. (It records two values of $N$ at each time point, the values before and after the jump, so that when plotting, the function will look constant in between jumps, as it should.)

        while(t < tmax) {
      
          # generate an exponential random variable at rate lambda
          # to determine time of next mutation
          t = t + rexp(1,rate=lambda)
          
          # record count at time before the jump
          t_vector = append(t_vector, t)
          N_vector = append(N_vector, N)
      
          # increment count by 1
          N=N+1
          
          # record jump in count at time t
          t_vector = append(t_vector, t)
          N_vector = append(N_vector, N)
        }
      

      You can then plot the results with the command

      plot(t_vector,N_vector,'l',ylab="number of mutations", xlab="time (days)", col=1)
      

      If you reinitialize t, N, t_vector, and N_vector, then rerun the loop and plot again, you should get
      results.

      If you like, you can modify your code to define a variable n_samples and run this loop multiple times, plotting them on the same axes to compare different iterations. Or, you can download this generic birth process R script, which provides the structure for simulating a birth process, such as these mutations. In this code, find the line that begins with birth_rate =, and replace [enter expression] with the value of the birth rate. (To make later changes easier, you may want to define a parameter lambda at the beginning of the script and refer to that in your expression for the birth rate.)

    8. Set n_samples to $3$ and tmax to $1000$ to simulate three different samples for $1000$ days. According to the deterministic model, how many mutations should occur in $1000$ days?
      How many mutations did you observe in your three samples?
      (Enter numbers separated by commas.)

      Use the following applet to sketch the results from the simulation. The final values of your plot must match the number of mutations you entered above. The applet also checks that each solution trajectory is valid, in that each jump is upward by exactly one. (Increase $n_p$ to reveal additional points for each solution trajectory. Increase $n_s$ to reveal new solution curves.)

      Feedback from applet
      final values:
      solution 1: initial condition:
      solution 1: valid jumps:
      solution 2: initial condition:
      solution 2: valid jumps:
      solution 3: initial condition:
      solution 3: valid jumps:
    9. Returning to your R script, add a curve to your plots that shows the solution to the deterministic model. Above, you should have determined that the deterministic solution is $N(t)=\lambda t$.

      If you used the birth process template script, you can set plot_deterministic to TRUE and enter the solution to the deterministic equation inside the N_deterministic function. Assuming that you have the line
      lambda=0.005
      at the top of the script, you can simply add the line
      lambda*t
      inside the curly braces { } of the N_deterministic function. This line will define the function N_deterministic(t) to be $\lambda t$.

      When you run sets of 4 to 20 samples, do the stochastic simulation results closely match the deterministic solution?

    10. Increase the mutation rate from $\lambda=0.005$ to $\lambda=0.05$ or even $\lambda=0.5$. As you increase the mutation rate, what happens to the match between the deterministic solution and the stochastic solutions? The stochastic solutions
      the deterministic solution as $\lambda$ increases.
    11. Set the mutation rate back to $\lambda=0.005$. This time increase the maximum simulation time tmax from 1000 days to 10,000 days or even 100,000 days. As you increase the simulation time, what happens to the match between the deterministic solution and the stochastic solutions? The stochastic solutions
      the deterministic solution as simulation time increases.

  2. When populations are reproducing, the growth rate should be proportional to the population size. Let $N(t)$ represent the number of bacteria growing in a colony after $t$ minutes. The colony was begun with $N(0)=2$ bacteria.

    Set the per capita growth rate or, the rate that any one individual gives birth to a new individual, to $\lambda = 0.1$ per minute. If this per capita growth rate is $0.1$ and there are $N$ individuals, what's the overall rate at which the population is growing?

    1. Write a differential equation for the rate of change $\diff{N}{t}$. First write the equation, using the value $\lambda=0.1$.


      Then write the equation more generally, in terms of the parameter $\lambda$.

    2. What is the solution to the differential equation?
      Using the value $\lambda=0.1$, the solution is $N(t) = $

      In terms of the parameter $\lambda$, the solution is $N(t) =$

      The population size is exhibiting exponential
      .

    3. In the stochastic model, $\lambda$ represents the probability per minute that a given bacterium divides in half to give birth to a new cell. Using the value $\lambda=0.1$, the probabilistic model becomes:
      $P(N(t+\Delta t) = n+1 \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = n \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = m \,|\, N(t) = n) = $
      for any $m$ such that $m \ne n$ and $m \ne n+1$.

      The stochastic model can be written more compactly as $$P(N(t+\Delta t) = m \,|\, N(t)=n) = \begin{cases} _ & \text{if $m=n+1$}\\ _ &\text{if $m=n$}\\ _ & \text{otherwise.} \end{cases}$$

      In terms of the parameter $\lambda$, the probabilistic model is:
      $P(N(t+\Delta t) = n+1 \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = n \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = m \,|\, N(t) = n) = $
      for any $m$ such that $m \ne n$ and $m \ne n+1$.

      The stochastic model can be written more compactly as $$P(N(t+\Delta t) = m \,|\, N(t)=n) = \begin{cases} _ & \text{if $m=n+1$}\\ _ &\text{if $m=n$}\\ _ & \text{otherwise.} \end{cases}$$

    4. You can modify the R script for account for the birth rate being proportional to the population size $N$ by multiplying the variable birth_rate by N and changing the deterministic solution to exponential growth. Starting from the constant mutation rate script, you should make the following changes.

      1. Change the line defining $\lambda$ at the top to
        lambda = 0.1
      2. Change the initial condition to $N(0)=2$, changing the line near the top to
        N0=2
      3. Make the maximum time smaller, for example changing the line near the top to
        tmax=10
      4. Multiply the birth rate by $N$, changing the line defining birth_rate inside the while loop to
        birth_rate=lambda*N
      5. Change the deterministic solution to exponential growth by modifying the line inside N_deterministic to be
        N0*exp(lambda*t)

      With these changes, tou can explore the relationship between the deterministic solution and the stochastic solutions. In this case, since the population grows exponentially, if you let tmax get much larger than 50, the simulation will take a long time as the number of bacteria gets large.

      Set n_samples and tmax to around 10. Increase the initial condition $N(0)$ from $2$, to $20$, and even to $200$. As you increase the initial population size, what happens to the match between the deterministic solution and the stochastic solutions? The stochastic solutions
      the deterministic solution as population size increases.

  3. Let $N(t)$ still represent the number of bacteria in a colony after $t$ minutes that began with $N(0)=2$ bacteria. This time, in addition to birth rate of $\lambda = 0.1$ per minute per bacterium, we will add a probability of $\mu = 0.05$ per minute that a bacterium dies.
    1. Write a differential equation for the rate of change $\diff{N}{t}$. First write the equation, using the values $\lambda=0.1$ and $\mu=0.05$.


      Then write the equation more generally, in terms of the parameters $\lambda$ and $\mu$.

    2. What is the solution to the differential equation?
      Using the values $\lambda=0.1$ and $\mu=0.05$, the solution is $N(t) = $

      In terms of the parameters $\lambda$ and $\mu$, the solution is $N(t) =$
    3. For the deterministic model, including a death rate just changed $\lambda$ to $\lambda-\mu$. Rather than adding a death rate of $\mu=0.05$, we would have obtained the same deterministic model if we had simply changed $\lambda$ from $0.1$ to $0.05$ and didn't add any deaths.

      The stochastic model, however, treats deaths differently from simply decreasing births. Before the introduction of deaths, in a small time window of width $\Delta t$, the population size $N$ could only jump up by one, from the number $n$ to the number $n+1$, or stay at the same population size of $n$. With the introduction of deaths, there is now the possibility that the population decreases by one to $n-1$.

      The probability that a bacterium divides in time interval $\Delta t$ and gives birth to a new cell is unchanged from before the introduction of the death rate. Using the value $\lambda=0.1$, this birth probability is
      $P(N(t+\Delta t) = n+1 \,|\, N(t) = n) = $
      .

      The probability that a bacterium dies in the time interval $\Delta t$ is identical in form to the birth probability. In this case $N$ decreases from the number $n$ to the number $n-1$, and the rate depends on $\mu$ rather than $\lambda$. Using the value $\mu=0.05$, this death probability is
      $P(N(t+\Delta t) = n-1 \,|\, N(t) = n) = $
      .

      Since we view the time interval as being small, we allow at most one birth or one death in the interval. It turns out, we also don't have to consider the possibility that we have both one birth and one death in the interval. (Just like the event of two births or two deaths, the probability of the event of one birth and one death is vanishingly small as $\Delta t$ gets small.) Therefore, the probability that $N$ stays at the value $n$ during the whole interval is just one minus the sum of the above two probabilities:
      $P(N(t+\Delta t) = n \,|\, N(t) = n) = $

      Again, we don't allow $N$ to change by more than one during the interval, so the probability of any other change in $N$ is
      $P(N(t+\Delta t) = m \,|\, N(t) = n) = $
      for any $m$ not equal to $n$, $n+1$, or $n-1$.

      The stochastic model can be written more compactly as $$P(N(t+\Delta t) = m \,|\, N(t)=n) = \begin{cases} _ & \text{if $m=n+1$}\\ _ &\text{if $m=n$}\\ _ & \text{if $m=n-1$}\\ _ & \text{otherwise.} \end{cases}$$

      In terms of the parameters $\lambda$ and $\mu$, the probabilistic model is:
      $P(N(t+\Delta t) = n+1 \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = n \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = n-1 \,|\, N(t) = n) = $

      $P(N(t+\Delta t) = m \,|\, N(t) = n) = $
      for any $m$ not equal to $n$, $n+1$, or $n-1$.

      The stochastic model can be written more compactly as $$P(N(t+\Delta t) = m \,|\, N(t)=n) = \begin{cases} _ & \text{if $m=n+1$}\\ _ &\text{if $m=n$}\\ _ & \text{if $m=n-1$}\\ _ & \text{otherwise.} \end{cases}$$

      We refer to such a stochastic model that includes births and deaths as a birth-death model.

    4. Now that we have two possible events to track, births and deaths, simulating the stochastic system is slightly more complicated. The simulation is simplified by first ignoring any difference between births and deaths so that the model appears similar to the birth-only model. If we don't care about distinguishing births from deaths but just want to know when the next event occurs (where an event is either a birth or death), then we just need to know the combined rate of these events. If the population size is $N$, the total rate of births is
      and the total rate of deaths is
      . The combined rate of birth or death events in their sum:
      .

      The first step in the stochastic simulation is to determine when the next event occurs. If we let $T$ be the waiting time until the next event, then $T$ will be an
      random variable with mean that is equal to
      .

      Only after we draw a random waiting time $T$, do we worry about the identity of the event. The likelihood of the event being a birth or death depends on the relative magnitudes of the birth and death rates. If these rates were equal, then the event would be a birth with probability
      and the event would be a death with probability
      . However, since in our model the per capita birth rates is $\lambda=0.1$ per minute and per capita death rates is $\mu=0.05$ per minute, what's the likelihood the event was a birth?
      What's the likelihood that the event was a death?
      (Since one of the two must have occurred, this probabilities must sum to 1. Simply rescale the values of $\lambda$ and $\mu$ so that they sum to 1.) If rounding to a decimal, include at least five significant digits in your response.

      In the simulation, we want to be able to determine these probabilities even as you change $\lambda$ and $\mu$, so we want expressions in terms of the parameters $\lambda$ and $\mu$. To rescale the quantities $\lambda$ and $\mu$ so that they sum to 1, you should divide by
      . Therefore, in terms of the parameters $\mu$ and $\lambda$, what is the probability that the event was a birth?
      What is the probability that the event was a death?
      .

      To summarize, we will simulate each event of the birth-death process with two steps. First, we'll draw an _ random variable at the combined rate of $_$, giving the waiting time $T$ until the next event. Second, we'll flip a coin that will come out Heads with probability $_$. If we get Heads, the event is a birth and we increase the bacteria count $N$ by one ; otherwise the event is a death, and we decrease $N$ by one.

    5. We have one more wrinkle to address given that we now have deaths. What if we have one bacterium left and the next event is a death? At that point, how bacteria are there in the colony?
      Given that population size, what will be the overall rate of births?
      What will be the overall rate of deaths?
      . We refer to this event as an extinction event. Once an extinction occurs, we stop the simulation, as nothing else will occur.

      Can the phenomenon of extinction occur in the deterministic model? If the initial condition $N(0)$ is larger than zero, will the solution of the deterministic equation ever reach zero?
      These extinctions are a new feature of the stochastic model.

    6. To simulate the birth-death process, you can start with the incomplete birth-death process R script. The script needs the information you determined above: the overall event rate, the probability that an event is a birth, and the solution to the deterministic solution. See comments at beginning of file to instruction on how to edit the file to make it run. Fill in those values and also set the parameters such as lambda and mu to the values corresponding to this bacteria growth problem. Then, you can source the file to simulate the birth-death process and plot the results.

      Set tmax to about 40 and n_samples to about 10. With the original parameter values $\lambda=0.1$ and $\mu=0.05$, the deterministic model shows exponential
      . With the initial condition $N(0)=2$, do the stochastic model solutions always exhibit exponential _?
      Even though the growth rate is larger than the death rate, you will often observe
      , i.e., events where the population completely dies out (although with only 10 samples, you may not see them every simulation).

    7. Increase the initial condition to $N(0)=20$. Now do you observe extinctions?
      Increase the initial condition even larger to $N(0)=200$. As you increase the initial population size, what happens to the match between the deterministic solution and the stochastic solutions? The stochastic solutions
      the deterministic solution as population size increases.

    8. Set the initial condition back to $N(0)=20$. Now, increase both the birth rate and the death rate by the same amount, such as $\lambda=0.1+0.2 = 0.3$ and $\mu = 0.05+0.2=0.25$, or even as large as $\lambda=0.1+0.5 = 0.6$ and $\mu = 0.05+0.5=0.55$. With these identical increases to both the birth rate and the death rate, how does the deterministic model change?
      . On the other hand, what do you observe with the stochastic model as you increase the rates together? As the rates increase, the stochastic results
      . For the largest values of the rates, one can even begin to observe
      where the population occasionally dies out.