Posts Estimating COVID-19 Effective Reproduction Numbers
Post
Cancel

Estimating COVID-19 Effective Reproduction Numbers

The effective reproduction number of a virus, or R_t, is one of the most important measures of how quickly it’s spreading. Being able to estimate R_t is an important piece of the puzzle in policy decision making. If you go to a website like rt.live, you can see a real-time visualization of R_t in each state. But how are these numbers calculated?

First, we need to define a method for integrating continuous new data. In other words, how \(R_t\) is related to \(R_(t-1)\). For this, we can use Bayes’ rule:

This states that having seen k cases, the distribution of \(R_t\) is equal to the likelihood of seeing k new cases given \(R_t\) times the prior beliefs of the value of \(P(R_t)\) without the data divided by the general probability of seeing this many cases.

Next, we choose a likelihood function P(k|R_t), or how likely we are to see k new cases given a value of \(R_t\). Any time we need to model ‘arrivals’ over some time period of time, we tend to use the Poisson Distribution. Given an average arrival rate of \(\lambda\) new cases per day, the probability of seeing $k$ new cases is distributed according to the Poisson distribution:

To visualize:

1
2
3
4
5
6
7
8
9
10
11
# Column vector of k
k = np.arange(0, 70)[:, None]

# Different values of Lambda
lambdas = [10, 20, 30, 40]

# Evaluated the Probability Mass Function
y = sps.poisson.pmf(k, lambdas)

# Show the resulting shape
print(y.shape)
1
> (70, 4)
1
2
3
4
5
6
7
8
9
10
fig, ax = plt.subplots(figsize=(6,2.5))

ax.set(title='Poisson Distribution of Cases\n $p(k|\lambda)$')

plt.plot(k, y,
         marker='o',
         markersize=3,
         lw=0)

plt.legend(title="$\lambda$", labels=lambdas);

The Poisson distribution says that if you’re going to have \(\lambda\) cases per day, you’ll probably get that many plus or minus variation based on chance.

But we know there have been k cases and we need to know what value of \(\lambda\) is most likely. In order to do this, we fix k in place while varying \(\lambda\). This is the likelihood function.

Now we evaluate. Let’s imagine a sample of new case counts k. What is the likelihood of different values of \(R_t\) on each of those days?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
k = np.array([20, 40, 55, 90])

# We create an array for every possible value of Rt
R_T_MAX = 12
r_t_range = np.linspace(0, R_T_MAX, R_T_MAX*100+1)

# Gamma is 1/serial interval
GAMMA = 1/7

# Map Rt into lambda so we can substitute it into the equation below
lam = k[:-1] * np.exp(GAMMA * (r_t_range[:, None] - 1))

# Evaluate the likelihood on each day and normalize sum of each day to 1.0
likelihood_r_t = sps.poisson.pmf(k[1:], lam)
likelihood_r_t /= np.sum(likelihood_r_t, axis=0)

# Plot it
ax = pd.DataFrame(
    data = likelihood_r_t,
    index = r_t_range
).plot(
    title='Likelihood of $R_t$ given $k$',
    xlim=(0,10),
    figsize=(6,2.5)
)

ax.legend(labels=k[1:], title='New Cases')
ax.set_xlabel('$R_t$');

Now we can perform the Bayesian update. To perform the Bayesian update, we need to multiply the likelihood by the prior to get the posteriors. We do that using the cumulative product of each successive day:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
posteriors = likelihood_r_t.cumprod(axis=1)
posteriors = posteriors / np.sum(posteriors, axis=0)

columns = pd.Index(range(1, posteriors.shape[1]+1), name='Day')
posteriors = pd.DataFrame(
    data = posteriors,
    index = r_t_range,
    columns = columns)

ax = posteriors.plot(
    title='Posterior $P(R_t|k)$',
    xlim=(0,10),
    figsize=(6,2.5)
)
ax.legend(title='Day')
ax.set_xlabel('$R_t$');

Now, we can calculate the most likely values of \(R_t\) on each day.

1
2
most_likely_values = posteriors.idxmax(axis=0)
most_likely_values
1
2
3
4
5
> Day
> 1    5.85
> 2    4.22
> 3    4.33
> dtype: float64

Credit to Kevin Systrom for the original methodology.

This post is licensed under CC BY 4.0 by the author.