Analysis and forecast with data assimilation

- a) Data Assimilation Research Team, RIKEN Center for Computational Science (R-CCS), Japan
- b) Graduate School of Mathematics, Nagoya University, Chikusa-ku, Nagoya 464-8602, Japan
- c) School of Science, Nagoya University, Chikusa-ku, Nagoya 464-8602, Japan
- d) Prediction Science Laboratory, RIKEN Cluster for Pioneering Research, Japan
- e) RIKEN interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Japan
- E-mails: takemasa.miyoshi@riken.jp, richard@math.nagoya-u.ac.jp, chang.sun@a.riken.jp, qiwen.sun@riken.jp, zhly.ok@gmail.com

The last week of July 2021 saw a sharp rise in the number of cases of COVID-19 in Tokyo. We provide an analysis of the propagation of the epidemic from March 6, 2020, to the end of July 2021, with particular attention to the effective reproduction number. Our approach is based on data assimilation, a technique which has proven very successful, for example, in accurate weather prediction. This analysis is then correlated with the four states of emergency successively declared in Tokyo, and different scenarios are proposed for the evolution of the epidemic in the coming weeks. Each scenario takes into account the experience gained during the first three states of emergency. The dependence on uncertain parameters is also discussed.

In this note we provide a short analysis and several forecasts for the propagation of COVID-19 in Tokyo. These investigations are based on data assimilation, and the real data are collected up to the end of July 2021. Data assimilation is a fundamental method developed originally for numerical weather prediction, and recently applied successfully to different fields of research. Our data assimilation approach consists of two main steps, repeated on a regular basis: 1) A model for the evolution of the epidemic provides forecasts, 2) These forecasts are compared to real data, and an optimization process is implemented for the following forecasts. Since the data about COVID-19 are provided on a daily basis, the system is updated every day. Several independent systems are considered simultaneously, and consequently some quantities of interest can be estimated with probability distributions. Examples of such quantities are unknown parameters or unknown populations.

For the sake of completeness, we perform our analysis with two different approaches: a continuous approach using an ensemble Kalman filter, and a discrete, agent-based approach using a particle filter. Their main difference is that the continuous approach involves only 15 independent simulations of the propagation of the epidemic in Tokyo, while the discrete approach involves 50,000 to 100,000 such realizations. We introduce the common structure of these approaches in the next section, and refer to [1, 2] for precise descriptions and additional information.
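To make the forecast/analysis cycle of the continuous approach more concrete, here is a minimal sketch of one analysis step of an ensemble Kalman filter with 15 members. It uses the stochastic (perturbed-observation) variant purely for brevity; the text does not state which variant is used in [1, 2]. The two-component state (number of agents in H and a transmission coefficient), the function name `enkf_analysis`, and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """One stochastic EnKF analysis step (illustrative sketch only).

    ensemble     : (n_state, n_members) forecast ensemble
    obs          : (n_obs,) observed values
    obs_operator : (n_obs, n_state) linear map from state to observations
    obs_cov      : (n_obs, n_obs) observation error covariance
    """
    n_members = ensemble.shape[1]
    # Deviations of each member from the ensemble mean, in state space.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    # The same deviations, seen through the observation operator.
    obs_ens = obs_operator @ ensemble
    obs_anom = obs_ens - obs_ens.mean(axis=1, keepdims=True)
    # Sample covariances estimated from the ensemble.
    cov_xy = anomalies @ obs_anom.T / (n_members - 1)
    cov_yy = obs_anom @ obs_anom.T / (n_members - 1)
    gain = cov_xy @ np.linalg.inv(cov_yy + obs_cov)
    # Each member assimilates its own noisy copy of the observation.
    noise = rng.multivariate_normal(np.zeros(len(obs)), obs_cov, size=n_members).T
    perturbed_obs = obs[:, None] + noise
    return ensemble + gain @ (perturbed_obs - obs_ens)


# Made-up numbers: 15 members, only the first state component is observed.
rng = np.random.default_rng(0)
n_members = 15
forecast = np.vstack([
    rng.normal(1000.0, 100.0, n_members),  # hypothetical forecast of H
    rng.normal(0.3, 0.05, n_members),      # hypothetical transmission coefficient
])
obs_operator = np.array([[1.0, 0.0]])
obs = np.array([1200.0])            # hypothetical daily observation of H
obs_cov = np.array([[20.0 ** 2]])   # hypothetical observation error variance
analysis = enkf_analysis(forecast, obs, obs_operator, obs_cov, rng)
```

In this sketch the unobserved second component is also updated, through its sample correlation with the observed one; this is the mechanism by which an ensemble can provide estimates of unknown parameters together with a spread.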

The results of our investigations are represented by different scenarios mimicking the evolution of the epidemic during the first three states of emergency. Starting from the very beginning of August 2021, a behavior similar to the one observed during the first state of emergency would lead back to a reasonable situation within a few weeks, while for a behavior similar to the one observed during the third state of emergency, several months would be necessary. In all scenarios, the number of infected persons reaches unprecedented values.

Both approaches are based on an extension of the SEIR compartmental model commonly used in epidemiology. The compartment I, standing for infectious, is subdivided into two compartments, one with infectious agents who do not show any symptoms, and one with infectious agents presenting symptoms. The transfer diagram for this model is introduced in Figure 1.

In Figure 1, the different compartments correspond to:

- S: The compartment of all susceptible agents,
- E: The exposed agents, infected but not infectious yet,
- Ia: The pre-symptomatic agents (who will develop symptoms later) or asymptomatic agents (who will never show any symptoms). These agents are infectious; part of them will become symptomatic and move to Is, while some will recover without ever being recorded (downwards arrow),
- Is: The symptomatic agents, also infectious. The majority of these agents will look for medical assistance and move to H, while some will quarantine at home, without being recorded (downwards arrow),
- H: The agents undergoing treatment (in hospital, in hotels, or at home), recorded by the medical systems,
- D: The deceased agents,
- R: The recovered agents who were recorded while in H.
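The flows between these compartments can be sketched as a simple deterministic daily update. This is only an illustration of the structure of the transfer diagram, not the model of [1, 2]: the extra compartment `U` (agents leaving through the downwards arrows without being recorded), the forward-Euler update, the choice to apply the factor k to the whole compartment Ia, and every parameter value except k = 0.58 and the 83% symptomatic proportion quoted later in the text are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Params:
    beta: float = 0.4      # transmission coefficient (placeholder)
    k: float = 0.58        # relative infectivity of Ia (value quoted in the text)
    t_latent: float = 3.0  # mean days spent in E (placeholder)
    t_ia: float = 3.0      # mean days spent in Ia (placeholder)
    t_is: float = 2.0      # mean days spent in Is (placeholder)
    t_h: float = 14.0      # mean days spent in H (placeholder)
    p_sympt: float = 0.83  # share of Ia moving to Is (value quoted in the text)
    p_hosp: float = 0.9    # share of Is moving to H (placeholder for "majority")
    p_death: float = 0.02  # share of H moving to D (placeholder)


COMPARTMENTS = ("S", "E", "Ia", "Is", "H", "D", "R", "U")


def step(state, p, dt=1.0):
    """Advance all compartments by dt days with a forward-Euler update."""
    S, E, Ia, Is, H = (state[c] for c in ("S", "E", "Ia", "Is", "H"))
    living = sum(state[c] for c in COMPARTMENTS if c != "D")
    # New infections: agents in Ia are assumed less infectious by the factor k.
    new_exposed = p.beta * S * (p.k * Ia + Is) / living * dt
    leave_e = E / p.t_latent * dt
    leave_ia = Ia / p.t_ia * dt
    leave_is = Is / p.t_is * dt
    leave_h = H / p.t_h * dt
    return {
        "S": S - new_exposed,
        "E": E + new_exposed - leave_e,
        "Ia": Ia + leave_e - leave_ia,
        "Is": Is + p.p_sympt * leave_ia - leave_is,
        "H": H + p.p_hosp * leave_is - leave_h,
        "D": state["D"] + p.p_death * leave_h,
        "R": state["R"] + (1 - p.p_death) * leave_h,
        # Unrecorded exits from Ia and Is (the downwards arrows).
        "U": state["U"] + (1 - p.p_sympt) * leave_ia + (1 - p.p_hosp) * leave_is,
    }


# Hypothetical population of one million with ten exposed agents.
state = dict.fromkeys(COMPARTMENTS, 0.0)
state.update(S=999_990.0, E=10.0)
initial_total = sum(state.values())
for _ in range(100):
    state = step(state, Params())
```

Every outflow reappears as an inflow elsewhere, so the total over all compartments stays constant; this is a convenient sanity check when modifying the diagram.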

In such a diagram, some constants (time durations, probabilities) are attached to each compartment and to each arrow. Some of these quantities are well documented in the literature, while others are quite uncertain. In [1, 2] we provide the precise values used in our simulations and the references. It turns out that our model is stable under a change of the main uncertain parameters: the relative infectivity k of the asymptomatic agents compared to the symptomatic agents, and the ratio of symptomatic to asymptomatic agents. This stability is further discussed in Section 5.

The effective reproduction number Rt is defined as the average number of secondary cases generated by one primary case. It is one of the main indicators of the speed of propagation of a disease. If Rt is smaller than 1, then the propagation of the epidemic is slowing down, while if Rt is bigger than 1, the epidemic is expanding. Various techniques exist for evaluating the effective reproduction number, see for example [3], and our approach with data assimilation leads to a new and rather natural one. We provide in Figure 2 the evaluation of Rt (mean and confidence intervals) in the discrete and in the continuous approach.

In the graphs of Figure 2, the states of emergency (SoE) are indicated by the shaded regions. More precisely, the SoE took place on:

During the first three SoE, the mean effective reproduction number decreases, but the average decay of the mean depends on the period under consideration. If one evaluates this decay (taking an optimistic perspective, and accepting an effect lasting even after the official end of the SoE), one gets the average daily decay of Rt provided in Table 1 for the continuous approach. These values will be the basis for the predictions in the subsequent section. Note that our approach for estimating these values is very simple; a more precise and systematic technique is in preparation. Note also that the largest average decay of Rt is observed during the first SoE, while the smallest one took place during the third SoE. In simpler terms, the first SoE has been the most efficient one, while the third one has been the least efficient one.

| State of emergency | Average daily decay of Rt | Initial and final days considered |
| --- | --- | --- |
| 1 | 0.0563 | 2020/4/8 - 2020/5/31 |
| 2 | 0.0147 | 2021/1/15 - 2021/3/3 |
| 3 | 0.0085 | 2021/4/29 - 2021/6/16 |
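As a small worked example, the values of Table 1 can be compared directly: the length of each evaluation window follows from the listed dates, and the ratio of the decays quantifies how much more efficient the first SoE was. The dictionary layout and the variable names below are hypothetical; the decays and dates are those of the table.

```python
from datetime import date

# (average daily decay of Rt, first day, last day), as listed in Table 1.
TABLE_1 = {
    1: (0.0563, date(2020, 4, 8), date(2020, 5, 31)),
    2: (0.0147, date(2021, 1, 15), date(2021, 3, 3)),
    3: (0.0085, date(2021, 4, 29), date(2021, 6, 16)),
}

first_decay = TABLE_1[1][0]
summary = {}
for soe, (decay, start, end) in TABLE_1.items():
    summary[soe] = {
        "days": (end - start).days,            # length of the window
        "ratio_to_first": first_decay / decay, # how many times weaker than SoE 1
    }

for soe, row in summary.items():
    print(f"SoE {soe}: {row['days']} days, "
          f"first-SoE decay is {row['ratio_to_first']:.1f}x this one")
```

The three windows have comparable lengths (53, 47, and 48 days), so the differences in the table are not an artifact of the window size: the daily decay during the first SoE is roughly 3.8 times that of the second and 6.6 times that of the third.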

Let us mention that this model and these two approaches can also be used for estimating the populations E, Ia, or Is, as well as the population of recovered agents. Additional unknown parameters can also be estimated with these approaches, and as for Rt it turns out that the two approaches lead to similar analysis results.

For this section and for simplicity, we shall use only the continuous approach. For the forecasts, we shall mimic the effects of the previous three SoE. Indeed, as shown in Table 1, each of them has produced a different decay of the effective reproduction number Rt. Since the fourth SoE (declared on July 12, 2021) had not produced any noticeable effect as of July 31, we have opted for two scenarios: (A) impose a decay on Rt starting on August 1, (B) impose a decay on Rt starting on August 4. From July 27 to the initial day of the decay, Rt is kept increasing at the rate observed in the last week of July. For the daily decay rate of Rt, we have used the three average daily decays measured during the previous SoE. For each of them, the number of agents in H (in hospital, in hotels, or at home) and the accumulated number of deceased agents D are shown below. As a means of comparison, we also recall the observed values of H and D for the last 17 months provided by [3]. At the technical level, let us mention that the death rate and the average time in H during the forecasts have been kept constant at their current values of July 27, 2021.

In Figure 3 we provide the real data and our analysis results from March 6, 2020 to July 27, 2021. From July 27 to August 3, the effective reproduction number is kept increasing at the rate observed over the few days before July 27, and from August 4 a decay similar to the one observed during the first SoE is imposed.

In Figures 4 and 5, we provide the outcomes of the different scenarios, with a presentation starting on July 12, 2021. Scenarios A (decay from August 1) and B (decay from August 4) are presented separately. The decays of the effective reproduction number observed during the first three SoE are also implemented independently, and the numbering corresponds: 1 for the first SoE, 2 for the second one, and 3 for the third one. Clearly, scenario A1 (decay starting on August 1 with a rate similar to the first SoE) is the most optimistic one, while scenario B3 (decay starting on August 4 with a rate similar to the third SoE) is the most pessimistic one.
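The construction of the six scenarios can be sketched as follows. This is an illustration under stated assumptions, not the forecast itself: Rt is assumed to change linearly (a fixed increment per day), and both the value of Rt on July 27 (`RT_JULY27`) and its daily growth before the decay (`GROWTH`) are hypothetical placeholders, since the text does not list them. Only the decay rates and the start dates come from the text. The sketch tracks Rt alone and reports the first day on which it drops below 1; the evolution of H and D requires the full model.

```python
from datetime import date, timedelta

DECAYS = {1: 0.0563, 2: 0.0147, 3: 0.0085}                # from Table 1
STARTS = {"A": date(2021, 8, 1), "B": date(2021, 8, 4)}   # scenarios A and B
RT_JULY27 = 1.5   # hypothetical value of Rt on July 27, 2021
GROWTH = 0.02     # hypothetical daily increase of Rt before the decay


def first_day_below_one(decay_start, decay, rt0=RT_JULY27, growth=GROWTH,
                        day0=date(2021, 7, 27), max_days=3650):
    """Return the first day on which Rt < 1, or None if never within max_days."""
    rt, day = rt0, day0
    for _ in range(max_days):
        if rt < 1.0:
            return day
        # Rt keeps growing until the decay starts, then decreases linearly.
        rt += growth if day < decay_start else -decay
        day += timedelta(days=1)
    return None


results = {
    f"{label}{soe}": first_day_below_one(start, decay)
    for label, start in STARTS.items()
    for soe, decay in DECAYS.items()
}

for name, day in sorted(results.items()):
    print(name, day)
```

Whatever placeholder values are chosen, a later start and a weaker decay both postpone the day on which Rt crosses 1, which is consistent with the ordering of the scenarios from A1 to B3.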

As mentioned in Section 2, the relative infectivity k of the asymptomatic agents compared to the symptomatic agents is a very uncertain parameter. For most of our simulations we have used k = 0.58, based on the information provided by the literature, see for example [4]. Since this ratio is quite uncertain, we performed similar investigations with values of k varying from 0.1 to 1.0. The corresponding mean values for Rt are shown in Figure 6. The patterns are similar, especially after May 31, 2020. We conclude that, except for the first three months, the different values for the relative infectivity do not generate any noticeable difference in the effective reproduction number.

Similarly, the ratio between symptomatic and asymptomatic agents is also a somewhat controversial parameter. For our simulations, we have used the respective proportions of 83% and 17%, see [4]. However, since asymptomatic cases are very difficult to detect, and since we cannot be fully confident in this ratio, a sensitivity test has been performed. To do this, we have increased the proportion of asymptomatic agents, but kept the parameter k = 0.58 constant. Compared to the original setting, in the new scenarios more agents become asymptomatic and recover without showing any symptoms. On the other hand, the number of symptomatic agents is more or less constant, since it is compared to the real data on a daily basis. In Figure 7 the different curves for Rt look very similar. However, we observe that a bigger proportion of asymptomatic agents leads to a slightly bigger value of Rt. Indeed, since asymptomatic agents are less infectious by the factor 0.58, the transmission coefficient has to be bigger to create enough secondary cases to fit the observations. As a consequence, Rt also becomes slightly bigger.

The mean decay of the effective reproduction number Rt during the first state of emergency in Tokyo is much larger than those observed during the second and the third states of emergency. Psychological, social, and economic factors can certainly explain this evolution, but it is not our purpose to discuss this aspect. On the other hand, by implementing these decay rates in our model of the evolution of the epidemic in Tokyo, we forecast that the sudden rise in the number of COVID-19 infections of the last week of July 2021 can either be contained within a few weeks (scenarios A1 and B1), or will last several months (scenarios A2, A3, B2, and B3). In any of these scenarios we expect that the number of agents undergoing treatment (compartment H) will reach a maximum value bigger than any value observed during the last 17 months in Tokyo, unless the current SoE quickly becomes more efficient than the first one.

[1] Memo1.pdf

[2] Memo2.pdf

[3] https://toyokeizai.net/sp/visual/tko/covid19/

[4] O. Byambasuren et al., J. Association of Medical Microbiology and Infectious Disease Canada, Vol. 5, No. 4, Dec. 2020, 223–234.