Prediction of COVID-19 spreading profiles in South Korea, Italy and Iran by data-driven coding

This work applies a data-driven coding method for prediction of the COVID-19 spreading profile in any given population that shows an initial phase of epidemic progression. Based on the historical data collected for COVID-19 spreading in 367 cities in China and the set of parameters of the augmented Susceptible-Exposed-Infected-Removed (SEIR) model obtained for each city, a set of profile codes representing a variety of transmission mechanisms and contact topologies is formed. By comparing the data of an early outbreak of a given population with the complete set of historical profiles, the best fit profiles are selected and the corresponding sets of profile codes are used for prediction of the future progression of the epidemic in that population. Application of the method to the data collected for South Korea, Italy and Iran shows that peaks of infection cases are expected to occur before mid April, the end of March and the end of May 2020, and that the percentage of population infected in each city or region will be less than 0.01%, 0.5% and 0.5%, for South Korea, Italy and Iran, respectively.
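The profile-matching step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `best_fit_profiles` and `predict_from_profiles`, the RMSE distance, the day-0 alignment of all curves, and the averaging of the selected profiles are all assumptions made for illustration. The paper selects sets of profile codes (augmented SEIR parameters) rather than averaging raw curves.

```python
import numpy as np


def best_fit_profiles(early, historical, k=3):
    """Return indices of the k historical profiles closest to the early data.

    Assumption: every curve is aligned at day 0 and expressed in the same
    units (e.g. fraction of population infected); closeness is plain RMSE.
    """
    early = np.asarray(early, dtype=float)
    n = len(early)
    scored = []
    for idx, profile in enumerate(historical):
        window = np.asarray(profile, dtype=float)[:n]
        if len(window) < n:
            continue  # profile too short to compare against the early data
        rmse = float(np.sqrt(np.mean((window - early) ** 2)))
        scored.append((rmse, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]


def predict_from_profiles(early, historical, k=3):
    """Hypothetical forecast: average the k best-fit historical profiles."""
    chosen = best_fit_profiles(early, historical, k)
    if not chosen:
        raise ValueError("no historical profile is long enough to compare")
    horizon = min(len(historical[i]) for i in chosen)
    stacked = np.array(
        [np.asarray(historical[i], dtype=float)[:horizon] for i in chosen]
    )
    return stacked.mean(axis=0)
```

Given a short early series and a list of complete historical curves, `predict_from_profiles` returns a full-length curve whose peak position and height stand in for the predicted peak time and infected fraction.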

One point that could be better explained, to make it easier for the broad audience of PLOS ONE to understand the authors' hypothesis, is the set of variables taken into account in the construction of the model. Indeed, some social, political or climatic variables could affect viral spreading differently in different countries and/or cities. The effect of such factors should at least be discussed in the conclusions.
Authors' Response: Thank you for the positive support and encouragement. The point raised is indeed very legitimate for a data-driven model. We have included a brief discussion at the end of the paper (Conclusion) to highlight this issue. Specifically, the performance of any data-driven model depends on the breadth of coverage of the historical data. Thus, we do admit that the model needs to be continuously updated to cover different sets of spreading characteristics. At present, the data is limited, and the data-driven model may not perform satisfactorily if it is applied to a new epidemic with significantly different characteristics (i.e., with model parameters significantly different from those collected in the historical database) or to a population or country with a significantly different contact topology, travel patterns, effectiveness of government control, or climate.
Another important point to discuss is the difference between the predictions presented and the actual situation (see for example lines 200-205). We do understand that the authors performed the analysis with data up to the 6th of March, but they now have access to the observed cases in the different countries and regions, and some of the observations deviate substantially from the model of spread presented here. The information on observed cases should be updated and discussed, and criteria for adjusting the presented model should be proposed.
Authors' Response: The manuscript was finished on March 8, 2020, and two months have now passed. We have updated the results based on new datasets, and all the figures have been updated. We find that the general profile pattern remains the same for the three countries under study, despite some adjustments to the predicted number of infected cases and the exact times of the peaks. We believe that the progression profile is basically captured by the historical data collected for the 367 cities. However, deviations from the general patterns can still be expected due to outlier events such as superspreader events and irregular travel patterns. For instance, we found a significant deviation in the Hong Kong data compared with a similar prediction conducted earlier: there was an unexpected surge in the number of infected cases in mid March due to an unexpectedly large number of inbound travelers, as overseas students returned from the UK and the USA.
Regarding the updates in this revised version, we should mention that Iran stopped publishing detailed data for individual provinces after March 22. Thus, we cannot obtain province-level data for forecasting after that date. The main updates are marked in blue text in the Abstract, Section 4 and Conclusion of the revised paper.
In the paragraph at lines 212-215, why do the authors conclude that the epidemic will end before June 2020? On the basis of which data?
Authors' Response: We have provided further explanation in the revised paper as follows. Basically, from the progression trends of the epidemic in these three countries, and provided control measures continue to be in place, our model shows that the number of confirmed cases of COVID-19 infection in most regions of these three countries will peak before the end of May 2020. Hence, the first wave of epidemic progression would come under control before the end of May 2020. This has been mentioned on Page 3 and Page 7 of the revised paper.

Minor points:
It would be more appropriate to speak of SARS-CoV-2 spreading than of COVID-19 spreading.
It would also be appropriate to indicate that by infected individuals the authors mean people who tested positive, especially when presenting the data as the proportion of the population infected. Indeed, only seroprevalence studies can actually estimate the proportion of the population that underwent infection.
Authors' Response: Thank you for pointing this out. In Section 2, as far as our study is concerned, we have clarified the kind of data we have, corresponding to the number of infected cases. See Section 2 on Page 3 of the revised paper.