Enlightenment on oscillatory properties of 23 class B notifiable infectious diseases in the mainland of China from 2004 to 2020

A variety of infectious diseases occur in mainland China every year. Cyclic oscillation is a widespread attribute of most viral human infections. Understanding the outbreak cycle of infectious diseases is conducive to public health management and disease surveillance. In this study, we collected time-series data for 23 class B notifiable infectious diseases from 2004 to 2020 using public datasets from the National Health Commission of China. Oscillatory properties were explored using power spectrum analysis. We found that all 23 class B diseases in the dataset have obvious oscillatory patterns (seasonal or sporadic) and can be divided into three categories according to their oscillatory power at different frequencies each year. These diseases were found to have different preferred outbreak months and infection selectivity. Diseases that break out in autumn and winter are more selective. Furthermore, we calculated the oscillation power and the average number of infected cases of all 23 diseases in the first eight years (2004 to 2012) and the next eight years (2012 to 2020) since the update of the surveillance system. A strong positive correlation was found between the change in oscillation power and the change in the number of infected cases, which was consistent with simulation results from a conceptual hybrid model. Reliable and effective analytical methods contribute to a better understanding of the oscillation cycle characteristics of infectious diseases. Our research can help guide the effective prevention and control of class B infectious diseases.


Introduction
Infectious diseases are caused by various pathogens and can spread among people and animals [1][2][3][4]. Pathogens causing infectious diseases include viruses, rickettsia, mycoplasma, bacteria, fungi, and parasites [5,6]. The pathological course of an infectious disease depends on the nature of the pathogenic microorganism and the body's response to it, as well as on timely and appropriate treatment [7][8][9]. Most infectious diseases can be cured by strengthening the body's resistance to the pathogen and by proper treatment [10,11]. If the body's immune resistance is poor and the infection is not treated in time, the infection may become chronic, spread, or result in death [12,13]. Apart from their impact on individuals, outbreaks of infectious diseases can be periodic. Recurrence is a common feature of infectious diseases [14] that has been confirmed in many countries around the world [15,16]. Examples include the seasonal patterns of pertussis and measles [17] in Europe [18][19][20][21][22], influenza [23] in Japan, and measles [23] and rabies [24][25][26] in China. This oscillation may be driven by natural factors, such as seasonal temperature, rainfall [27,28], and natural disasters [29], or by human factors, such as school terms [30,31], economic migration [32,33], and vaccination coverage [34]. It is essential to forecast the recurrent outbreaks of these infectious diseases because of their global reach and their impact on individual livelihoods, the economy [35,36], and public mental health systems [37,38]. For example, in mainland China, from January 1 to December 31, 2019, a total of 10,244,507 cases of notifiable infectious diseases were reported, 25,285 of which resulted in death. Understanding cyclical outbreaks of seasonal or sporadic epidemics plays an important role in epidemic prevention and control [14,39].
To better assess and control epidemic outbreaks, the Chinese government strengthened the country's infectious disease surveillance system [40] after the outbreak of severe acute respiratory syndrome (SARS) in 2003. The infectious diseases in this system were divided into notifiable classes A, B, and C. Class A notifiable diseases, such as plague and cholera, can cause large-scale, severe epidemics within a short period of time. Class B notifiable diseases, such as AIDS and anthrax, may cause moderate epidemic outbreaks. Class C notifiable diseases, such as rubella and conjunctivitis, are less severe and less infectious, causing mild outbreaks. The rate of infection of class A infectious diseases in China is very low, which suggests that they have been well controlled. Class B infectious diseases are not only highly infectious but also have higher mortality than class C infectious diseases. Therefore, the study of the infectious characteristics of class B infectious diseases is very important. However, the characteristics of infectious diseases in China have been largely ignored in previous studies. The few existing studies concentrate mainly on one or a small number of infectious diseases, or focus only on a short time period. We are still far from having a concise method of analysis that can account for both the annual incidence patterns of infectious diseases in humans and the evolution of the diseases.
In this study, we first illustrated the time series of infected cases for 23 class B infectious diseases in mainland China from 2004 to 2020 and conducted a power spectrum analysis on these data. Based on their spectra, we were able to categorize the diseases and subsequently investigate the preferred month of each infectious disease in a year. Further, we analyzed the correlation between the change in oscillation power of the infectious diseases and the change in the number of infected cases between the first eight years (2004 to 2012) and the last eight years (2012 to 2020) since the update of the surveillance system. Finally, we summarized the main findings in a table and established a conceptual model to illustrate the mechanism of the oscillatory characteristics.

Data and sources
Available time series data for the monthly reported and confirmed cases of 23

Spectrum analysis
To better quantify the oscillatory property of each infectious disease, we used spectrum analysis. Similar methods have been used in classic and modern studies in the field of infectious diseases [6,15,17,19,41]. Spectrum analysis is a technique for decomposing complex signals into simpler signals based on the Fourier transform. Many biological signals can be expressed as the sum of simple signals of different frequencies, and spectrum analysis yields information about a signal at each of these frequencies (such as amplitude, power, intensity, or phase).
The power spectral density (PSD) of each infectious disease over these 16 years was computed using the multi-taper method with five tapers, as implemented in Chronux [42], an open-source data analysis toolbox available at http://chronux.org. Power spectra of the time series of infected cases were calculated for each disease from 2004 to 2020. Essentially, the multi-taper method attempts to reduce the variance of spectral estimates by pre-multiplying the data with several orthogonal tapers known as Slepian functions. The frequency decomposition of multi-tapered data segments provides a set of independent spectral estimates that, once averaged, yield a more reliable ensemble estimate for noisy data.
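The multi-taper estimate described above can be sketched in Python. This is a minimal illustration, not the Chronux implementation used in the study: it relies on `scipy.signal.windows.dpss` for the Slepian tapers, and the time-bandwidth product `nw=3.0`, the mean removal, and the synthetic input series are all assumptions made for the example.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs=12.0, nw=3.0, n_tapers=5):
    """Average the periodograms of several Slepian-tapered copies of x.

    fs is the sampling rate in samples per year (12 for monthly data).
    nw (time-bandwidth product) is an assumed value, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the constant offset
    n = x.size
    tapers = dpss(n, nw, Kmax=n_tapers)       # shape (n_tapers, n), unit-energy tapers
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2 / fs
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # cycles per year, from 0 to fs / 2
    return freqs, spectra.mean(axis=0)        # average over tapers

# Synthetic example (not real surveillance data): 16 years of monthly
# counts with one peak per year plus Gaussian noise.
t = np.arange(192) / 12.0
rng = np.random.default_rng(0)
cases = 100 + 40 * np.sin(2 * np.pi * 1.0 * t) + rng.normal(0, 5, t.size)

freqs, psd = multitaper_psd(cases)
peak_freq = freqs[np.argmax(psd)]             # expected near 1 cycle per year
```

Averaging over tapers trades some frequency resolution for lower variance, so the spectral peak is broadened to roughly `nw * fs / n` cycles per year on either side of the true frequency.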

Classification of different clusters of diseases
We noticed that we could distinguish different infectious diseases by the number of outbreaks in a year. To classify the different clusters of infectious diseases based on their oscillatory characteristics, we used two features: the power ratio between once a year and twice a year, and the power ratio between once a year and three times a year. The definition of power ratio is the ratio between the powers corresponding to two different frequencies (times per year). We then set two linear thresholds that precisely separated them into three clusters.
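The two features can be sketched as follows. This is an illustrative sketch only: the direction of each ratio (power at two or three cycles per year divided by power at one cycle per year), the averaging window `half_width`, the threshold values `thr2` and `thr3`, and the order of the checks in `classify` are all hypothetical choices, since the thresholds used in the study are not stated in the text.

```python
import numpy as np

def band_power(freqs, psd, target, half_width=0.1):
    """Mean spectral power within +/- half_width cycles/year of target."""
    mask = np.abs(freqs - target) <= half_width
    return psd[mask].mean()

def power_ratios(freqs, psd):
    """Return (P2 / P1, P3 / P1): power at 2 and 3 cycles/year relative to 1."""
    p1 = band_power(freqs, psd, 1.0)
    p2 = band_power(freqs, psd, 2.0)
    p3 = band_power(freqs, psd, 3.0)
    return p2 / p1, p3 / p1

def classify(r2, r3, thr2=1.0, thr3=1.0):
    """Assign a cluster from two linear thresholds (hypothetical values)."""
    if r3 > thr3:
        return "Type III"   # strongest at three cycles per year
    if r2 > thr2:
        return "Type II"    # strongest at two cycles per year
    return "Type I"         # dominated by one cycle per year

# Synthetic series (not real data) with a dominant twice-a-year component.
t = np.arange(192) / 12.0
x = 10 * np.sin(2 * np.pi * 1.0 * t) + 30 * np.sin(2 * np.pi * 2.0 * t)
psd = np.abs(np.fft.rfft(x)) ** 2             # plain periodogram for brevity
freqs = np.fft.rfftfreq(x.size, d=1.0 / 12.0)

r2, r3 = power_ratios(freqs, psd)
label = classify(r2, r3)
```

With amplitudes of 10 and 30 at one and two cycles per year, the power ratio `r2` equals the squared amplitude ratio, so this synthetic series falls in the twice-a-year cluster.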

Tuning curves for monthly infected cases
We assumed that each infectious disease included in this study followed a similar within-year trend in each of the 16 years of observation. Based on this assumption, we averaged the number of infected cases in each calendar month over all 16 years to obtain a tuning curve. Each infectious disease in this study has a tuning curve, in which the oscillatory pattern within a year is clear.

Preferred month and selectivity of the epidemic outbreak
After obtaining the tuning curve of each disease, we aimed to better capture the oscillatory properties of infectious diseases within a year. Two indices were defined: preferred month and infection selectivity. The preferred month index is defined as the month in a year that has the most infected cases. The infection selectivity index is defined as 1 minus the ratio of the minimum to the maximum number of infected cases in a year. The closer the selectivity index is to 1, the sharper the tuning curve, and vice versa.
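The tuning curve and the two indices follow directly from their definitions. The sketch below assumes a monthly series that starts in January and spans whole years; the function names and the example counts are hypothetical.

```python
import numpy as np

def tuning_curve(monthly_cases):
    """Average a monthly series by calendar month.

    Assumes the series starts in January and its length is a multiple of 12.
    """
    x = np.asarray(monthly_cases, dtype=float)
    return x.reshape(-1, 12).mean(axis=0)     # one mean per calendar month

def preferred_month(curve):
    """Month (1 = January) with the most infected cases."""
    return int(np.argmax(curve)) + 1

def selectivity(curve):
    """1 minus the ratio of the minimum to the maximum of the tuning curve."""
    return 1.0 - curve.min() / curve.max()

# Made-up example: the same within-year profile repeated for 16 years.
one_year = np.array([10, 12, 20, 35, 50, 80, 100, 70, 40, 25, 15, 10], dtype=float)
cases = np.tile(one_year, 16)

curve = tuning_curve(cases)
month = preferred_month(curve)                # July in this example
index = selectivity(curve)                    # 1 - 10 / 100
```

A flat tuning curve gives a selectivity of 0, and a curve whose minimum approaches zero gives a selectivity close to 1.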

Correlation analysis
We used the Spearman correlation to measure the relationship between the selectivity index and the preferred month index. The Pearson correlation was used to measure the relationship between the change in infected cases and the change in oscillation power across all 23 infectious diseases.
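Both coefficients are available in SciPy. The sketch below uses made-up values rather than the study's data, chosen so that the two measures differ: the relationship is perfectly monotonic but not linear, so the rank-based Spearman coefficient is 1 while the Pearson coefficient is lower.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical values for illustration only.
pref_month = np.arange(1, 13)                 # preferred month, 1..12
sel_index = (pref_month / 12.0) ** 3          # monotonic, nonlinear in month

rho, p_rho = spearmanr(pref_month, sel_index)  # rank correlation
r, p_r = pearsonr(pref_month, sel_index)       # linear correlation
```

Spearman correlation suits an ordinal quantity such as the preferred month, whereas Pearson correlation measures how close two continuous quantities are to a straight-line relationship.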

Conceptual hybrid model
We constructed a conceptual model to illustrate the underlying mechanism, i.e., the relationship between the change in infected cases and the change in oscillation power of infectious diseases.
The time series can be dissected into two components: trend component (TC) that can be modeled as a monotonically increasing function, and oscillatory component (OC) that can be modeled as a sine function. The multiplication of these two components constitutes the multiplication mechanism. The addition of these two components constitutes the additive mechanism. The hybrid mechanism combined addition and multiplication.
We then simulated the time series using this conceptual model by adding Gaussian noise (mean = 0, std = 1) to test the relationship between the change in oscillation power and the change in the number of infected cases. The TC was simplified as a linear function and OC was simplified as a trigonometric sine function.
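A simulation of this kind can be sketched as follows. The specific hybrid form used here, `tc + depth * tc * oc`, is an assumption made for illustration, as are the baseline, the modulation depth, the range of slopes, and the use of a plain periodogram to measure the power at one cycle per year.

```python
import numpy as np

def simulate_hybrid(slope, rng, years=16, base=50.0, depth=0.5):
    """Linear trend plus a trend-scaled annual sine, with Gaussian noise."""
    t = np.arange(years * 12) / 12.0
    tc = base + slope * (t - years / 2)       # linear trend component
    oc = np.sin(2 * np.pi * t)                # one oscillation per year
    noise = rng.normal(0.0, 1.0, t.size)      # mean = 0, std = 1
    return tc + depth * tc * oc + noise       # assumed hybrid form

def annual_power(x, fs=12.0):
    """Periodogram power at the bin closest to one cycle per year."""
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return spec[np.argmin(np.abs(freqs - 1.0))]

rng = np.random.default_rng(0)
slopes = rng.uniform(-4.0, 4.0, size=23)      # one simulated series per slope

d_cases, d_power = [], []
for s in slopes:
    x = simulate_hybrid(s, rng)
    first, last = x[:96], x[96:]              # two eight-year halves
    d_cases.append(last.mean() - first.mean())
    d_power.append(annual_power(last) - annual_power(first))

r = np.corrcoef(d_cases, d_power)[0, 1]       # Pearson correlation
```

Because the oscillation amplitude scales with the trend in this form, a rising mean and a rising annual power go together, which produces a strong positive correlation across the simulated series.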

Model fitting and evaluation
We further fitted the time series data of 23 infectious diseases using these three models respectively, which are shown in Eqs 1-3. The additive model is the summation of a trend component and an oscillatory component (Eq 1), the multiplication model is the multiplication of a trend component and an oscillatory component (Eq 2) and the hybrid model combines the two previous models (Eq 3).
where A, k, and t_0 represent the maximum number of infected cases, the rate of increase, and the semi-saturation period of the trend component, respectively; B, f, and φ represent the amplitude, frequency, and initial phase of the oscillatory component; and C is the baseline of the model. The goodness of fit for the above models is defined in Eq 4. All three models have the same number of parameters, so it is fair to compare the goodness of fit among them.
where R_data(t) and R_model(t) represent the real and fitted numbers of infected cases for a specific disease at time point t, respectively, and n is the total number of data points.

Results
Over the past 16 years, there have been clear oscillatory patterns in the time series of infectious diseases in mainland China (Fig 1A). The 16-year dataset makes the tuning curve of infected cases in different months visible (Fig 1B). We were able to estimate the power spectrum in the frequency band from 0 to 6 times per year (since the sampling rate of the data is 12 data points per year, with one data point representing one month; Fig 1C).
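The upper limit of six times per year follows from the Nyquist criterion for monthly sampling. As a short worked calculation, where the record length N = 192 (16 years of 12 monthly points) is an assumption used only to illustrate the spacing of a plain discrete Fourier transform:

```latex
f_{\max} = \frac{f_s}{2} = \frac{12\ \text{samples/year}}{2} = 6\ \text{cycles/year},
\qquad
\Delta f = \frac{f_s}{N} = \frac{12}{192} = 0.0625\ \text{cycles/year}.
```

Oscillations faster than six times per year cannot be resolved from monthly counts; they would alias into the 0 to 6 band.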

Three clusters of the oscillatory patterns of the infectious diseases
It is clear that all 23 infectious diseases have had obvious patterns of oscillation in these 16 years (from 2004 to 2020). To better interpret the periodic properties throughout a year (i.e., whether the peak of the outbreak has seasonal preferences), we averaged the data of all 16 years for each month (the numbers of infected cases are represented as grey dots in Figs 1B and S2) to obtain a tuning curve (represented as black curves in Figs 1B and S2). Through the power spectrum analysis (Fig 1C), we found that all 23 infectious diseases have at least one clear oscillatory peak in their spectrum (S3 Fig). We then quantified the oscillatory characteristics of the different diseases and found three distinct clusters, which are illustrated in Fig 1D (similar to observations in Fig 1). The horizontal axis of this panel denotes the power ratio between once a year and twice a year, and the vertical axis denotes the power ratio between once a year and three times a year. The larger the value on the horizontal axis, the stronger the twice-a-year oscillation. The larger the value on the vertical axis, the stronger the three-times-a-year oscillation. We then set two thresholds that precisely separated the diseases into three clusters (dashed lines in Fig 1D). In total, 18 out of 23 diseases belong to Type I, two out of 23 diseases belong to Type II, and the remaining three diseases belong to Type III (S4

Infectious diseases that break out in autumn and winter are more selective
Two indices (preferred month and selectivity) were defined to capture the oscillatory properties of each infectious disease within a year (Fig 2A). The preferred month is the month in a year with the most infected cases, and the selectivity is defined as 1 minus the ratio of the minimum to the maximum number of infected cases in a year. This basic information about the oscillatory properties helps us better understand the timing and extent of outbreaks. Furthermore, we found a significant positive correlation between the selectivity index and the preferred month index (r = 0.49, p = 0.016, Spearman correlation) (Fig 2B). In China, spring lasts from March to May, summer from June to August, autumn from September to November, and winter from December to February. Hence, this significant correlation means that infectious diseases that break out in autumn and winter have a higher selectivity, while those that break out in spring or summer tend to have cases spread more evenly throughout the year. This provides general guidance for the prevention of different types of infectious diseases.

Positive correlation between the change of infected cases and change of oscillatory power
We have shown the different seasonal oscillatory properties of the 23 infectious diseases with static analysis. Next, we split the 16-year dataset into two parts: the first eight years (2004-2012) and the last eight years (2012-2020). Over these 16 years, the number of infected cases decreased for 14 out of 23 infectious diseases (Fig 4A Left panel for a typical example) and increased for nine out of 23 (Fig 4B Left panel for a typical example). This information is summarized in the 5th column of Table 1.
We then explored the relationship between the change in the number of infected cases and the corresponding strength of oscillatory power. To this end, we calculated the power

Hybrid model well explained the observed data
By comparing the first eight years (2004-2012) and the last eight years (2012-2020) of the available surveillance data, we can clearly see a trend in epidemic changes. The aggravation of an epidemic is reflected not only in an increase in the absolute number of cases but also in stronger oscillation intensity. It is worth noting that this result is not inevitable, since other outcomes are possible for time series data (Fig 4). There could instead be no correlation between the change in infected cases and the change in oscillation power, which can arise in two forms: the multiplication mechanism and the addition mechanism. The time series can be dissected into two components: a trend component (TC) that can be modeled as a linear function (Fig 4D red curve) and an oscillatory component (OC) that can be modeled as a sine function (Fig 4D blue curve). The multiplication of these two components constitutes the multiplication mechanism (Fig 4A): the mean number of infected cases remains unchanged, while the oscillatory strength increases (Fig 4A top) or decreases (Fig 4A bottom) over time. The addition of these two components constitutes the addition mechanism (Fig 4B): the oscillatory strength remains unchanged, while the mean number of infected cases increases (Fig 4B top) or decreases (Fig 4B bottom) over time.
The hybrid mechanism combines the addition and multiplication of the trend and oscillatory components (Fig 4C). Over time, the oscillatory strength and the mean number of infected cases increase (Fig 4C top) or decrease (Fig 4C bottom) together. The TC was then simplified as a linear function and the OC as a trigonometric sine function (Fig 4D). We then simulated this conceptual model with added noise to test the relationship between the change in oscillation power and the change in the number of infected cases, and found that the two are positively correlated. This relationship was consistent with the results of the analysis using real data (Fig 3C).
To further test the hybrid hypothesis, we fitted the observed data using the three models (addition, multiplication, and hybrid) for each disease (Fig 5A). We found that the goodness of fit of the hybrid model is significantly larger than that of the other two models, which supports the hybrid hypothesis (t test with Bonferroni correction, Fig 5B). Hence, based on the analysis of real data from mainland China, we can conclude that the data are in line with the hybrid hypothesis.

Discussion
Through systematic analysis of the oscillatory characteristics of 23 class B notifiable infectious diseases in mainland China from 2004 to 2020, we identified three oscillation clusters (Figs 1 and 2), differences in preferred outbreak month and in selectivity for specific months (Fig 3), and changes in oscillation strength over time (Fig 4). The properties of each infectious disease are listed in Table 1.

Comparison with previous works
To our knowledge, this is the first work to investigate the oscillatory properties of such a large number of infectious diseases in mainland China. Most previous works have included a single disease or a few similar diseases in China [23][24][25][26][43][44][45] or in other countries around the world [17][18][19][20][21]. Although some studies covered more infectious diseases [46], they did not systematically investigate their oscillatory characteristics over time. We studied most of the class B infectious diseases in mainland China from the perspective of an oscillating system and constructed a unified analytic framework to facilitate comparison. We also presented a method to categorize these diseases (Fig 2A). As illustrated by the spectrum analysis, different diseases have different peak periods, different preferred outbreak months, and, in some cases, different selectivity. Diseases that break out in autumn and winter are more selective, while those that break out in summer and spring are less so. This finding will improve the understanding of the regularity of these diseases and help guide epidemic prevention. Importantly, some infectious diseases, such as HBV [47], HCV [48], HEV [49,50], Anthrax [51], Gonorrhea [52,53], Treponema pallidum [54], and Leptospirosis [55,56], are thought to be sporadic rather than seasonal or cyclical. In our work, we found that they have distinct oscillatory properties despite their relatively lower selectivity compared with other seasonal diseases.

The trend of the epidemic situation in the mainland of China
In terms of the basic descriptive statistics of the 16-year period investigated, the number of infected cases decreased over time for 14 out of 23 infectious diseases and increased for nine out of 23 (Fig 4A and 4B). This suggests that the control and preventive measures of the Chinese government have had a positive effect in these years. Moreover, we also found that an increase in oscillation strength often accompanies an increase in the number of infected cases, which will play an important role in the evaluation of epidemics in the future. Because of the typical cyclical fluctuation of an epidemic, the number of infected cases of one specific infectious disease in a single month cannot reflect the real situation of the epidemic. Consider a hypothetical disease for which the average number of infected people over the past year is very large and which shows a cyclical pattern every year, but for which only a few people are infected in one given month. Based on this, could we assume that the epidemic has been effectively controlled and the peak has passed? The answer is no. According to our results, this month is likely to be close to the trough of the epidemic cycle, and the epidemic will rebound significantly in its preferred month, which therefore needs to be monitored. People need to be more careful, and the government needs to strengthen its prevention and control during this period.

Mechanisms of the periodic outbreak of infectious diseases in mainland China
The oscillatory properties of infectious diseases may be influenced by natural [27,29] or human factors [30][31][32][33][34]. In our results, Type I infectious diseases with relatively high selectivity within a year can be assumed to be seasonal. Natural factors, such as rainfall, temperature, and humidity, may affect the host. Some of the Type I infectious diseases with relatively low selectivity are sexually transmitted, such as gonorrhea and treponema pallidum. The spread of these diseases may be driven not by natural factors but by human behavior. However, in our results, these seemingly sporadic infectious diseases also have a clear periodicity, and the mechanism is still unknown. Type II infectious diseases (e.g., hemorrhagic fever and scarlet fever) have outbreaks in both summer (June) and winter (December) (Fig 2C). The intrinsic mechanism remains unclear and calls for further exploration. Type III infectious diseases (AIDS, HBV, and HCV) have relatively low selectivity for particular months or seasons. The cause of their annual outbreak frequency (three times per year) needs further investigation.

Limitations of the current study
The primary limitation of this work is that we can only illustrate the properties under investigation descriptively. However, the oscillatory properties of infectious diseases reflect the dynamic relationship among humans, pathogens, and the global environment. A future study should investigate these characteristics in a more detailed manner, and subdivide the underlying oscillation properties of infectious diseases using mathematical models [22,41,57,58]. Our results may inspire future modeling work to further explore the mechanisms of the recurrent outbreaks of infectious diseases.