Predicting the mortality of smoking attributable to cancer in Qingdao, China: A time-series analysis

Smoking is the leading preventable cause of death and disability from cancer in China. To provide a scientific basis for tobacco control strategies and measures, this study investigated cancer deaths attributed to smoking from 2005 to 2017 and predicted mortality trends from 2018 to 2020 in Qingdao. We used time series analysis to evaluate the number of deaths attributed to smoking among residents over 35 years old in Qingdao and predicted mortality trends. The number of cancer deaths attributed to smoking in Qingdao from 2005 to 2016 was between 170 and 407, showing an upward trend and a certain periodicity. The best model is the ARIMA (2,1,0)×(3,1,0)12, with the lowest BIC (6.640) and the highest stationary R2 (0.500). The predicted cancer deaths curve attributed to smoking in 2017 is consistent with the actual curve, with an average relative error of 5.74%. Applying this model to further predict the number of cancer deaths attributed to smoking in Qingdao from January 2018 to December 2020, the predicted results were 5,249, 5,423 and 6,048, respectively. The findings emphasized the need to further strengthen tobacco control measures to reduce the burden of disease caused by tobacco.


Introduction
China is the largest producer and consumer of tobacco in the world, with a current smoking rate of 27.7% (52.1% for men, 2.7% for women) and 316 million current smokers [1]. In 2014, China's tobacco consumption accounted for 44% of the world's total, surpassing the combined total of 29 other countries with high tobacco consumption, such as Indonesia, Japan, Russia and the United States [2]. At present, 1.4 million people die of tobacco-related diseases every year in China [3]. This number is expected to exceed 3 million by 2050, bringing huge economic and health losses to China [3,4]. The years of potential life lost (YPLL) caused by smoking in China totaled 7.67 million in 2014, which means that each death caused by smoking corresponded to a loss of 15 years of life. Furthermore, smoking has gained attention not only because of its harmful effect on health, but also because it may place a huge burden on the economy, which may increase significantly over the coming decades. In 2014, the economic losses caused by smoking in China totaled about 350 billion CNY, accounting for 0.55% of the annual GDP [2].
Smoking is one of the main reasons for the rapid increase in the number of deaths from non-communicable diseases in China. Diseases such as cardiovascular disease, cancer, respiratory diseases and diabetes have become the primary threat to the health of Chinese residents. In 2012, about 8.6 million people died from non-communicable diseases [5]. Among these diseases, cancer was the most serious: its mortality has surpassed that of cardiovascular disease since 2010, making it the leading cause of death in the Chinese population [6]. Findings from a national study have shown that 32.7% of male and 5% of female cancer deaths in China are caused by smoking [7]. In addition, more than 160,000 people die from cancer each year, and 20%-30% of these deaths are caused by smoking [8]. If the number of smokers continues to decrease, lung cancer mortality is expected to drop by 79% in the next 30 years [9]. This clearly shows that tobacco poses a huge public health challenge.
An earlier study has shown that the disease burden caused by smoking in China has significantly increased in the past few decades [10]. To reduce smoking rates, cities in China have formulated tobacco control measures. However, the quality of the implementation and enforcement of smoke-free laws varies among cities. Qingdao enacted its smoke-free law in 2013, prohibiting smoking in certain public places. Nevertheless, the current status of tobacco smoking in Qingdao remains serious. According to the 2014 survey on tobacco smoking in Qingdao, the current smoking rate of residents over the age of 15 was 21.36%, and 40.5% among men. In 2015, there were 14,238 cancer deaths in Qingdao, of which 4,148 were due to smoking, accounting for 29.1% of total cancer deaths [11]. The life expectancy loss of residents in Qingdao caused by cancer deaths due to smoking was 0.87 years, with 1.19 years for men and 0.45 years for women [12]. Hence, the cancer burden due to smoking in Qingdao cannot be ignored.
In this study, we aim to establish a time series model to evaluate the number of cancer deaths attributed to smoking in Qingdao from 2005 to 2016, and to project the trend of cancer deaths caused by smoking in Qingdao from 2017 to 2020. We hypothesized that the number of cancer deaths due to smoking shows an upward trend and will continue to rise in the coming years. In addition, this research is intended to fill gaps in the literature on the trend of cancer deaths caused by smoking in Qingdao and to provide a basis for further development of tobacco control strategies and measures.

Data sources
Based on the 2013 Global Burden of Disease Study (GBD), this study included 12 tobacco-related cancers: esophageal cancer, gastric cancer, liver cancer, lung cancer, colorectal cancer, oral cancer, nasopharyngeal cancer, pancreatic cancer, kidney cancer, bladder cancer, leukemia, and cervical cancer. The relative risks (RRs) for these 12 cancers also come from the relevant studies of GBD 2013 [13]. The RRs for the 12 major smoking-related cancers and their ICD-10 (International Classification of Diseases, 10th Revision) codes are shown in Table 1.
The cancer death data from 2005 to 2017 were derived from the death cause reporting system of the Qingdao Municipal Center for Disease Control and Prevention. The system covers the household registration population of 10 districts and cities under the jurisdiction of the Qingdao Municipality. The population composition data for 2005-2017 come from the Qingdao Municipal Public Security Bureau. The scope of this analysis was limited to adults older than 30 years, as most cumulative hazardous effects of smoking-related cancers are unlikely to manifest in individuals younger than middle age. Stratification of the population by sex and age is shown in S1 Table.
Based on previous research methods [14,15], we estimated the number of cancer deaths attributed to smoking in Qingdao from 2005 to 2017. Because data on smoking rates in past years are not comprehensive, and current smoking prevalence is not enough to evaluate the cumulative harm of smoking, we used the smoking impact ratio (SIR) to estimate smoking-attributable cancer mortality indirectly. The SIR reflects the cumulative harm of smoking, so this indirect estimation captures the impact of both current and past smoking on mortality from cancer, COPD and other diseases. In the absence of direct information about smoking history, the method estimates the cancer mortality caused by smoking from the available disease mortality statistics. The proportion of deaths caused by smoking was estimated using the standard population attributable fraction (PAF). The SIR was used to measure the cumulative hazards of smoking among the population in Qingdao and to calculate PAFs. In the formulas for PAF and smoking-attributable mortality, RR represents relative risk and M represents the number of deaths from smoking-related cancers.
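As a sketch only, assuming the standard SIR-based form of the attributable fraction used in indirect estimation (the symbol SAM for smoking-attributable mortality is an assumption introduced here, not the authors' notation), the two quantities can be written as:

```latex
\mathrm{PAF} = \frac{\mathrm{SIR}\,(RR - 1)}{\mathrm{SIR}\,(RR - 1) + 1},
\qquad
\mathrm{SAM} = \mathrm{PAF} \times M
```

Under this form, PAF is 0 when SIR is 0 and approaches (RR - 1)/RR as SIR approaches 1, so a cancer with a larger RR contributes a larger share of its deaths M to SAM.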

Method
The Autoregressive Integrated Moving Average (ARIMA) model is a well-known time series prediction method proposed by Box and Jenkins in the early 1970s, also called the Box-Jenkins method [16]. It jointly considers the trend, periodic variation and random interference of the sequence, and fully extracts the available information in the data. It is one of the more accurate and general time series analysis methods. The ARIMA model combines several models, including the autoregressive model, the moving average model, and the autoregressive moving average model. The form of the ARIMA model is represented by ARIMA (p, d, q), where p is the autoregressive order, d is the number of differences, and q is the moving average order. When the sequence has both short-term correlation and periodicity, d-order trend differencing alone cannot make the sequence stationary, and the original sequence needs to be seasonally differenced with period s to eliminate its periodicity. The seasonal difference is the difference between the observed value at a certain time in the sequence and the observed value at the same time in the previous period. Since there is a multiplicative relationship between the short-term correlation and the seasonal effect, a sequence that satisfies this condition can be represented by the model ARIMA (p, d, q) × (P, D, Q) s, where p is the non-seasonal autoregressive order, d is the non-seasonal difference order, q is the non-seasonal moving average order, P is the seasonal autoregressive order, D is the seasonal difference order, Q is the seasonal moving average order, and s is the seasonal period length.

PLOS ONE

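In backshift notation (with B y_t = y_{t-1}), the multiplicative seasonal model is conventionally written as shown here. The symbols \phi, \Phi, \theta, \Theta and \varepsilon_t are standard textbook notation assumed for illustration, not taken from the source, and y_t may denote a transformed series:

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1 - B)^{d}\,(1 - B^{s})^{D}\, y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
```

For example, an ARIMA (2,1,0) × (3,1,0) 12 model has no moving average terms, so under this notation it reduces to:

```latex
(1 - \phi_1 B - \phi_2 B^{2})\,
(1 - \Phi_1 B^{12} - \Phi_2 B^{24} - \Phi_3 B^{36})\,
(1 - B)\,(1 - B^{12})\, y_t = \varepsilon_t
```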
Four steps are required to make predictions using the ARIMA model. First, the ARIMA model requires a stationary, non-white-noise sequence. If the sequence is non-stationary, natural logarithmic transformation, differencing and seasonal differencing are applied to the original data, and d and D are determined by the number of differences taken. Second, p, q, P, and Q are initially determined from the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot of the processed stationary time series. Third, least squares estimation or the maximum likelihood method is used to estimate the unknown parameters in the model. The significance of each parameter is tested, and the residual sequence is subjected to a white noise test. If the model is not suitable, we return to the model identification stage to re-select the model; if there are multiple valid models, we select among them using goodness-of-fit indices, which include the coefficient of determination (R²) and the normalized Bayesian Information Criterion (BIC). Finally, the selected model is applied for analysis and prediction.
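The differencing in the first step can be sketched in a few lines of plain Python. The series below is made up for illustration (it is not the Qingdao data), and `difference` is a hypothetical helper, not part of any package used in the study.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly counts: a linear trend plus a repeating 12-month pattern.
seasonal_pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 6]
series = [200 + 2 * t + seasonal_pattern[t % 12] for t in range(48)]

# Seasonal difference (D = 1, s = 12) removes the repeating monthly pattern.
seasonal_diff = difference(series, lag=12)

# Ordinary first difference (d = 1) then removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)

print(seasonal_diff[:3])
print(stationary[:3])
```

In this toy series the seasonal difference leaves a constant and the first difference of that constant is zero, which is the idealized version of what differencing is meant to achieve on real data. Each difference shortens the series, here from 48 to 36 and then 35 observations.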

Model diagnosis
First, the autocorrelation and partial autocorrelation coefficients (Fig 4) within lag 12 were observed. Because the autocorrelation coefficient tails off and the partial autocorrelation coefficient cuts off after the second order, it was judged that p = 2 and q = 0. Second, the autocorrelation and partial autocorrelation coefficients at the seasonal lags were observed to determine the seasonal parameters P and Q. The autocorrelation coefficient tails off and the partial autocorrelation coefficient cuts off, so Q = 0, and the model was further specified as ARIMA (2,1,0) × (P,1,0) 12 . In most cases, the order of P does not exceed 3, so P was determined by testing values one by one from low order to high order. The candidate models are therefore: ARIMA (2, 1, 0) × (1, 1, 0) 12 , ARIMA (2, 1, 0) × (2, 1, 0) 12 , ARIMA (2, 1, 0) × (3,1,0) 12 . Table 2 shows the parameter estimates for the plausible ARIMA models.
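The identification rule used here (ACF tails off, PACF cuts off after lag 2, hence p = 2 and q = 0) can be illustrated with a theoretical AR(2) process. The sketch below is plain Python with made-up coefficients (0.5 and 0.3), and both function names are hypothetical; it computes partial autocorrelations from autocorrelations with the Durbin-Levinson recursion and does not reproduce the software used in the study.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical autocorrelations rho_0..rho_max_lag of a stationary AR(2)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations for lags 1..len(rho)-1."""
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


acf = ar2_acf(0.5, 0.3, max_lag=6)
pacf = pacf_from_acf(acf)

for lag in range(1, 7):
    print(f"lag {lag}: ACF = {acf[lag]: .4f}, PACF = {pacf[lag - 1]: .4f}")
```

The printed ACF values decay gradually while the PACF values are zero (up to floating-point error) beyond lag 2, which is the pattern read off Fig 4 to choose p = 2. With sample data the cutoff is judged against confidence bounds rather than exact zeros.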

Forecast and analysis
The ARIMA (2,1,0) × (3,1,0) 12 model was used to fit the number of cancer deaths attributable to smoking in Qingdao from January 2005 to December 2016. Fig 5 shows that the predicted curve is consistent with the actual curve, and the actual curve is within the 95% confidence interval.
The model was used to predict the number of cancer deaths attributed to smoking in Qingdao in 2017, and the actual values were used to test the fit of the model. Table 3 shows that the 95% confidence interval of the predicted value includes the actual value, with relative errors ranging from 0.25% to 16.18% and an average relative error of 5.74%, which indicates that the model fits well.
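The relative error reported for the 2017 validation is usually computed month by month as |predicted - actual| / actual and then averaged. A minimal sketch follows, using made-up numbers rather than the values in Table 3; `relative_errors` is a hypothetical helper name.

```python
def relative_errors(actual, predicted):
    """Absolute relative error of each prediction, in percent."""
    return [abs(p - a) / a * 100 for a, p in zip(actual, predicted)]


# Hypothetical monthly death counts and forecasts (not the Table 3 data).
actual = [400, 380, 420, 410]
predicted = [410, 361, 420, 430.5]

errors = relative_errors(actual, predicted)
mean_error = sum(errors) / len(errors)

print([round(e, 2) for e in errors])
print(f"range: {min(errors):.2f}% to {max(errors):.2f}%, mean: {mean_error:.2f}%")
```

Reporting the range alongside the mean, as the paper does, shows whether a low average hides individual months with large misses.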
This model was used to further predict the number of cancer deaths due to smoking in Qingdao from January 2018 to December 2020; the results are shown in Table 4. The number of cancer deaths attributed to smoking in Qingdao shows an upward trend in the next three years; the predicted results are 5,249, 5,423 and 6,048 respectively. In addition, we can conclude that the predicted annual growth rate of deaths is 7.34%, and the number of deaths caused by smoking shows an upward trend compared with the past 11 years.
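The text does not say how the 7.34% figure was derived. One reading that is consistent with it, offered here as an assumption, is a compound annual growth rate between the 2018 and 2020 predictions:

```python
def compound_annual_growth_rate(first, last, years):
    """Constant yearly growth rate that takes `first` to `last` in `years` years."""
    return (last / first) ** (1 / years) - 1


# Predicted annual totals reported in the text.
predicted = {2018: 5249, 2019: 5423, 2020: 6048}

cagr = compound_annual_growth_rate(predicted[2018], predicted[2020], years=2)
print(f"{cagr:.2%}")  # 7.34%
```

Note that this rate depends only on the first and last years, so it smooths over the uneven year-to-year increases between 2018-2019 and 2019-2020.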

Discussion
In an analysis based on the number of cancer deaths attributed to smoking, we used ARIMA models to fit and predict the changing trends of smoking-attributable cancer deaths from 2005 to 2017. A prior study using data from 2005 to 2016 found that cancer deaths attributed to smoking showed a rising trend in Qingdao [14]. Our study added to the previous study by extending the analysis to 2020, and the forecasts matched the actual data in 2017 well, with an average relative error of 5.74%. According to the testing results, the fitted model ARIMA (2,1,0) × (3,1,0) 12 was reliable with high validity and can be used to forecast the expected number of cases. Further prediction of the number of cancer deaths attributed to smoking from 2018 to 2020 gave 5,249, 5,423 and 6,048 respectively. The model thus projected an increase in the number of cancer deaths attributed to smoking in Qingdao from 2018 to 2020. The model assumed that the current trend in smoking will persist for several years. Because of the higher smoking prevalence among residents aged 45-69 years [17], who are considered a high-risk group for smoking-related disease [18], and the lag time between smoking and related cancers, deaths from cancer caused by smoking will continue to increase over time. This is consistent with the conclusion of another relevant study in China [10], which used five national smoking prevalence surveys (1984, 1996, 2002, 2010 and 2015) and indicated that smoking-related deaths will enter a high incidence period from 2010 to 2030. Reducing the smoking rate can effectively reduce the number of deaths attributed to smoking in the future. A previous study found that 326,000 deaths due to non-communicable diseases (e.g., cancer, cardiovascular diseases, diabetes mellitus and chronic respiratory diseases), including 222,400 cancer-related deaths, among 30-70-year-old Chinese residents could be avoided by 2030 if the smoking rate were reduced by 30% from the 2013 level [19]. Differences in socioeconomic status [20], disparities in educational level [20], and changes in the prospects of tobacco products (e-cigarette and compound tobacco use) [21,22] will contribute to the rise in smoking prevalence. Several studies around the world have indicated that effective tobacco control measures can help reduce smoking prevalence, which in turn would reduce the disease burden caused by smoking.
Taiwan has adopted a series of tobacco control policies that have effectively reduced the burden of disease caused by smoking and second-hand smoke exposure, with the disease burden declining from 1990 to 2013 [23]. A study in the United States that used models to predict the number of cancer deaths caused by smoking also showed a downward trend [9]. Furthermore, the American Cancer Society's 2018 Cancer Statistics Report indicated that tobacco use in the United States has decreased due to increased awareness of tobacco harm and the implementation of comprehensive tobacco control measures, and that lung cancer mortality rates fell by 45% from 1990 to 2015 in men and by 19% from 2002 to 2015 in women [24]. This information illustrates the importance of implementing effective tobacco control policies. The implementation of a comprehensive tobacco control policy in 27 European Union countries is the key to reducing smoking prevalence and increasing smoking cessation rates [25]. Qingdao enacted its smoke-free law on 1 September 2013. This law prohibits smoking in restaurants, bars and hotels, and does not allow designated smoking rooms in these venues, which makes it more comprehensive than the laws implemented in other cities in China [26]. However, enforcement of the smoke-free law is relatively poor; although it provides severe penalties for violations, no fines were issued within the first year of enforcement [27]. Moreover, the lag time between the implementation of tobacco control measures (smokers quitting or young people never starting to smoke) and the decrease in smoking-related cancer mortality is exceptionally long, because smoking-related cancers develop after approximately 20-30 years of continuous smoking; therefore, the number of cancer deaths attributed to smoking in Qingdao will remain on an upward trend in the next 3 years.
The government of Qingdao should strengthen the implementation of this regulation to create a smoke-free environment. The smoke-free law will help people reduce tobacco use and protect them against the health risks of tobacco smoke. Surveys indicated that per capita cigarette consumption in the United States is between 5% and 20% lower in states with comprehensive smoke-free laws. Exposure to tobacco smoke is proven to cause heart disease, cancer and many other diseases. In many countries, it causes more than 10% of all tobacco-related deaths. Stronger implementation would also reduce the overall disease burden caused by smoking, helping to reduce not only the burden of cancer but also that of other non-communicable diseases and some communicable diseases.
To the best of our knowledge, this is the first study to predict the number of cancer deaths attributed to smoking in Qingdao from 2017 to 2020 using the ARIMA model. These findings are valuable for the implementation and assessment of tobacco control measures and for tracking changes in the disease burden caused by smoking. However, the limitations of the study must be noted. First, we only analyzed the data on cancer deaths attributed to smoking from 2005 to 2016 in Qingdao. The predictions were relatively accurate for the next 1-2 years, but the 95% confidence interval was noticeably wide in the third year, indicating poorer prediction. To maintain the accuracy of the prediction model, monitoring data should be added to the sequence continually over time. Second, our analysis only included the smoking-related cancer burden attributed to cigarette smoking; it does not include the cancer burden from former smokers and second-hand smokers, or from the use of e-cigarettes and other tobacco types. Further research should consider the disease burden of former and passive smokers and of various tobacco types. Finally, the findings may not be generalizable to other cities in China. The enforcement of the smoke-free law in Qingdao is not as strict as in other cities, and different impacts may be observed in cities with strong tobacco control.

Conclusion
The number of cancer deaths attributed to smoking in Qingdao from 2005 to 2017 showed a fluctuating upward trend. The ARIMA (2,1,0) × (3,1,0) 12 model is relatively accurate in predicting the number of cancer deaths caused by smoking in Qingdao, and the predicted numbers for 2018, 2019 and 2020 are 5,249, 5,423 and 6,048, respectively. Strengthening tobacco control measures to reduce smoking rates and increase smoking cessation rates is of great significance for reducing the disease burden of malignant tumors in Qingdao.